  1. Feb 2024
    1. Author Response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      […] While this does not rule out criticality in the brain, it decidedly weakens the evidence for it, which was based on the following logic: critical systems give rise to power law behavior; power law behavior is observed in cortical networks; therefore, cortical networks operate near a critical point. Given, as shown in this paper, that power laws can arise from noncritical processes, the logic breaks. Moreover, the authors show that criticality does not imply optimal information transmission (one of its proposed functions). This highlights the necessity for more rigorous analyses to affirm criticality in the brain. In particular, it suggests that attention should be focused on the question "does the brain implement a dynamical latent variable model?".

      These authors are not the first to show that slowly varying firing rates can give rise to power law behavior (see, for example, Touboul and Destexhe, 2017; Priesemann and Shriki, 2018). However, to our knowledge they are the first to show crackling, and to compute information transmission in the critical state.

      We thank the reviewers for their thoughtful assessment of our paper.

      We would push back on the assessment that our model ‘has nothing to do with criticality,’ and that we observed ‘signatures of criticality [that] emerge through fundamentally non-critical mechanisms.’ This assessment partially stems from the definition of criticality provided in the Public Comment, that ‘criticality is a very specific set of phenomena in physics in which fundamentally local interactions produce unexpected long-range behavior.’

      Our disagreement is largely focused on this definition, which we do not think is a standard definition. Taking the favorite textbook example, the Ising model, criticality is characterized by a set of power-law divergences in thermodynamic quantities (e.g., susceptibility, specific heat, magnetization) at the critical temperature, with exponents of these power laws governed by scaling laws. It is not defined by local interactions. The all-to-all Ising model is generally viewed as showing critical behavior at a certain temperature, even though interactions there are manifestly non-local. It is possible that, by “local” in the definition, the Public Comment meant that interactions are “collective” and among microscopic degrees of freedom. However, that same all-to-all Ising model is mathematically equivalent to the mean-field model, where criticality is achieved through large fluctuations of the mean field, but not through microscopic interactions.

      More commonly, criticality is defined by power laws and scaling relationships that emerge at a critical value of a parameter(s) of the system. That is, criticality is defined by its signatures. What is crucial in all such definitions is that this atypical, critical state requires fine tuning. For example, in the textbook example of the Ising model, a parameter (the temperature) must be tuned to a critical value for critical behavior to appear. In the branching process model that generates avalanche criticality, criticality requires tuning m=1. The key result of our paper is that all signatures expected for avalanche criticality (power laws, crackling, and, as shown below, estimates of the branching rate m), and hence the criticality itself, appear without fine-tuning.

      As we discussed in our introduction, there are a few other instances of signatures of criticality (and hence of criticality itself) emerging without fine-tuning. The first we are aware of was the demonstration of Zipf’s Law (by Schwab, et al. 2014, and Aitchison et al. 2016), a power-law relationship between rank and frequency of states, which was shown to emerge generically in systems driven by a broadly distributed latent variable. A second example, arising from applications of coarse-graining analysis to neural data (cf., Meshulam et al. 2019; also, Morales et al., 2023), was demonstrated in our earlier paper (Morrell et al. 2021). Thus, here we have a third example: the model in this paper generates signatures of criticality in the statistics of avalanches of activity, and it does so without fine-tuning (cf., Fig. 2-3).

      The rate at which these ‘criticality without fine-tuning’ examples are piling up may inspire revisiting the requirement of fine-tuning in the definition of criticality, and our ongoing work (Ngampruetikorn et al. 2023) suggests that criticality may be more accurately defined through large fluctuations (variance > 1/N) rather than through fine-tuning or scaling relations.

      References:

      • Schwab DJ, Nemenman I, Mehta P. “Zipf’s Law and Criticality in Multivariate Data without Fine-Tuning.” Phys Rev Lett. 2014 Aug; doi: 10.1103/PhysRevLett.113.068102.

      • Aitchison L, Corradi N, Latham PE. “Zipf’s Law Arising Naturally When There Are Underlying, Unobserved Variables.” PLOS Computational Biology. 2016 12; 12(12):1-32. doi: 10.1371/journal.pcbi.1005110.

      • Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. “Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons.” Phys Rev Lett. 2019 Oct; doi: 10.1103/PhysRevLett.123.178103.

      • Morales GB, di Santo S, Muñoz MA. “Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics.” Proceedings of the National Academy of Sciences. 2023; 120(9):e2208998120.

      • Morrell MC, Sederberg AJ, Nemenman I. “Latent Dynamical Variables Produce Signatures of Spatiotemporal Criticality in Large Biological Systems.” Phys Rev Lett. 2021 Mar; doi: 10.1103/PhysRevLett.126.118302.

      • Ngampruetikorn V, Nemenman I, Schwab D. “Extrinsic vs Intrinsic Criticality in Systems with Many Components.” arXiv:2309.13898 [physics.bio-ph].

      Major comments:

      1) For many readers, the essential messages of the paper may not be immediately clear. For example, is the paper criticizing the criticality hypothesis of cortical networks, or does the criticism extend deeper, to the theoretical predictions of "crackling" relationships in physical systems as they can emerge without criticality? Statements like "We show that a system coupled to one or many dynamical latent variables can generate avalanche criticality ..." could be misinterpreted as affirming criticality. A more accurate language is needed; for instance, the paper could state that the model generates relationships observed in critical systems. The paper should provide a clearer conclusion and interpretation of the findings in the context of the criticality hypothesis of cortical dynamics.

      Please see the response to the Public Review, above. To clarify the essential message that the dynamical latent variable model produces avalanche criticality without fine-tuning, we have made revisions to the abstract and introduction. This point was already made in the discussion (first sentence).

      Key sentences changed in the abstract:

      "… We find that populations coupled to multiple latent variables produce critical behavior across a broader parameter range than those coupled to a single, quasi-static latent variable, but in both cases, avalanche criticality is observed without fine-tuning of model parameters. … Our results suggest that avalanche criticality arises in neural systems in which activity is effectively modeled as a population driven by a few dynamical variables and these variables can be inferred from the population activity."

      In the introduction, we changed the final sentence to read:

      "These results demonstrate how criticality in neural recordings can arise from latent dynamics in neural activity, without need for fine-tuning of network parameters."

      2) On lines 97-99, the authors state that "We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from recurrent dynamics locally". This idea is also repeated at the beginning of the Summary section. Perhaps being agnostic isn't such a good idea: it's possible that the recurrent dynamics is in a critical regime, which would just push the problem upstream. Presumably you're thinking of recurrent dynamics with slow timescales that's not critical? Or are you happy if it's in the critical regime? This should be clarified.

      We have amended this sentence to clarify that any latent dynamics with large fluctuations would suffice:

      “We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics.”

      3) Even though the model in Equation 2 has been described in a previous publication and the Methods section, more details regarding the origin and justification of this model in the context of cortical networks would be helpful in the Results section. Was it chosen just for simplicity, or was there a deeper reason?

      This model was chosen for its simplicity: there are no direct interactions between neurons, coupling between neurons and latent variables is random, and simulation is straightforward. More complex latent dynamics or non-random structure in the coupling matrices could have been used, but our aim was to explore this model in the simplest setting possible.
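      The structure described above (no direct neuron-to-neuron interactions, random coupling to a few slow latent variables) can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the logistic form of the firing probability, its sign convention, the unit-variance Ornstein-Uhlenbeck update, and the population size `n_cells=128` are assumptions made here; the defaults `eta=3` and `epsilon=8` echo the example parameter set quoted later in the response.

```python
import numpy as np

def simulate_latent_population(n_cells=128, n_latent=5, n_steps=20000,
                               tau=5000.0, eta=3.0, epsilon=8.0, seed=0):
    """Binary population driven by slow latent variables (illustrative only).

    Assumed form: P(s_i = 1) = 1 / (1 + exp(epsilon - eta * sum_m J_im h_m)),
    so that larger epsilon lowers the activity level.
    """
    rng = np.random.default_rng(seed)
    # Random, normally distributed coupling of every cell to every latent variable.
    J = rng.standard_normal((n_cells, n_latent))
    # Latent variables start from their stationary (unit-variance) distribution.
    h = rng.standard_normal(n_latent)
    # Exact one-bin update of an Ornstein-Uhlenbeck process with timescale tau (in bins).
    decay = np.exp(-1.0 / tau)
    noise_scale = np.sqrt(1.0 - decay ** 2)
    spikes = np.zeros((n_steps, n_cells), dtype=np.uint8)
    for t in range(n_steps):
        h = decay * h + noise_scale * rng.standard_normal(n_latent)
        drive = eta * (J @ h) - epsilon
        p_fire = 1.0 / (1.0 + np.exp(-drive))
        # Cells are conditionally independent given the latent variables.
        spikes[t] = rng.random(n_cells) < p_fire
    return spikes
```

      Summing the returned array over cells gives the population activity per time bin, which is the quantity an avalanche analysis would operate on.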

      We have revised the Results (“Avalanche scaling in a dynamical latent variable model,” first paragraph) to justify the choice of the model:

      "We study a model of a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured (Fig.~\ref{fig:fig1}A). We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from large fluctuations in local recurrent dynamics. The model was chosen for its simplicity, and because we have previously shown that this model with at least about five latent variables can produce power laws under the coarse-graining analysis \citep{Morrell2021}."

      We have added the following to the beginning of the Methods section expanding on the reasons for this choice:

      "We study a model from Morrell 2021, originally constructed as a model of large populations of neurons in mouse hippocampus. Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current arising from a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured or controlled. In the current paper, we incorporate only the latent variables (no place variables), and we assume that every cell is coupled to every latent variable with some randomly drawn coupling strength."

      4) The Methods section (paragraph starting on line 340) connects the time scale to actual time scales in neuronal systems, stating that "The timescales of latent variables examined range from about 3 seconds to 3000 seconds, assuming 3-ms bins". While bins of 3 ms are relevant for electrophysiological data from LFPs or high-density EEG/MEG, time scales above 10 seconds are difficult to generate through biophysically clear processes like ionic channels and synaptic transmission. The paper suggests that slow time scales of the latent variables are crucial for obtaining power law behavior resembling criticality. Yet, one way to generate such slow time scales is via critical slowing down, implying that some brain areas providing input to the network under study may operate near criticality. This pushes the problem toward explaining the criticality of those external networks. Hence, discussing potential sources for slow time scales in latent variables is crucial. One possibility you might want to consider is sources external to the organism, which could easily have time scales in the 1-24 hour range.

      As the reviewers note, it is a possibility that slow timescales arise from some other brain area in which dynamics are slow due to critical dynamics, but many other plausible sources exist. These include slowly varying sensory stimuli or external sources, as suggested by the reviewers. It is also possible to generate “effective” slow dynamics from non-critical internal sources. One example, from recordings in awake mice, is the slow change in the level of arousal that occurs on the scale of many seconds to minutes. These changes arise from release of neuromodulators that have broad effects on neural populations and correlations in activity (for a focused review, see Poulet and Crochet, 2019).

      We have added the following sentence to the Methods section where timescales of latent variables were discussed:

      "The timescales of latent variables examined range from about $3$ seconds to $3000$ seconds, assuming $3$-ms bins. Inputs with such timescales may arise from external sources, such as sensory stimuli, or from internal sources, such as changes in physiological state."

      5) It is common in neuronal avalanche analysis to calculate the branching parameter using the ratio of events in consecutive bins. Near-critical systems should display values close to 1, especially in simulations without subsampling. Including the estimated values of the branching parameter for the different cases investigated in this study could provide more comprehensive data. While the paper acknowledges that the obtained exponents in the model differ from those in a critical branching process, it would still be beneficial to offer the branching parameter of the observed avalanches for comparison.

      The reviewers requested that the branching parameter be computed in our model. We point out that, for the quasi-stationary latent variables (as in Fig. 3), a branching parameter of 1 is expected because the summed activity at time t+k is, on average, equal to the summed activity at time t, regardless of k. Numerics are consistent with this expectation. Following the methodology for an unbiased estimate of the branching parameter from Wilting and Priesemann (2018), we checked an example set of parameters (epsilon = 8, eta = 3) for quasi-stationary latent fields. We found that the naïve (biased) estimate of the branching parameter was 0.94, and that the unbiased estimator was exp(−1.4 × 10^−8) ≈ 0.999999986.

      For faster time scales, it is no longer true that summed activity is constant over time, as the temporal correlations in activity decay exponentially. Using the five-field simulation from Figure 2, we calculated the branching parameter for several values of tau. The biased estimates of m are 0.76 (𝜏=50), 0.79 (𝜏=500), and 0.79 (𝜏=5000). The corrected estimates are 0.98 (𝜏=50), 0.998 (𝜏=500), and 0.9998 (𝜏=5000).
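      To illustrate why the naïve and corrected estimates quoted above can differ, the following sketch compares a one-step regression slope with a multistep regression fit on a toy process. It is not the analysis code used for the paper: the toy driven branching process, the binomial subsampling, and all function names are hypothetical, and the log-linear fit of r_k = b·m^k is a simplification of the procedure in Wilting and Priesemann (2018).

```python
import numpy as np

def regression_slope(a, k):
    """Least-squares slope of a(t+k) regressed on a(t)."""
    x, y = a[:-k], a[k:]
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def naive_branching_estimate(a):
    """One-step estimate; biased toward zero when activity is subsampled or noisy."""
    return regression_slope(np.asarray(a, dtype=float), 1)

def multistep_branching_estimate(a, k_max=10):
    """Fit r_k = b * m**k across lags and return m (log-linear fit, assumed here)."""
    a = np.asarray(a, dtype=float)
    lags = np.arange(1, k_max + 1)
    r = np.array([regression_slope(a, k) for k in lags])
    keep = r > 0  # the logarithm is only defined for positive slopes
    log_m, _ = np.polyfit(lags[keep], np.log(r[keep]), 1)
    return float(np.exp(log_m))

def simulate_subsampled_branching(m=0.9, drive=10.0, p_obs=0.1,
                                  n_steps=100_000, seed=0):
    """Toy process A(t+1) ~ Poisson(m * A(t) + drive), observed with probability p_obs per event."""
    rng = np.random.default_rng(seed)
    full = np.empty(n_steps, dtype=np.int64)
    full[0] = int(drive / (1.0 - m))  # start near the stationary mean
    for t in range(1, n_steps):
        full[t] = rng.poisson(m * full[t - 1] + drive)
    return rng.binomial(full, p_obs)

if __name__ == "__main__":
    observed = simulate_subsampled_branching()
    print("naive estimate:    ", naive_branching_estimate(observed))
    print("multistep estimate:", multistep_branching_estimate(observed))
```

      In this toy setting the one-step slope falls well below the true m because observation noise inflates the variance of a(t), whereas the ratio of slopes at successive lags is unaffected by that prefactor.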

      6) In the Discussion (l 269), the paper suggests potential differences between networks cultured in vitro and in vivo. While significant differences indeed exist, it's worth noting that exponents consistent with a critical branching process have also been observed in vivo (Petermann et al 2009; Hahn et al. 2010), as well as in large-scale human data.

      We thank the reviewers for pointing out these studies, and we have added the missing one (Hahn et al. 2010) to our reference list. The following was added to the discussion, in the section “Explaining Experimental Exponents:”

      "A subset of the in vivo recordings analyzed from anesthetized cat (Hahn et al. 2010) and macaque monkeys (Petermann et al. 2009) exhibited a size distribution exponent close to 1.5."

      Along these lines, we noted two additional studies of high relevance that have been published since our initial submission (Capek et al. 2023, Lombardi et al. 2023), and we have added these references to the discussion of experimental exponents.

      Minor comments:

      1) The term 'latent variable' should be rigorously explained, as it is likely to be unfamiliar to some readers.

      Sentences and clauses have been added to the Introduction, Results and the Methods to clarify the term:

      Intro: “Numerous studies have reported relatively low-dimensional structure in the activity of large populations of neurons [refs], which can be modeled by a population of neurons that are broadly and heterogeneously coupled to multiple dynamical latent (i.e., unobserved) variables.”

      Results: “We studied a population of neurons that are not coupled to each other directly but are driven by a small number of dynamical latent variables -- that is, slowly changing inputs that are not themselves measured.”

      Methods: “Neurons are non-interacting, receiving inputs reflective of place-field selectivity as well as input current reflecting a random projection from a small number of dynamical latent variables, representing inputs shared across the population of neurons that are not directly measured.”

      2) There's a relatively important typo in the equations: Eq. 2 and Eq. 6 differ by a minus sign in the exponent. Eqs. 3 and 4 use the plus sign, but epsilon_0 on line 198 uses the minus sign. All very confusing until we figured out what was going on. But easy to fix.

      Thank you for catching this. We have made the following corrections:

      1) Figures adopted the sign convention that epsilon > 0, with larger values of epsilon decreasing the activity level. Signs in Eqs. 3 and 4 have been corrected to match.

      2) Equation 5 was missing a minus sign in front of the Hamiltonian. Restoring this minus sign fixed the discrepancy between 2 and 6.

      3) In Eq. 7, the left hand side is zeta'/zeta', which is equal to 1. Maybe it should be zeta'/zeta?

      Fixed, thank you.

      Additional comments:

      The authors are free to ignore these; they are meant to improve the paper.

      We are extremely grateful for the close reading of our paper and note the actions taken below.

      1) We personally would not use the abbreviation DLV; we find abbreviations extremely hard to remember. And DLV is not used that often.

      Done, thank you for the suggestion.

      2) l 198: epsilon_0 = -log(2^{1/N}-1) was kind of hard to picture -- we had to do a little algebra to make sense of it. Why not write e^{-epsilon_0} = 2^{1/N}-1 \approx log(2)/N, which in turn implies that epsilon_0 ~ log(N)?
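      The intermediate step behind this suggestion can be written out explicitly. The expansion below assumes that log denotes the natural logarithm and that $N$ is large; the constant term is worked out here and is not stated in the original text.

```latex
\begin{align}
  e^{-\epsilon_0} &= 2^{1/N} - 1 = e^{(\ln 2)/N} - 1
                   = \frac{\ln 2}{N} + O\!\left(N^{-2}\right), \\
  \epsilon_0 &= \ln N - \ln(\ln 2) + O\!\left(N^{-1}\right)
              \approx \ln N + 0.37 .
\end{align}
```

      The leading behavior is therefore $\epsilon_0 \sim \log N$, with an $N$-independent offset of about 0.37.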

      Thank you, good point. We have added a sentence now to better explain:

      "...which is maximized at $\epsilon_0 = - \log (2^{1/N} - 1)$, independent of $J_i$ and $\eta$. After some algebra, we find that $\epsilon_0 \sim \log N$ for large $N$."

      3) Typo on l 202: "We plot P_ava as a function of epsilon in Fig. 4B". 4B --> 4D.

      Done

      4) It would be easier on the reader if the tables were all in one place. It would be even nicer to put the parameters in the figure captions. Or at least N; that one is kind of important.

      Table placement was a LaTeX issue, which we have now fixed. We also have included links between tables and relevant figures and indicated network size.

      5) What's x_i in Eqs. 7 and 8?

      We added a sentence of explanation. These are the individual observations of avalanche sizes or durations, depending on what is being fit.
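      As a concrete illustration of what such observations look like, the sketch below extracts avalanche sizes and durations from a summed-activity time series. It assumes the common convention that an avalanche is a maximal run of consecutive non-silent bins, which the response does not spell out; the function name is hypothetical.

```python
import numpy as np

def avalanche_sizes_durations(activity):
    """Return (sizes, durations) for runs of consecutive bins with activity > 0."""
    activity = np.asarray(activity)
    active = activity > 0
    # Pad with silent bins so that runs touching either edge are also closed.
    padded = np.concatenate(([False], active, [False]))
    change = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(change == 1)   # index of the first active bin of each run
    ends = np.flatnonzero(change == -1)    # index one past the last active bin
    durations = ends - starts
    # Cumulative sums give each run's total event count without a Python loop.
    csum = np.concatenate(([0], np.cumsum(activity)))
    sizes = csum[ends] - csum[starts]
    return sizes, durations
```

      Note that runs cut off by the start or end of the recording are counted here as complete avalanches; an actual analysis might prefer to discard them.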

      6) The latent variables evolve according to an Ornstein-Uhlenbeck process. But we might equally expect oscillations or non-normal behavior coupling dynamical modes, and these are likely to give different behavior with respect to avalanches. It might be worth commenting on this.

      7) The model assumes a normal distribution of the coupling strengths between the latent variables and the binary units. Discussing the potential effects of different types of random coupling could provide interesting insights.

      Both 6 and 7 are interesting questions. At this point, we could speculate that the main results would be qualitatively unchanged, provided dynamics are sufficiently slow and that the distribution of coupling strengths is sufficiently broad (that is, there is variance in the coupling matrix across individual neurons). Further studies would be needed to make these statements more precise.

      8) In Fig 1, tau_f = 1E4 whereas in Fig 2 tau_f = 5E3. Why the difference?

      For Figure 1, we chose a set of parameters that gave clear scaling. In Figure 2, we saw some value in showing more than one example of scaling, hence different parameters for the examples in Fig 2 than Fig 1. Note that the Fig 1 simulations are represented in Fig. 2 G-J, as the 5-field simulation with tau_F = 1e4.

  2. Jan 2024
    1. Author Response

      eLife assessment

      This study presents a valuable finding on a new role of Foxp3+ regulatory T cells in sensory perception, which may have an impact on our understanding of somatosensory perception. The authors identified a previously unappreciated action of enkephalins released by immune cells in the resolution of pain and several upstream signals that can regulate the expression of the proenkephalin gene PENK in Foxp3+ Tregs. However, whereas the generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and PENK fate-mapping is novel and generates compelling data, the analysis of Tregs in the control and transgenic mice is incomplete, and the study lacks proper tamoxifen controls and an examination of the role of PENK+ skin T cells that would further support the hypothesis. Nonetheless, the study would be of interest to the biologists working in the field of neuroimmunology and inflammation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk in the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      For instance:

      1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      Author response: By comparing panels A and C, it appears that heat sensitivity in controls (blue dots) is slightly different before and after TAM administration, suggesting that heat-sensitive receptors are moderately altered by TAM per se. However, heat sensitivity is increased twofold in KO animals. Thus, a possible effect of TAM on heat receptors is not responsible for the heat hyperalgesia seen in KO, as shown in figures 4 and S3.

      2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      Author response: These experiments are in progress. The specificity of the deletion will be presented in an updated version of the manuscript in the near future.

      3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why wasn't the behavioral data plotted longitudinally?

      Author response: We respectfully point the reviewer to figure S3, where the longitudinal data are presented. New behavioral tests are being performed. The results will be presented in a revised version.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40, 4-1BB as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs resulted in hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in foxp3 and PENK fate-mapping is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and play a significant role in regulating skin homeostasis: this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      Author response: Supplementary figures are being prepared and new results are being collected to show that the KO does not perturb immune and/or skin homeostasis at the time of the experiments. These will be presented in a revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors have developed a compelling coarse-grained simulation approach for nucleosome-nucleosome interactions within a chromatin array. The data presented are solid and provide new insights that allow for predictions of how chromatin interactions might occur in vivo, but some of the claims should be tempered. The tools will be valuable for the chromosome biology field.

      Response: We want to thank the editors and all the reviewers for their insightful comments. We have made substantial changes to the manuscript to improve its clarity and temper necessary claims, as detailed in the responses, and we performed additional analyses to address the reviewers’ concerns. We believe that we have successfully addressed all the comments, and the quality of our paper has improved significantly.

      In the following, we provide point-by-point responses to all the reviewer comments.

      RESPONSE TO REFEREE 1:

      Comment 0: This study develops and applies a coarse-grained model for nucleosomes with explicit ions. The authors perform several measurements to explore the utility of a coarse-grained simulation method to model nucleosomes and nucleosome arrays with explicit ions and implicit water. ’Explicit ions’ means that the charged ions are modeled as particles in simulation, allowing the distributions and dynamics of ions to be measured. Since nucleosomes are highly charged and modulated by charge modifications, this innovation is particularly relevant for chromatin simulation.

      Response: We thank the reviewer for the excellent summary of the work.

      Comment 1: Strengths: This simulation method produces accurate predictions when compared to experiments for the binding affinity of histones to DNA, counterion interactions, nucleosome DNA unwinding, nucleosome binding free energies, and sedimentation coefficients of arrays. The variety of measured quantities makes both this work and the impact of this coarse-grained methodology compelling. The comparison between the contributions of sodium and magnesium ions to nucleosome array compaction, presented in Figure 3, was exciting and a novel result that this simulation methodology can assess.

      Response: We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank him/her for the detailed suggestions and comments.

      Comment 2: Weaknesses: The presentation of experimental data as representing in vivo systems is a simplification that may misrepresent the results of the simulation work. In vivo, in this context, typically means experimental data from whole cells. What one could expect for in vivo experimental data is measurements on nucleosomes from cell lysates where various and numerous chemical modifications are present. On the contrary, some of the experimental data used as a comparison are from in vitro studies. In vitro in this context means nucleosomes were formed ’in a test tube’ or under controlled conditions that do not represent the complexity of an in vivo system. The simulations performed here are more directly compared to in vitro conditions. This distinction likely impacts to what extent these simulation results are biologically relevant. In vivo and in vitro differences could be clarified throughout and discussed.

      Response: As detailed in Response to Comment 3, we have made numerous modifications in the Introduction, Results, and Discussion Section to emphasize the differences between reconstituted and native nucleosomes. The newly added texts also delve into the utilization of the interaction strength measured for reconstituted nucleosomes as a reference point for conceptualizing the interactions among native nucleosomes.

      Comment 3: In the introduction (pg. 3), the authors discuss the uncertainty of nucleosome-to-nucleosome interaction strengths in vivo. For example, the authors discuss works such as Funke et al. However, Funke et al. used reconstituted nucleosomes from recombinant histones with one controlled modification (H4 acetylation). Therefore, this study that the authors discuss is measuring the in vitro affinity of nucleosomes, and there could be significant differences in vivo due to various posttranslational modifications. Please revise the introduction, results section “Close contacts drive nucleosome binding free energy,” and discussion to reflect and clarify the difference between in vitro and in vivo measurements. Please also discuss how biological variability could impact your findings in vivo. The works of Alexey Onufriev’s lab on the sensitivity of nucleosomes to charge changes (10.1016/j.bpj.2010.06.046, 10.1186/s13072-018-0181-5), such as some PTMs, are one potential starting place to consider how modifications alter nucleosome stability in vivo.

      Response: We thank the reviewer for the insightful comments and agree that native nucleosomes can differ from reconstituted nucleosomes due to the presence of histone modifications.

      We have revised the introduction to emphasize the differences between in vitro and in vivo nucleosomes. The new text now reads:

      "The relevance of physicochemical interactions between nucleosomes to chromatin organization in vivo has been constantly debated, partly due to the uncertainty in their strength [cite]. Examining the interactions between native nucleosomes poses challenges due to the intricate chemical modifications that histone proteins undergo within the nucleus and the variations in their underlying DNA sequences [cite]. Many in vitro experiments have opted for reconstituted nucleosomes that lack histone modifications and feature well-positioned 601-sequence DNA to simplify the chemical complexity. These experiments aim to establish a fundamental reference point for understanding the strength of interactions within native nucleosomes. Nevertheless, even with reconstituted nucleosomes, a consensus regarding the significance of their interactions remains elusive. For example, using force-measuring magnetic tweezers, Kruithof et al. estimated the inter-nucleosome binding energy to be ∼ 14 kBT [cite]. On the other hand, Funke et al. introduced a DNA origami-based force spectrometer to directly probe the interaction between a pair of nucleosomes [cite], circumventing any potential complications from interpretations of single molecule traces of nucleosome arrays. Their measurement reported a much weaker binding free energy of approximately 2 kBT. This large discrepancy in the reported reference values complicates a further assessment of the interactions between native nucleosomes and their contribution to chromatin organization in vivo."

      We modified the first paragraph of the results section to read

      "Encouraged by the explicit ion model’s accuracy in reproducing experimental measurements of single nucleosomes and nucleosome arrays, we moved to directly quantify the strength of inter-nucleosome interactions. We once again focus on reconstituted nucleosomes for a direct comparison with in vitro experiments. These experiments have yielded a wide range of values, ranging from 2 to 14 kBT [cite]. Accurate quantification will offer a reference value for conceptualizing the significance of physicochemical interactions among native nucleosomes in chromatin organization in vivo."

      New text was added to the Discussion Section to emphasize the implications of simulation results for interactions among native nucleosomes.

      "One significant finding from our study is the predicted strong inter-nucleosome interactions under the physiological salt environment, reaching approximately 9 kBT. We showed that the much lower value reported in a previous DNA origami experiment is due to the restricted nucleosomal orientation inherent to the device design. Unrestricted nucleosomes allow more close contacts to stabilize binding. A significant nucleosome binding free energy also agrees with the high forces found in single-molecule pulling experiments that are needed for chromatin unfolding [cite]. We also demonstrate that this strong inter-nucleosomal interaction is largely preserved at longer nucleosome repeat lengths (NRL) in the presence of linker histone proteins. While posttranslational modifications of histone proteins may influence inter-nucleosomal interactions, their effects are limited, as indicated by Ding et al. [cite], and are unlikely to completely abolish the significant interactions reported here. Therefore, we anticipate that, in addition to molecular motors, chromatin regulators, and other molecules inside the nucleus, intrinsic inter-nucleosome interactions are important players in chromatin organization in vivo."

      The suggested references (10.1016/j.bpj.2010.06.046, 10.1186/s13072-018-0181-5) are now included as citations # 44 and 45.

      Comment 4: Due to the implicit water model, do you know if ions can penetrate the nucleosome more? For example, does the lack of explicit water potentially cause sodium to cluster in the DNA grooves more than is biologically relevant, as shown in Figure 1?

      Response: We thank the reviewer for the insightful comments. The parameters of the explicit-ion model were deduced from all-atom simulations and fine-tuned to replicate crucial aspects of the local ion arrangements around DNA (1). The model’s efficacy was demonstrated by its reproduction of the radial distribution functions of Na+ and Mg2+ ions in the proximity of DNA (see Author response image 1). Consequently, the number of ions near DNA in the coarse-grained models aligns with that observed in all-atom simulations, and we do not anticipate any significant, unphysical clustering. It is worth noting that previous atomistic simulations have also reported the presence of a substantial quantity of Na+ ions in close proximity to nucleosomal DNA (refer to Author response image 2).

      Author response image 1.

      Comparison between the radial distribution functions of Na+ (left) and Mg2+ (right) ions around the DNA phosphate groups computed from all-atom (black) and coarse-grained (red) simulations. Figure adapted from Figure 4 of Ref. 1. The coarse-grained explicit ion model used in producing the red curves is identical to the one presented in the current manuscript.

      (© 2011, AIP Publishing. This figure is reproduced with permission from Figure 4 in Freeman GS, Hinckley DM, de Pablo JJ (2011) A coarse-grain three-site-per-nucleotide model for DNA with explicit ions. The Journal of Chemical Physics 135:165104. It is not covered by the CC-BY 4.0 license and further reproduction of this figure would need permission from the copyright holder.)

      Author response image 2.

      Three-dimensional distribution of sodium ions around the nucleosome determined from all-atom explicit solvent simulations. Darker blue colors indicate higher sodium density and high density of sodium ions around the DNA is clearly visible. The crystallographically identified acidic patch has been highlighted as spheres on the surface of the histone core and a high level of sodium condensation is observed around these residues. Figure adapted from Ref. 2.

      (© 2009, American Chemical Society. This figure is reproduced with permission from Figure 7 in Materese CK, Savelyev A, Papoian GA (2009) Counterion Atmosphere and Hydration Patterns near a Nucleosome Core Particle. J. Am. Chem. Soc. 131:15005–15013. It is not covered by the CC-BY 4.0 license and further reproduction of this figure would need permission from the copyright holder.)

      Comment 5: Histone side chain to DNA interactions, such as histone arginines to DNA, are essential for nucleosome stability. Therefore, can the authors provide validation or references supporting your model of the nucleosome with one bead per amino acid? I would like to see if the nucleosomes are stable in an extended simulation or if similar dynamic motions to all-atom simulations are observed.

      Response: The nucleosome model, which employs one bead per amino acid and lacks explicit ions, has undergone extensive calibration and has found application in numerous prior studies. For instance, the de Pablo group utilized a similar model to showcase its ability to accurately replicate the experimentally measured nucleosome unwinding free energy penalty (3), sequence-dependent nucleosome sliding (4), and the interaction between two nucleosomes (5). Similarly, the Takada group employed a comparable model to investigate acetylation-modulated tri-nucleosome structures (6), chromatin structures influenced by chromatin factors (7), and nucleosome sliding (8). Our group also employed this model to study the structural rearrangement of a tetranucleosome (9) and the folding of larger chromatin systems (10). In cases where data were available, simulations frequently achieved quantitative reproduction of experimental results.

      We added the following text to the manuscript to emphasize previous studies that validate the model accuracy.

      "We observe that residue-level coarse-grained models have been extensively utilized in prior studies to examine the free energy penalty associated with nucleosomal DNA unwinding [cite], sequence-dependent nucleosome sliding [cite], binding free energy between two nucleosomes [cite], chromatin folding [cite], the impact of histone modifications on tri-nucleosome structures [cite], and protein-chromatin interactions [cite]. The frequent quantitative agreement between simulation and experimental results supports the utility of such models in chromatin studies. Our introduction of explicit ions, as detailed below, further extends the applicability of these models to explore the dependence of chromatin conformations on salt concentrations."

      We agree that arginines are important for nucleosome stability. Since we assign positive charges to these residues, their contribution to DNA binding can be effectively captured. The model’s ability to reproduce nucleosome stability is supported by the good agreement between the simulated free energy penalty associated with nucleosomal DNA unwinding and the experimental value estimated from single-molecule experiments (Figure 1).

      To further evaluate nucleosome stability in our simulations, we conducted a 200-ns-long simulation of a nucleosome featuring the 601-sequence under physiological salt conditions (100 mM NaCl and 0.5 mM MgCl2), consistent with the conditions in Figure 1 of the main text. We found that the nucleosome maintains its overall structure during this simulation. The nucleosome’s radius of gyration (Rg) remained close to the value corresponding to the PDB structure (3.95 nm) throughout the entire simulation period (see Author response image 3).

      Author response image 3.

      Time trace of the radius of gyration (Rg) of a nucleosome with the 601-sequence along an unbiased, equilibrium trajectory. It is evident the Rg fluctuates around the value found in the PDB structure (3.95 nm), supporting the stability of the nucleosome in our simulation.

      Occasional fluctuations in Rg corresponded to momentary, partial unwrapping of the nucleosomal DNA, a phenomenon observed in single-molecule experiments. However, we advise caution due to the coarse-grained nature of our simulations, which prevents a direct mapping of simulation timescale to real time. Importantly, the rate of DNA unwrapping in our simulations is notably overestimated.
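
      The Rg time trace discussed above is built from a per-frame calculation that can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `radius_of_gyration`, the optional mass weighting, and the two-bead toy frame are assumptions introduced here.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D bead positions.

    coords: list of (x, y, z) tuples, e.g. in nm.
    masses: optional list of bead masses; equal masses are assumed if omitted.
    """
    n = len(coords)
    if masses is None:
        masses = [1.0] * n
    total_mass = sum(masses)
    # Center of mass, component by component.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)

# Hypothetical toy frame: two equal-mass beads placed 2 nm apart.
toy_frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(radius_of_gyration(toy_frame))
```

      Applying such a function to every saved frame of a trajectory gives a trace that can be compared with the value computed from the reference PDB coordinates.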

      It’s plausible that coarse-grained models, lacking side chains, might underestimate the barrier for DNA sliding along the nucleosome. Specifically, our model, without differentiation between interactions among various amino acids and nucleotides, accurately reproduces the average nucleosomal DNA binding affinity but may not capture the energetic variations among binding interfaces. Since sliding’s contribution to chromatin organization is minimal due to the use of strongly positioning 601 sequences, we imposed rigidity on the two nucleotides situated at the dyad axis to prevent nucleosomal DNA sliding. In future studies, enhancing the calibration of protein-DNA interactions to achieve improved sequence specificity would be an intriguing avenue. To underscore this limitation of the model, we have included the following text in the discussion section of the main text.

      "Several aspects of the coarse-grained model presented here can be further improved. For instance, the introduction of specific protein-DNA interactions could help address the differences in non-bonded interactions between amino acids and nucleotides beyond electrostatics [cite]. Such a modification would enhance the model’s accuracy in predicting interactions between chromatin and chromatin-proteins. Additionally, the single-bead-per-amino-acid representation used in this study encounters challenges when attempting to capture the influence of histone modifications, which are known to be prevalent in native nucleosomes. Multiscale simulation approaches may be necessary [cite]. One could first assess the impact of these modifications on the conformation of disordered histone tails using atomistic simulations. By incorporating these conformational changes into the coarse-grained model, systematic investigations of histone modifications on nucleosome interactions and chromatin organization can be conducted. Such a strategy may eventually enable the direct quantification of interactions among native nucleosomes and even the prediction of chromatin organization in vivo."

      Comment 6: The solvent salt conditions vary in the experimental reference data for internucleosomal interaction energies. The authors note, for example, that the in vitro data from Funke et al. differs the most from other measurements, but the solvent conditions are 35 mM NaCl and 11 mM MgCl2. Since this simulation method allows for this investigation, could the authors speak to or investigate if solvent conditions are responsible for the variability in experimental reference data? The authors conclude on pg. 8-9 and Figure 4 that orientational restraints in the DNA origami methodology are responsible for differences in interaction energy. Can the authors rule out ion concentration contributions?

      Response: We thank the reviewer for the insightful comment. We would like to clarify that the black curve presented in Figure 4B of the main text was computed using the salt concentration specified by Funke et al. (35 mM NaCl and 11 mM MgCl2). Furthermore, there were no restraints placed on nucleosome orientations during these calculations. Consequently, the results in Figure 4B can be directly compared with the black curve in Figure 5C. The data in Figure 5C were calculated under physiological salt conditions (150 mM NaCl and 2 mM MgCl2), which are the standard solvent salt conditions used in most studies. It is worth noting that the free energy of nucleosome binding is significantly higher at the salt concentration employed by Funke et al. (14 kBT) than the value at the physiological salt condition (9 kBT). Therefore, comparing the results in Figures 4B and 5C eliminates ion concentration conditions as a potential cause for the almost negligible result reported by Funke et al.
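
      For readers more used to molar energy units, the binding free energies quoted above in kBT can be converted as sketched below. The temperature of 300 K is an assumption made here for illustration (the response does not state one), and the helper name `kbt_to_kcal_per_mol` is hypothetical.

```python
# Gas constant in J/(mol K) and the thermochemical calorie conversion.
GAS_CONSTANT_J_PER_MOL_K = 8.314462618
JOULES_PER_KCAL = 4184.0

def kbt_to_kcal_per_mol(energy_kbt, temperature_k=300.0):
    """Convert an energy expressed in units of kBT to kcal/mol.

    temperature_k defaults to 300 K, an assumed value not taken from the text.
    """
    return energy_kbt * GAS_CONSTANT_J_PER_MOL_K * temperature_k / JOULES_PER_KCAL

# Values quoted in the response: Funke et al. conditions, physiological salt,
# and the original DNA origami measurement.
for label, value in [("35 mM NaCl / 11 mM MgCl2", 14.0),
                     ("150 mM NaCl / 2 mM MgCl2", 9.0),
                     ("DNA origami experiment", 2.0)]:
    print(f"{label}: {value:.0f} kBT = {kbt_to_kcal_per_mol(value):.2f} kcal/mol")
```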

      Comment 7: In the discussion on pg. 12 residual-level should be residue-level.

      Response: We apologize for the oversight and have corrected the grammatical error in our manuscript.

      RESPONSE TO REFEREE 2:

      Comment 0: In this manuscript, the authors introduced an explicit ion model using the coarse-grained modelling approach to model the interactions between nucleosomes and evaluate their effects on chromatin organization. The strength of this method lies in the explicit representation of counterions, especially divalent ions, which are notoriously difficult to model. To achieve their aims and validate the accuracy of the model, the authors conducted coarse-grained molecular dynamics simulations and compared predicted values to the experimental values of the binding energies of protein-DNA complexes and the free energy profile of nucleosomal DNA unwinding and inter-nucleosome binding. Additionally, the authors employed umbrella sampling simulations to further validate their model, reproducing experimentally measured sedimentation coefficients of chromatin under varying salt concentrations of monovalent and divalent ions.

      Response: We thank the reviewer for the excellent summary of the work.

      Comment 1: The significance of this study lies in the authors’ coarse-grained model which can efficiently capture the conformational sampling of molecules while maintaining a low computational cost. The model reproduces the scale and, in some cases, the shape of the experimental free energy profile for specific molecule interactions, particularly inter-nucleosome interactions. Additionally, the authors’ method resolves certain experimental discrepancies related to determining the strength of inter-nucleosomal interactions. Furthermore, the results from this study support the crucial role of intrinsic physicochemical interactions in governing chromatin organization within the nucleus.

      Response: We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank him/her for the detailed suggestions and comments.

      Comment 2: The method is simple but can be useful, given the authors can provide more details on their ion parameterization. The paper says that parameters in their ”potentials were tuned to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations.” However, no details on their all-atom simulations were provided; at some point, the authors refer to Reference 67 which uses all-atom simulations but does not employ the divalent ions. Also, no explanation is given for their modelling of protein-DNA complexes.

      Response: We appreciate the reviewer’s suggestion on clarifying the parameterization of the explicit-ion model. The parameterization was carried out neither in reference 67 nor by us, but by the de Pablo group in citation 53. Specifically, ion potentials were parameterized to fit the potential of mean force between both monovalent and divalent ion pairs, calculated either from all-atom simulations or from the literature. The authors carried out extensive validations of the model parameters by comparing the radial distribution functions of ions computed using the coarse-grained model with those from all-atom simulations. Good agreement between coarse-grained and all-atom results ensures the parameters’ accuracy in reproducing the local structures of ion interactions.

      To avoid confusion, we have revised the text from:

      "Parameters in these potentials were tuned to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations."

      to

      "Parameters in these potentials were tuned by Freeman et al. [cite] to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations."

      We modified the Supporting Information at several places to clarify the setup and interpretation of protein-DNA complex simulations.

      For example, we clarified the force fields used in these simulations with the following text

      "All simulations were carried out using the software LAMMPS [cite] with the force fields defined in the previous two sections."

      We added details on the preparation of these simulations as follows

      "We carried out a series of umbrella-sampling simulations to compute the binding free energies of a set of nine protein-DNA complexes with experimentally documented binding dissociation constants [cite]. Initial configurations of these simulations were prepared using the crystal structures with the corresponding PDB IDs listed in Fig. S1."

      We further revised the caption of Figure S1 (included as Author response image 4) to facilitate the interpretation of simulation results.

      Author response image 4.

      The explicit-ion model predicts the binding affinities of protein-DNA complexes well, related to Fig. 1 of the main text. Experimental and simulated binding free energies are compared for nine protein-DNA complexes [cite], with a Pearson Correlation coefficient of 0.6. The PDB ID for each complex is indicated in red, and the diagonal line is drawn in blue. The significant correlation between simulated and experimental values supports the accuracy of the model. To further enhance the agreement between the two, it will be necessary to implement specific non-bonded interactions that can resolve differences among amino acids and nucleotides beyond simple electrostatics. Such modifications will be interesting avenues for future research. See text Section: Binding free energy of protein-DNA complexes for simulation details.
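
      The comparison summarized in this caption rests on two simple calculations, sketched below: converting a dissociation constant into a binding free energy, and computing a Pearson correlation coefficient between experimental and simulated values. The conversion ΔG = kBT ln(Kd / 1 M) is the standard thermodynamic relation and is assumed here rather than taken from the manuscript; the function names and all numerical values in the example are hypothetical and are not the data of Figure S1.

```python
import math

def binding_free_energy_kbt(kd_molar, standard_conc_molar=1.0):
    """Binding free energy in units of kBT from a dissociation constant.

    Uses dG = kBT * ln(Kd / c0) with a 1 M standard state (an assumption).
    More negative values mean tighter binding.
    """
    return math.log(kd_molar / standard_conc_molar)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical dissociation constants (M) and simulated free energies (kBT),
# made up purely to show the workflow.
example_kd = [1e-9, 5e-8, 2e-7, 1e-6]
example_simulated = [-19.0, -18.5, -14.0, -15.5]

example_experimental = [binding_free_energy_kbt(kd) for kd in example_kd]
print(pearson_r(example_experimental, example_simulated))
```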

      Comment 3: Overall, the paper is well-written, concise and easy to follow but some statements are rather blunt. For example, the linker histone contribution (Figure 5D) is not clear and could be potentially removed. The result on inter-nucleosomal interactions and comparison to experimental values from Ref#44 is the most compelling. It would be nice to see if the detailed shape of the profile for restrained inter-nucleosomal interactions in Figure 4B corresponds to the experimental profile. Including the dependence of free energy on a vertex angle would also be beneficial.

      Response: We thank the reviewer for the comments and agree that the discussion on linker histone results was brief. However, we believe the results are important and demonstrate our model’s advantage over mesoscopic approaches in capturing the impact of chromatin regulators on chromatin organization.

      Therefore, instead of removing the result, we expanded the text to better highlight its significance, to aid comprehension, and to emphasize its biological implications. The image in Figure 5D was also redesigned to better visualize the cross contacts between nucleosomes mediated by histone H1. The added text is quoted below, and the new Figure 5 is included.

      Author response image 5.

      Revised main text Figure 5, with Figure 5D modified for improved visual clarity.

      "Importantly, we found that the weakened interactions upon extending linker DNA can be more than compensated for by the presence of histone H1 proteins. This is demonstrated in Fig. 5C and Fig. S8, where the free energy cost for tearing apart two nucleosomes with 167 bp DNA in the presence of linker histones (blue) is significantly higher than the curve for bare nucleosomes (red). Notably, at larger inter-nucleosome distances, the values even exceed those for 147 bp nucleosomes (black). A closer examination of the simulation configurations suggests that the disordered C-terminal tail of linker histones can extend and bind the DNA from the second nucleosome, thereby stabilizing the internucleosomal contacts (as shown in Fig. 5D). Our results are consistent with prior studies that underscore the importance of linker histones in chromatin compaction [cite], particularly in eukaryotic cells with longer linker DNA [cite]."

      We further compared the simulated free energy profile, depicting the center of mass distance between nucleosomes, with the experimental profile, as depicted in Author response image 6. The agreement between the simulated and experimental results is evident. The nuanced features observed between 60 and 80 Å in the simulated profile stem from DNA unwinding to accommodate the incoming nucleosome, creating a small energy barrier. It’s worth noting that such unwinding is unlikely to occur in the experimental setup due to the hybridization method used to anchor nucleosomes onto the DNA origami. Moreover, our simulation did not encompass configurations below 60 Å, resulting in a lack of data in that region within the simulated profile.

      We projected the free energy profile onto the vertex angle of the DNA origami device, utilizing the angle between two nucleosome faces as a proxy. Once more, the simulated profile demonstrates reasonable agreement with the experimental data (Author response image 6). Author response image 6 has been incorporated as Figure S4 in the Supporting Information.

      Author response image 6.

      Explicit ion modeling reproduces the experimental free energy profiles of nucleosome binding. (A) Comparison between the simulated (black) and experimental (red) free energy profile as a function of the inter-nucleosome distance. Error bars were computed as the standard deviation of three independent estimates. The barrier observed between 60 Å and 80 Å arises from the unwinding of nucleosomal DNA when the two nucleosomes are in close proximity, as highlighted in the orange circle. (B) Comparison between the simulated (black) and experimental (red) free energy profile as a function of the vertex angle. Error bars were computed as the standard deviation of three independent estimates. (C) Illustration of the vertex angle Φ used in panel (B).

      Comment 4: Another limitation of this study is that the authors’ model sacrifices certain atomic details and thermodynamic properties of the modelled systems. The potential parameters of the counter ions were derived solely by reproducing the radial distribution functions (RDFs) and potential of mean force (PMF) based on all-atom simulations (see Methods), without considering other biophysical and thermodynamic properties from experiments. Lastly, the authors did not provide any examples or tutorials for other researchers to utilize their model, thus limiting its application.

      Response: We agree that residue-level coarse-grained modeling indeed sacrifices certain atomistic details. This sacrifice can be potentially limiting when studying the impact of chemical modifications, especially on histone and DNA methylations. We added a new paragraph in the Discussion Section to point out such limitations and the relevant text is quoted below.

      "Several aspects of the coarse-grained model presented here can be further improved. For instance, the introduction of specific protein-DNA interactions could help address the differences in non-bonded interactions between amino acids and nucleotides beyond electrostatics [cite]. Such a modification would enhance the model’s accuracy in predicting interactions between chromatin and chromatin-proteins. Additionally, the single-bead-per-amino-acid representation used in this study encounters challenges when attempting to capture the influence of histone modifications, which are known to be prevalent in native nucleosomes. Multiscale simulation approaches may be necessary [cite]. One could first assess the impact of these modifications on the conformation of disordered histone tails using atomistic simulations. By incorporating these conformational changes into the coarse-grained model, systematic investigations of histone modifications on nucleosome interactions and chromatin organization can be conducted. Such a strategy may eventually enable the direct quantification of interactions among native nucleosomes and even the prediction of chromatin organization in vivo."

      Nevertheless, it’s important to note that while the model sacrifices accuracy, it compensates with superior efficiency. Atomistic simulations face significant challenges in conducting extensive free energy calculations required for a quantitative evaluation of ion impacts on chromatin structures.

      The explicit ion model, introduced by the de Pablo group, follows a standard approach adopted by other research groups, such as the parameterization of ion models using the potential of mean force from atomistic simulations (11; 12). According to multiscale coarse-graining theory, reproducing the potential of mean force (PMF) enables the coarse-grained model to achieve thermodynamic consistency with the atomistic model, ensuring identical statistical properties derived from them. However, it’s crucial to recognize that an inherent limitation of such approaches is their dependence on the accuracy of atomistic force fields in reproducing thermodynamic properties from experiments, as any inaccuracies in the atomistic force fields will similarly affect the resulting coarse-grained (CG) model.
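
      The link between the radial distribution function and the potential of mean force mentioned above can be illustrated with the textbook relation w(r) = -kBT ln g(r). The sketch below applies that relation to a made-up g(r); it is not the fitting procedure of Freeman et al., and the function name and sample values are assumptions introduced here.

```python
import math

def pmf_from_rdf(g_values):
    """Potential of mean force in units of kBT from RDF samples.

    Applies w(r) = -ln g(r) point by point. Where g(r) is zero (a region
    that was never sampled), the PMF is reported as +infinity.
    """
    return [math.inf if g <= 0.0 else -math.log(g) for g in g_values]

# Hypothetical ion-pair RDF on a coarse distance grid (values made up):
# an excluded core, a contact peak, a depleted shell, then the bulk limit.
distances_angstrom = [2.0, 3.0, 4.0, 5.0, 10.0]
g_of_r = [0.0, 4.0, 0.5, 1.2, 1.0]

for r, w in zip(distances_angstrom, pmf_from_rdf(g_of_r)):
    print(f"r = {r:4.1f} A, w(r) = {w:.2f} kBT")
```

      A peak in g(r) maps to a free energy minimum and a depleted shell to a barrier, which is why a coarse-grained model that matches the atomistic PMF between ion pairs also tends to reproduce the corresponding radial distribution functions.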

      We have provided the implementation of the CG model and detailed instructions on setting up and performing simulations in a GitHub repository. Examples include simulation setup for a protein-DNA complex and for a nucleosome with the 601-sequence.

      References

      [1] Freeman GS, Hinckley DM, de Pablo JJ (2011) A coarse-grain three-site-per-nucleotide model for DNA with explicit ions. The Journal of Chemical Physics 135:165104.

      [2] Materese CK, Savelyev A, Papoian GA (2009) Counterion Atmosphere and Hydration Patterns near a Nucleosome Core Particle. J. Am. Chem. Soc. 131:15005–15013.

      [3] Lequieu J, Cordoba A, Schwartz DC, de Pablo JJ (2016) Tension-Dependent Free Energies of Nucleosome Unwrapping. ACS Cent. Sci. 2:660–666.

      [4] Lequieu J, Schwartz DC, De Pablo JJ (2017) In silico evidence for sequence-dependent nucleosome sliding. Proc. Natl. Acad. Sci. U.S.A. 114.

      [5] Moller J, Lequieu J, de Pablo JJ (2019) The Free Energy Landscape of Internucleosome Interactions and Its Relation to Chromatin Fiber Structure. ACS Cent. Sci. 5:341–348.

      [6] Chang L, Takada S (2016) Histone acetylation dependent energy landscapes in trinucleosome revealed by residue-resolved molecular simulations. Sci Rep 6:34441.

      [7] Watanabe S, Mishima Y, Shimizu M, Suetake I, Takada S (2018) Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays. Biophysical Journal 114:2336–2351.

      [8] Brandani GB, Niina T, Tan C, Takada S (2018) DNA sliding in nucleosomes via twist defect propagation revealed by molecular simulations. Nucleic Acids Research 46:2788–2801.

      [9] Ding X, Lin X, Zhang B (2021) Stability and folding pathways of tetra-nucleosome from six-dimensional free energy surface. Nat Commun 12:1091.

      [10] Liu S, Lin X, Zhang B (2022) Chromatin fiber breaks into clutches under tension and crowding. Nucleic Acids Research 50:9738–9747.

      [11] Savelyev A, Papoian GA (2010) Chemically accurate coarse graining of double-stranded DNA. Proc. Natl. Acad. Sci. U.S.A. 107:20340–20345.

      [12] Noid WG (2013) Perspective: Coarse-grained models for biomolecular systems. The Journal of Chemical Physics 139:090901.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Response to Reviewer Comments:

      We thank the editors and reviewers for their careful consideration of our revised manuscript. Reviewers 2 and 3 indicated that their previous comments had been satisfactorily addressed by our revisions. Reviewer 1 raised several points, and our point-by-point responses can be found below.

      Reviewer #1 (Recommendations For The Authors):

      1) Please clarify the terminology of spontaneous recovery in your study.

      According to Rescorla RA 2004 ( http://www.learnmem.org/cgi/doi/10.1101/lm.77504.), he defines spontaneous recovery as "with the passage of time following nonreinforcement, there is some "spontaneous recovery" of the initially learned behavior. ". So in this study, I thought Test2 is spontaneous recovery while the Test1 is extinction test as most studies do. But authors seem to define spontaneous recovery from the last trial of Extinction3 to the first trial of Test1, which is confusing to me.

      We agree with the reviewer (and Rescorla, 2004) that spontaneous recovery is defined as the return of the initially learned behaviour after the passage of time. In our study, Test 1 is conducted 24 hours after the final extinction session (Extinction 3) and, in our view, the return of responding following that 24-hour delay can be considered spontaneous recovery. Rescorla (2004 and elsewhere) also points out that the magnitude of spontaneous recovery may be greater with larger delays between extinction and testing. This in part motivated our second test, conducted 7 days after the last extinction session, with optogenetic manipulation. We did not find evidence of greater spontaneous recovery in the test 7 days later; however, the additional extinction trials in Test 1 may have reduced the opportunity to detect such an effect.

      2) Why are E6-8 plots of Offset group in Figure 3E and F different?

      We apologise for this error and have corrected it. This was an artifact of an older version of the figure before final exclusions. The E6-8 data is now the same for panels 2E and 2F.

      3) Related to 2, please clarify what type of data are shown in Figures 3E, F and 5H, I. If they are averages, please add error bars. Also, it is hard to see the statistical significance in the current figure style.

      The data in these panels are the mean lever presses per trial as labeled on the y-axis of the figures. In our view, in this instance, error bars (or lines and other markers of significance) detract from the visual clarity of the figure. The statistical approach and outcomes are included in the figure legend and when presented alongside the figure in the final version of the paper should directly clarify these points.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed my previous comments to my satisfaction.

      Reviewer #3 (Recommendations For The Authors):

      The authors have adequately addressed each of the points raised in my original review. The paper will make a nice contribution to the field.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting if the authors would do calcium imaging or electrophysiology from LC-NA neurons during appetitive extinction.

      Indeed these are interesting ideas. We have plans to pursue them but ongoing work is not yet ready for publication.

      • LC-NA neuronal responses during the omission period seem to be important for appetitive extinction as described in the manuscript (Park et al., 2013; Sara et al., 1994; Su & Cohen 2022). It would be nice to activate/inactivate LC-NA neurons during the omission period.

      Optogenetic manipulation was given for the duration of the stimulus (20 seconds, the period in which reward should be expected contingent upon performance of the instrumental response). We believe the reviewer is suggesting a briefer manipulation delivered only at the precise time the pellet would have been expected but was omitted. If so, this is complex to implement: animals were trained on random ratio schedules, so the time at which the pellet(s) was earned varied, and the moment at which the animal experiences “omission” is difficult to determine with better temporal specificity than that used in the current experiments. However, we agree with the reviewer that, now that we have observed an effect of LC manipulation, future studies could alter the behavioral task so that the timing of reward is consistent (e.g., training the animals with fixed ratio schedules or continuous reinforcement, or using a Pavlovian paradigm). A reasonable assertion could then be made about when the outcome should occur, and thus when its absence would be detected, and the manipulation could be given at that time to address this point.

      • Does LC-NA optoinhibition affect the expression of the conditioned response (the lever presses at early trials of Extinction 1)? It's hard to see this from the average of all trials.

      The eNpHR group responded numerically less overall during extinction. This effect appears greatest in the first extinction session, but fails to reach statistical significance [F(1,15)= 3.512, p=0.081]. Likewise, analysis of the trial by trial data for the first extinction session failed to reveal any group differences [F(1,15)= 3.512, p=0.081] or interaction [trial x group; F(1,15)=0.550, p=0.470].

      Comparison of responding in the first trial also failed to reveal group differences [F(1,15)=1.209, p=0.289]. Thus, while there is a trend in the data, this is not borne out by the statistical analysis, even in early trials of the session.

      • While the authors manipulate global LC-NA neurons, many people find the heterogeneous populations in the LC. It would be great if the authors could identify the subpopulation responsible for appetitive extinction.

      We agree that it would be exciting to test whether and identify which subpopulation(s) of cells or pathway(s) are responsible for appetitive extinction. While related work has found that discrete populations of LC neurons mediate different behaviours and states, and may even have opposing effects, our initial goal was to determine whether the LC was involved in appetitive extinction learning. These are certainly ideas we hope to pursue in future work.

      Minor:

      • Why do the authors choose 10Hz stimulation?

      The stimulation parameters were based on previously published work. We have added these citations to the manuscript.

      Quinlan MAL, Strong VM, Skinner DM, Martin GM, Harley CW, Walling SG. Locus Coeruleus Optogenetic Light Activation Induces Long-Term Potentiation of Perforant Path Population Spike Amplitude in Rat Dentate Gyrus. Front Syst Neurosci. 2019 Jan 9;12:67. doi: 10.3389/fnsys.2018.00067. PMID: 30687027; PMCID: PMC6333706.

      Glennon E, Carcea I, Martins ARO, Multani J, Shehu I, Svirsky MA, Froemke RC. Locus coeruleus activation accelerates perceptual learning. Brain Res. 2019 Apr 15;1709:39-49. doi: 10.1016/j.brainres.2018.05.048. Epub 2018 May 31. PMID: 29859972; PMCID: PMC6274624.

      Vazey EM, Moorman DE, Aston-Jones G. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):E9439-E9448. doi: 10.1073/pnas.1803716115. Epub 2018 Sep 19. PMID: 30232259; PMCID: PMC6176602.

      • The authors should describe the behavior task before explaining Fig1e-g results.

      We agree that introducing the task earlier would improve clarity and have added a brief summary of the task at the beginning of the results section (before reference to Figure 1) and point the reader to the schematics that summarize training for each experiment (Figures 2A and 4D).

      NOTE R2 includes specific comments in their Public review. We have considered those as their recommendations and address them here.

      1) In such discrimination training, Pavlovian (CS-Food) and instrumental (LeverPress-Food) contingencies are intermixed. It would therefore be very interesting if the authors provided evidence of other behavioural responses (e.g. magazine visits) during extinction training and tests.

      In a discriminated operant procedure, the DS (e.g. clicker) indicates when the instrumental response will be reinforced (e.g., lever-pressing is reinforced only when the stimulus is present, and not when the stimulus is absent). This is distinct from something like a Pavlovian-instrumental transfer procedure, and so we wish to clarify that there is no Pavlovian phase where the stimuli are directly paired with food. After a successful lever-press the rat must enter the magazine to collect the food, but food is only delivered contingent upon lever-pressing, and so magazine entries here are not a clear indicator of Pavlovian learning as they may be in other paradigms.

      Nonetheless, we have compiled magazine entry data which although not fully independent of the lever-press response in this paradigm, still tells us something about the animals’ expectation regarding reward delivery.

      For the ChR2 experiment, largely paralleling the results seen in the lever-press data, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100].

      Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 1.

      Author response image 2.

      2) In Figure 1, the authors show the behavioural data of the different groups of control animals which were later collapsed in a single control group. It would be very nice if the authors could provide the data for each step of the discrimination training.

      We are a little confused by this comment. Figure 1, panels E, F, and G show the different control groups at the end of training, for each day of extinction (when manipulations occurred) and for each test, respectively. It’s not clear if there is an additional step the reviewer is interested in? We note neural manipulation only occurred during extinction sessions.

      We chose to compare the control groups initially, and finding no differences, to collapse them for subsequent analyses as this simplifies the statistical analysis substantially; when group differences are found, each of the subgroups has to be investigated (including the different controls means there are 5 groups instead of 3). It doesn’t change the story because we tested that there were not differences between controls before collapsing them, but collapsing the controls makes the presentation of the statistical data much shorter and easier to follow.

      3) Inspection of Figures 2C & 2D shows that responding in control animals is about the same at test 2 as at the end of extinction training. Therefore, could the authors provide evidence for spontaneous recovery in control animals? This is of importance given that the main conclusion of the authors is that LC stimulation during extinction training led to an increased expression of extinction memory as expressed by reduced spontaneous recovery.

      To address this we have added analyses of trial data, specifically comparison of the final 3 trials of extinction to the subsequent three trials of each test. These analyses are included on page 5 of the manuscript and additional data figures can be found as panels 2E and 2F and pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.
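      The trial-level comparison described above can be sketched as follows. This is a minimal illustration only, assuming per-trial lever-press counts are available as lists; the function name `recovery_index` and all numbers are hypothetical and are not the data or the statistical analysis reported in the manuscript.

```python
from statistics import mean


def recovery_index(extinction_trials, test_trials, n=3):
    """Mean of the first n test trials minus mean of the last n extinction trials.

    A positive value indicates more responding at test than at the end of
    extinction, i.e. something like spontaneous recovery.
    """
    return mean(test_trials[:n]) - mean(extinction_trials[-n:])


# Hypothetical lever presses per trial (eight extinction trials, three test trials).
control_extinction = [9, 8, 6, 5, 4, 3, 3, 3]
control_test = [6, 5, 4]
chr2_extinction = [9, 8, 6, 5, 4, 3, 3, 3]
chr2_test = [3, 3, 3]

print("control:", recovery_index(control_extinction, control_test))
print("ChR2:", recovery_index(chr2_extinction, chr2_test))
```

      Restricting both sides of the comparison to the same number of trials avoids averaging over an eight-trial extinction session and a three-trial test.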

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 3.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 4.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 5.

      4) Current evidence suggests that there are differences in LC/NA system functioning between males and females. Could the authors provide details about the allocation of male and female animals in each group?

      More females than males had surgical complications (excess bleeding), resulting in the following allocations: control group, 14 males and 8 females; ChR2 group, 8 males and 7 females; Offset group, 6 males.

      In our dataset, we did not detect sex differences in training [no main effect of sex: F(1,38)=1.097, p=0.302; no sex x group interaction: F(1,38)=1.825, p=0.185], extinction [no effect of sex: F(1,38)=0.370, p=0.547; no sex x extinction interaction: F(2,76)=0.701, p=0.499; no sex x extinction x group interaction: F(2,76)=2.223, p=0.115] or testing [Test 1 no effect of sex: F(1,38)=1.734, p=0.196; no sex x group interaction: F(1,38)=0.009, p=0.924; Test 2 no effect of sex: F(1,38)=0.661, p=0.421; no sex x group interaction: F(1,38)=0.566, p=0.456].

      5) The histology section in both experiments looks a bit unsatisfying. Could the authors provide more details about the number of counted cells and also their distribution along the anteroposterior extent of the LC. Could the authors also take into account the sex in such an analysis?

      The antero-posterior coordinates used for cell counts and calculation of % infection rates were between -9.68 and -10.04 (Paxinos and Watson, 2007, 6th Edition) as infection rates were most consistent in this region and it was well-positioned relative to the optic probe although TH and mCherry positive cells were observed both rostral and caudal to this area. For each animal, an average of ~116+/- 25 TH-positive LC neurons as determined by DAPI and GFP positive cells were identified. Viral expression was identified by colocalized mCherry staining. Animals that did not have viral expression in the LC were not included in the experimental groups. We have added these details to the histology results on page 4.

      Males and females showed very similar infection rates (Males, 74%; Females, 72%). While sex differences, such as total number of LC cells or total LC volume have been reported (Guillamon, A. et al. 2005), Garcia-Falgueras et al. (2005) reported no differences in LC volume or number of LC neurons between male and female Long-Evans rats. So while differences may exist in the LC of Long-Evans rats, the cell counts here were comparable between groups (males, 103 +/- 27; females, 129 +/- 17; t-test, p>0.05).

      References:

      1) Garcia-Falgueras, A., Pinos, H., Collado, P., Pasaro, E., Fernandez, R., Segovia, S., & Guillamon, A. (2005). The expression of brain sexual dimorphism in artificial selection of rat strains. Brain Research, 1052(2), 130–138. https://doi.org/10.1016/j.brainres.2005.05.066

      2) Guillamon, A., de Blas, M. R., & Segovia, S. (1988). Effects of sex steroids on the development of the locus coeruleus in the rat. Developmental Brain Research, 40, 306–310.

      Reviewer #3 (Recommendations For The Authors):

      MAJOR

      1) It is worth noting that responding in Group ChR2 decreased from Extinction 3 to Test 1, while responding in the other two groups appears to have remained the same. This suggests that there was no spontaneous recovery of responding in the controls; and, as such, something more must be said about the basis of the between-group differences in responding at test. This is particularly important as each extinction session involved eight presentations of the to-be-tested stimulus, whereas the test itself consisted of just three stimulus presentations. Hence, comparing the mean levels of performance to the stimulus across its extinction and testing overestimates the true magnitude of spontaneous recovery, which is simply not clear in the results of this study. That is, it is not clear that there is any spontaneous recovery at all and, therefore, that the basis of the difference between Group ChR2 and controls at test is in terms of spontaneous recovery.

      The reviewer is correct that there were a different number of trials in extinction vs. test sessions, making direct comparison difficult, and that displaying the data as averages of the test session does not demonstrate spontaneous recovery per se. To address this we have added analyses of trial data and comparison of the final three trials of extinction to the subsequent three trials of each test. These analyses are included on pages 5 and 6 of the manuscript, and additional data figures can be found as panels 2E and 2F and 4H and 4I, and are pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 6.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 7.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 8.

      2a) Did the manipulations have any effect on the rates of lever-pressing outside of the stimulus?

      We did not detect any effect of the optogenetic manipulations on rates of lever pressing outside of the stimulus. This is demonstrated in the pre-CS intervals collected on stimulation days (i.e., extinction sessions), where we see similar response rates between controls and the ChR2 and Offset groups, as shown below. There was no effect of group [F(2,40)=0.156, p=0.856] or group x extinction day interaction [F(2,40)=0.146, p=0.865].

      Author response image 9.

      2b) Did the manipulations have any effect on rates of magazine entry either during or after the stimulus?

      For the ChR2 experiment, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100]. Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 10.

      Author response image 11.

      2c) Did the manipulations affect the coupling of lever-press and magazine entry responses? I imagine that, after training, the lever-press and magazine entry responses are coupled: rats only visit the magazine after having made a lever-press response (or some number of lever-press responses). Stimulating the LC clearly had no acute effect on the performance of the lever-press response. If it also had no effect on the total number of magazine entries performed during the stimulus, it would be interesting to know whether the coupling of lever-presses and magazine entries had been disturbed in any way. One could assess this by looking at the joint distribution of lever-presses (or runs of lever-presses) and magazine visits in each extinction session, or across the three sessions of extinction. As a proxy for this, one could look at the average latency to enter the magazine following a lever-press response (or run of lever-presses). Any differences here between the Controls and Group ChR2 would be informative with respect to the effects of the LC manipulations: that is, the results shown in Figure indicate that stimulating the LC has no acute effects on lever-pressing but protects against something like spontaneous recovery; whereas the results shown in Figure 4 indicate that inhibiting the LC facilitates the loss of responding across extinction without protecting against spontaneous recovery. The additional data/analyses suggested here would indicate whether LC stimulation had any acute effects on responding that might explain the protection from spontaneous recovery; and whether LC inhibition specifically reduced lever-pressing across extinction or whether it had equivalent effects on rates of magazine entry.

      Lever-press and magazine response data were collected trial by trial but not with the temporal resolution required for the analyses suggested by the reviewer. We do not have timestamps for magazine entries nor latency data. We can collect this type of data in future studies. At the session or trial level, magazine entries generally correspond to lever-pressing; being trained on ratio schedules, and from informal observation, rats will do several lever-presses and then check the magazine. Rates of each decrease across extinction (magazine data included in response to comment 2b. above). Optogenetic manipulation appeared to have no immediate effect on either response during extinction.

      PROCEDURAL

      1) Why were there three discriminative stimuli in acquisition: a light, white noise, and clicker?

      This was done to be consistent with and apply parameters similar to previous, related studies (Rescorla, 2006; Janak & Corbit, 2011) and to allow comparison to potential future studies that may involve stimulus compounds etc. (requiring training of multiple stimuli).

      2) Why were some rats extinguished to the noise while others were extinguished to the clicker? Were the effects of LC stimulation/inhibition dependent on the identity of the extinguished stimulus?

      Because the animals were trained with multiple stimuli, it allowed us some ability to choose amongst those stimuli to best balance response rates across groups before the key manipulations. The effects of LC manipulation did not differ between animals based on the identity of the extinguished stimulus.

      3) Did the acute effects of LC inhibition on extinction vary as a function of the stimulus identity?

      No

      4) Was the ITI in extinction the same as that in acquisition?

      Yes, the ITI was the same for acquisition and extinction sessions (variable, averaging to 90 seconds). We have added a sentence to the methods (p. 11) to reflect this.

      5) For Group Offset, when was the photo-stimulation applied in relation to the extinguished stimulus: was it immediately upon offset of the stimulus or at a later point in the ITI?

      The group label “Offset” was used to be consistent with Uematsu et al. (2017), who delivered stimulation 50-70 s after a trial. Similarly, we mean it as discontinuous with the stimulus, not at the termination of the stimulus. We have revised the description of this group on page 11 to clarify the timing of the photostimulation as follows:

      “Animals in the Offset group (and relevant controls) underwent identical training with the exception that stimulation in extinction sessions occurred in the middle of the variable length ITI (45s after stimulus termination, on average).”

      MINOR

      1) "Such recovery phenomena undermine the success of extinction-based therapies..."

      ***Perhaps a different phrasing is needed here: "These phenomena show that extinction-based therapies are not always effective in suppressing an already-established response..."

      We have revised this sentence in line with the reviewer’s suggestion:

      “These phenomena mean that extinction-based therapies are not always successful in suppressing previously-established behaviours” (first paragraph of the introduction).

      2) Typo in para 1 of results: "F(2,19)=0.0.352"

      Thank you for finding this typo. It has been corrected. (p.4)

      3) "As another example of modular functional organization, no improvements to strategy set-shifting following global LC stimulation, but improvements were observed when LC terminals in the medial prefrontal cortex were targeted (Cope et al., 2019)." ***This sentence is missing a "there were" before "no improvements".

      Thank you for finding this error. It has been corrected. (p.8)

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      The supplementary figures will be adapted to include particle distribution plots.

      2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      The map used to interpret the model was post-processed by density modification, therefore no sharpening factor was obtained. This information will be included in Table 1, together with B-factors and molprobity scores.

      3) A supplemental Figure to Figure 2B, illustrating how a0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of a0-helix during CtRoco activation.

      A supplemental figure will be prepared to illustrate this in the revised document.

      4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc a3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively."

      It is not surprising the a3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that a3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed. We will include this in the revised document.

      5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A provisional figure displaying these crosslinks is already provided below, and we will also consider including this in the revised manuscript. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).

      Author response image 1.
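      The distance criterion mentioned above can be illustrated with a short check of Cα-Cα distances against the two cut-offs quoted in the response (about 26 Å and 35 Å). This is only a sketch: the function name `classify_crosslink`, the labels, and the coordinates are hypothetical and are not taken from the CtRoco model or from the XL-MS data.

```python
import math

# Cut-offs quoted in the response, in Å between the α carbons of cross-linked lysines.
STRICT_LIMIT = 26.0   # DSSO spacer plus two lysine side chains
RELAXED_LIMIT = 35.0  # additionally allowing for protein dynamics


def classify_crosslink(ca_a, ca_b):
    """Return the Cα-Cα distance (Å) and a label for one cross-linked lysine pair."""
    distance = math.dist(ca_a, ca_b)
    if distance <= STRICT_LIMIT:
        label = "within theoretical limit"
    elif distance <= RELAXED_LIMIT:
        label = "plausible with dynamics"
    else:
        label = "exceeds cut-off"
    return distance, label


# Hypothetical Cα coordinates (Å), chosen only to exercise each branch.
example_pairs = {
    "pair_A": ((0.0, 0.0, 0.0), (12.0, 16.0, 0.0)),
    "pair_B": ((0.0, 0.0, 0.0), (18.0, 24.0, 0.0)),
    "pair_C": ((0.0, 0.0, 0.0), (24.0, 32.0, 0.0)),
}

for name, (a, b) in example_pairs.items():
    d, label = classify_crosslink(a, b)
    print(f"{name}: {d:.1f} Å, {label}")
```

      In practice the coordinates would be read from the fitted model, and a crosslink exceeding the relaxed cut-off would point to flexibility or an alternative conformation rather than to an error in itself.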

      6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      We will create an additional figure showing the position of L487, and discuss possible mechanisms for the observed effect of a mutation on GTPase activity.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Galicia et al describes the structure of the bacterial GTPyS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study is in the fact that the CtRoco protein is a homolog of the human LRRK2 protein with mutations that are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomer-GTPyS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPyS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPyS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPyS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are very critical for this type of work. This especially relates to the influence of the nanobodies on the structure that may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPyS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and therefore that the obtained structure can be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remark 1 and 2.

      2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein by which a dimer is impossible, respectively to a0-helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPyS structure will be identical to the "normal" Roco-GTPyS structure. What is the influence of nanobody-binding on the conformation of the domains where they bind? The binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the a0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between RocCOR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights into protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure 1 - supplement 2), although this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in the absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised document we will include some additional text to address and clarify these aspects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative size of the rostral prefrontal cortex (frontal pole) and dorsolateral prefrontal cortex volume vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS. That said, there are some points where I feel the authors could have taken their care a bit further and, as a result, inform the community even more about what is in their data.

      We thank the reviewer for this globally positive evaluation of our work, and we appreciate the advice on how to improve our manuscript.

      The introduction sets up the contrast of 'ecological' (mostly foraging) and social variables of a primate's life that can be reflected in the relative size of brain regions. This debate is for a large part a relic of the literature and the authors themselves state in a number of places that perhaps the contrast is a bit artificial. I feel that they could go further in this. Social behavior could easily be a solution to foraging problems, making them variables that are not in competition, but simply different levels of explanation. This point has been made in some of the recent work by Robin Dunbar and Susanne Shultz.

      Thank you for this constructive comment. We acknowledge that the contrast between the social and the ecological brain is relatively marginal here. Based also on the first remark by reviewer 3, we have reformulated the introduction to emphasize what we think is actually more critical: the link between cognitive functions as defined in laboratory conditions and socio-ecological variables measured in natural conditions, and the fact that here we use brain measures as a potential tool to relate these laboratory vs natural variables through a common scenario. Also, we were already mentioning the potential interaction between social and foraging processes in the discussion, but we are happy to add a reference to recent studies by S. Shultz and R. Dunbar (2022), which is indeed directly relevant. We thank the reviewer for pointing out this literature.

      In a similar vein, the hypotheses of relating frontal pole to 'meta-cognition' and dorsolateral PFC to 'working memory' is a dramatic oversimplification of the complexity of cognitive function and does a disservice to the careful approach of the rest of the manuscript.

      We agree that the formulation of which functions we were attributing to the distinct brain regions might not have been clear enough, but the functional relation between frontal pole and metacognition on the one hand, and DLPFC and working memory on the other hand, has been firmly established in the literature, both through laboratory studies and through clinical data. Clearly, no single brain region is necessary and sufficient for any cognitive operation, but decades of neuropsychology have demonstrated the differential implication of distinct brain regions in distinct functions, which is all we mean here. We have made a specific point on that topic in the discussion (cf p. 16). We have also reformulated the introduction to clarify that, even if the relation between these regions and their functions (FP/metacognition; DLPFC/working memory) was clear in laboratory conditions, it was not clear whether this mapping could be used for real life conditions. Whether that simplification is justified beyond the lab (and the clinic), and whether these neuro-cognitive concepts can be applied to natural conditions, are indeed critical questions that we wanted to address. The central goal of the present study was precisely to evaluate the extent to which this brain/cognition relation could be used to understand more natural behaviors and functions, and we hope that it appears more clearly now.

      One can also question the predicted relationship between frontal pole meta-cognition and social abilities versus foraging, as Passingham and Wise show in their 2012 book that it is frontal pole size that correlates with learning ability-an argument that they used to relate this part of the brain to foraging abilities. I would strongly suggest the authors refrain from using such descriptive terms. Why not simply use the names of the variables actually showing significant correlations with relative size of the areas?

      We basically agree with the reviewer, and we acknowledge the lack of clarity in the introduction of the previous manuscript. There was indeed a lot of ambiguity in what we were referring to as the ‘function’ associated with a given brain region. ‘Function’ referred to way too many things! We have reformulated the introduction not only to clarify the different types of functions that were attributed to distinct brain regions in the literature but also to clarify how this study was addressing the question: by trying to articulate concepts from neuroscience laboratory studies with concepts from behavioral ecology and evolution using intuitive scenarios. We hope that the present version of the introduction makes that point clearer.

      The major methodological judgements in this paper are of course in the delineation of the frontal pole and dorsolateral prefrontal cortex. As I said above, I appreciate how carefully the authors describe their anatomical procedure, allowing researchers to replicate and extend their work. They are also careful not to relate their regions of interest to precise cytoarchitectonic areas, as such a claim would be impossible to make without more evidence. That said, there is a judgement call made in using the principal sulcus as a boundary defining landmark for FP in monkeys and the superior frontal sulcus in apes. I do not believe that these sulci are homologous. Indeed, the authors themselves go on to argue that dorsolateral prefrontal cortex, where studied using cytoarchitecture, stretches to the fundus of principal sulcus in monkeys, but all the way to the inferior frontal sulcus in apes. That means that using the fundus of PS is not a good landmark.

      We thank the reviewer for the kind remarks on our careful descriptions. It is not clear to us, however, that our choice of using the principal sulcus as a boundary for FP in monkeys vs the superior frontal sulcus in apes is actually a judgement call. First and foremost, there is no clear and unambiguous definition of what the boundaries of the FP should be. Cytoarchitectonic maps could provide one, but clearly this is out of reach here. In humans and great apes we used Bludau et al 2014 (i.e. the superior frontal sulcus), and in monkeys, we chose a conservative landmark that eliminated area 9, which is traditionally associated with the DLPFC (Petrides, 2005; Petrides et al, 2012; Semendeferi et al, 2001).

      Of course, any definition will attract criticism, so the best solution might be to run the analysis multiple times, using different definitions for the areas, and see how this affects results.

      Indeed, functional maps indicate that the dorsal part of the anterior PFC in monkeys is functionally part of the FP. But again, cytoarchitectonic maps also indicate that this part of the brain includes BA 9, which is traditionally associated with DLPFC (Petrides, 2005; Petrides et al, 2012). As already pointed out in the discussion, there is a functional continuum between FP and DLPFC, and our goal when using PS as the dorsal border was to be very conservative and to exclude the ambiguous area. But we agree with the reviewer that, given that this decision is arbitrary, it was worth exploring other definitions of the FP volume. So, we completed a new analysis with a less conservative definition of the FP, to include this ambiguous dorsal area, and it is now included in the supplementary material. Perhaps as expected, including the ambiguous area in the FP volume shifted the relation with socio-ecological variables towards the pattern displayed by the DLPFC (i.e. the influence of population density decreased). The most parsimonious interpretation of this result is that when extending the border of the FP region to cover a part of the brain which might belong to the DLPFC, or which might be somehow functionally intermediate between the two, the specific relation of the FP with socio-ecological variables decreases. Thus, even if we agree that it was important to conduct this analysis, we believe that it only confirms the difficulty of identifying a clear boundary between FP and DLPFC. Again, we have clearly explained throughout the manuscript that we admit the lack of precision in our definitions of the functional brain regions. In that frame, the conservative option seems more appropriate and, for the sake of clarity, the results of the additional analysis of a FP volume that includes the ambiguous area are only included in the supplementary material.

      If I understand correctly, the PGLS was run separately for the three brain measures (whole brain, FP, DLPFC). However, given that the measures are so highly correlated, is there an argument for an analysis that allows testing on residuals? In other words, to test effects of relative size of FP and DLPFC over and above brain size?

      Generally, using residuals as “data” (or pseudo-data) is not recommended in statistical analyses. Two widely cited references from the ecological literature are:

      Garcia-Berthou E. (2001) On the Misuse of Residuals in Ecology: Testing Regression Residuals vs. the Analysis of Covariance. Journal of Animal Ecology, 70 (4): 708-711.

      Freckleton RP. (2002) On the misuse of residuals in ecology: regression of residuals vs. multiple regression. Journal of Animal Ecology, 71: 542–545. https://doi.org/10.1046/j.1365-2656.2002.00618.x

      The main reason for this recommendation is that residuals are dependent on the fitted model, and thus on the particular sample under consideration and the eventual significant effects that can be inferred.
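      The point made in these two references can be illustrated with a minimal sketch on simulated data. Everything below is an assumption made for illustration: the data are synthetic (not the data of this study), ordinary least squares stands in for PGLS, and the variable names are hypothetical. When two predictors are correlated, regressing the residuals of a first model on the second predictor underestimates its slope, whereas a single multiple regression recovers it.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2000
true_slope = 1.0

# Simulated (hypothetical) data: x2 is correlated with x1, much as a
# socio-ecological predictor can be correlated with a size covariate.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 2.0 * x1 + true_slope * x2 + rng.normal(scale=0.5, size=n)

# Approach A: regress y on x1, then regress the residuals on x2.
b_a = ols(x1, y)
resid = y - (b_a[0] + b_a[1] * x1)
slope_residuals = ols(x2, resid)[1]

# Approach B: multiple regression with both predictors in a single model.
slope_multiple = ols(np.column_stack([x1, x2]), y)[2]

print(f"true slope of x2:        {true_slope:.2f}")
print(f"regression of residuals: {slope_residuals:.2f}")
print(f"multiple regression:     {slope_multiple:.2f}")
```

      In this simulation the part of x2 that is shared with x1 is absorbed by the first regression, so the residual-based slope is biased towards zero. Including the covariate directly in the model avoids the problem.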

      In the discussion and introduction, the authors discuss how size of the area is a proxy for number of neurons. However, as shown by Herculano-Houzel, this assumption does not hold across species. Across monkeys and apes, for instance, there is a difference in how many neurons can be packed per volume of brain. There is even earlier work from Semendeferi showing how frontal pole especially shows distinct neuron-to-volume ratios.

      We appreciate the reviewer’s comment, but the references to Herculano-Houzel that we have in mind do indicate that the assumption is legitimate within primates.

      Herculano-Houzel et al (2007) show that the neuronal density of the cortex is well conserved across primate species (but only monkeys were studied). The conclusion of that study is that using volumes as a proxy for number of neurons, as a measure of computational capacity, should be avoided when comparing rodents and primates (and, as they showed later, even more so with birds, for which neuronal density is higher). But within primates, since neuronal densities are conserved, volume is a good predictor of number of neurons. Gabi et al (2016) provide evidence that the neuronal density of the PFC is well conserved between humans and non-human primates, which implies that including humans and great apes in the comparison is legitimate. In addition, the brain regions included in the analysis presumably include very similar architectonic regions (e.g. BA 10 for FP, BA 9/46 for DLPFC), which also suggests that the neuronal density should be relatively well conserved across species. Altogether, we believe that there is sufficient evidence to support the idea that the volume of a PFC region in primates is a good proxy for the number of neurons in that region, and therefore of its computational capacity.

      Semendeferi and colleagues (2001) pointed out some differences in cytoarchitectonic properties across parts of the FP and discussed how these properties could 1) be used to identify area 10 across species 2) be associated with distinct computational properties, with the idea that thicker ‘cell body free’ layers would leave more space for establishing connections (across dendrites and axons). This pioneering work, together with more recent imaging studies on functional connectivity (e.g. Sallet et al, 2013) emphasize the critical contribution of connectivity pattern as a tool for comparative anatomy. But unfortunately, as pointed out in the discussion already, this is currently out of reach for us.

      We acknowledge the limitations, and to be fair, the notion of computational capacity itself is hard to define operationally. Based on the work of Herculano-Houzel et al, average density is conserved enough across primates (including humans) to justify our approximation. We have tried to define our regions of interest using both anatomical and functional maps and, thanks to the reviewer’s suggestions, we even tried several ways to segment these regions. Functional maps in macaques and humans do not exactly match cytoarchitectonic maps, presumably because functions rely not only upon the cytoarchitectonics but also on connectivity patterns (e.g. Sallet et al, 2013).

      In sum, we appreciate the reviewer’s point but feel that, given the current understanding of brain functions and the relative conservation of neuronal density across primate PFC regions, the volume of a PFC region seems to be a reasonable proxy for its number of neurons, and therefore its computational capacity. We have added these points to the discussion, and we hope that the reader will be able to get a fair sense of how legitimate that position is, given the literature.

      Overall, I think this is a very valuable approach and the study demonstrates what can now be achieved in evolutionary neuroscience. I do believe that the authors can be even more thorough and precise in their measurements and claims.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors run their studies on brains collected from 1920 to 1970 and preserved in formalin solution. Also, they obtained data from the Muséum national d'Histoire naturelle in Paris and from the Allen Brain Institute in California. The main findings consist in showing that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience.

      We must not have been clear enough in our manuscript, because our goal is precisely not to separate humans from other primates. This is why, in contrast to other studies, we have included human and non-human primates in the same models. If our goal had been to study human evolution, we would have included fossil data (endocasts) from the human lineage.

      But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We admit that lissencephalic species could not be included in this study because we use sulci as key landmarks. We believe that including lissencephalic primates would have introduced a bias and noise in our comparisons, as the delimitations and landmarks would have been different for gyrencephalic and lissencephalic primates. Concerning development, it is simply beyond the scope of our study.

      Major comments.

      1) Is the brain modular? Is there modularity in brain evolution?: The entire manuscript is organized around the idea that the brain is a mosaic of units that have separate evolutionary trajectories:

      "In terms of evolution, the functional heterogeneity of distinct brain regions is captured by the notion of 'mosaic brain', where distinct brain regions could show a specific relation with various socio-ecological challenges, and therefore have relatively separate evolutionary trajectories".

      This hypothesis is problematic for several reasons. One of them is that each evolutionary module of the brain mosaic should originate in embryological development from a defined progenitor (or progenitors) domain [see García-Calero and Puelles (2020)]. Also, each evolutionary module should comprise connections with other modules; in the present case, FP and DLPFC have not evolved alone but in concert with, at least, their corresponding thalamic nuclei and striatal sector. Did those nuclei and sectors also expand across the selected primate species? Can the authors relate FP and DLPFC expansion to a shared progenitor domain across the analyzed species? This would be key to proposing homology hypotheses for FP and DLPFC across the selected species. The authors use the comparative approach throughout but never explicitly state their criteria for defining homology of the cerebral cortex sectors analyzed.

      We do not understand what the referee is referring to with the word ‘module’, and why it relates to development. Same thing for the anatomical relation with subcortical structures. Yes, the identity of distinct functional cortical regions relies upon subcortical inputs during development, but clearly this is neither technically feasible, nor relevant here anyways.

      We acknowledge, however, that our definition of functional regions was not precise enough, and we have updated the introduction to clarify that point. In short, we clearly do not want to make a strong case for the functional borders that we chose for the regions of interest here (FP and DLPFC), but rather use those regions as proxies for their corresponding functions as defined in laboratory conditions for a couple of species (rhesus macaques and humans, essentially).

      Contemporary developmental biology has shown that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC during development. Otherwise, the claims for the mosaic brain and modularity lack fundamental support.

      Once again, we do not think that our definition of modules matches what the reviewer has in mind, i.e. modules defined by populations of neurons that developed together (e.g. visual thalamic neurons innervating visual cortices, themselves innervating visual thalamic neurons). Rather, the notion of mosaic brain refers to the fact that different parts of the brain are susceptible to distinct (but not necessarily exclusive) sources of selective pressures. The extent to which these ‘developmental’ modules are related to ‘evolutionary’ modules is clearly beyond the scope of this paper.

      Our goal here was to evaluate the extent to which modules that were defined based on cognitive operations identified in laboratory conditions could be related (across species) to socio-ecological factors as measured in wild animals. Again, we agree that the way these modules/functional maps were defined in the paper was confusing, and we hope that the new version of the manuscript makes this point clearer.

      Also, the authors refer most of the time to brain regions, which is confusing because they are analyzing cerebral cortex regions.

      We do not understand why the term ‘brain’ is more confusing than ‘cerebral cortex’, especially for a wide audience.

      2) Definition and delimitation of FP and DLPFC: The preceding questions are also related to the definition and parcellation of FP and DLPFC. How are homologous cortical sectors defined across primate species? And then, how are those sectors parcellated?

      The authors delimited the FP:

      "...according to different criteria: it should match the functional anatomy for known species (macaques and humans, essentially) and be reliable enough to be applied to other species using macroscopic neuroanatomical landmarks".

      There is an implicit homology criterion here: two cortical regions in two primate species are homologs if these regions have similar functional anatomy based on cortico-cortical connections. Also, macroscopic neuroanatomical landmarks serve to limit the homologs across species.

      This is highly problematic. First, because similar function means analogy and not necessarily homology [for further explanation see Puelles et al. (2019); García-Cabezas et al. (2022)].

      We are not sure that we follow the Reviewer’s point here. First, it is not clear what evolutionary scenario is implied by this comment (evolutionary divergence followed by reversion leading to convergence?). Second, based on the literature, both the DLPFC and the FP display strong similarities between macaques and humans, in terms of connectivity patterns (Sallet et al, 2013), in terms of lesion-induced deficits and in terms of task-related activity (Mansouri et al, 2017). These criteria are usually sufficient to call two regions functionally equivalent. We do not see how this explanation is "highly problematic", as it is clearly the most parsimonious based on our current knowledge.

      Second, because there are several lissencephalic primate species; in these primates, like marmosets and squirrel monkeys, the whole approach of the authors could not have been implemented. Should we suppose that lissencephalic primates lack FP or DLPFC?

      We understand neither the reviewer’s logic, nor the tone. We understand that the reviewer is concerned by the debate on whether some laboratory species are more relevant than others for studying the human prefrontal cortex, but this is clearly not the objective of our work. As explained in the manuscript, we identified FP and DLPFC based on functional maps in humans and laboratory monkeys (macaques), and we used specific gyri as landmarks that could be reliably used in other species. And, as rightfully pointed out by reviewer 1, this is in and of itself not so trivial. Of course, lissencephalic animals could not be studied because we could not find these landmarks, but why would it mean that they do not have a prefrontal cortex? The reviewer implies that species that we did not study do not have a prefrontal cortex, which makes little sense. Standards in the field of comparative anatomy of the PFC, especially when it involves rodents (also lissencephalic), include cytoarchitectonic and connectivity criteria, but obviously we are not in a position to address them here. We have, however, included references to the seminal work of Angela Roberts and collaborators on marmoset prefrontal functions in the discussion, to reinforce the idea that the functional organization is relatively well conserved across all primates (with or without gyri on their brain) (Dias et al, 1996; Roberts et al, 2007).

      Do these primates have significantly more simplistic ways of life than gyrencephalic primates? Marmosets and squirrel monkeys have quite small brains; does it imply that they have not experienced the influence of socio-ecological factors on the size of FP, DLPFC, and the rest of the brain?

      Again, none of this is relevant here, because we could not draw conclusions on species that we cannot study for methodological reasons. The reviewer seems to believe that an absence of evidence is equivalent to evidence of absence, but we do not.

      The authors state that:

      "the strong development of executive functions in species with larger prefrontal cortices is related to an absolute increase in number of neurons, rather than in an increase in the ratio between the number of neurons in the PFC vs the rest of the brain".

      How does it apply to marmosets and squirrel monkeys?

      Again, we do not understand the reviewer’s point, since it is widely admitted that lissencephalic monkeys display both a prefrontal cortex and executive functions (again, see the work of Angela Roberts cited above). Our goal here was certainly not to get into the debate of what is the prefrontal cortex in a handful of laboratory species, but to evaluate the relevance of laboratory based neuro-cognitive concepts for understanding primates in general, and in their natural environment.

      References:

      García-Cabezas MA, Hacker JL, Zikopoulos B (2022) Homology of neocortical areas in rats and primates based on cortical type analysis: an update of the Hypothesis on the Dual Origin of the Neocortex. Brain Structure & Function. Online ahead of print. doi:10.1007/s00429-022-02548-0

      García-Calero E, Puelles L (2020) Histogenetic radial models as aids to understanding complex brain structures: The amygdalar radial model as a recent example. Front Neuroanat 14:590011. doi:10.3389/fnana.2020.590011

      Nieuwenhuys R, Puelles L (2016) Towards a New Neuromorphology. doi:10.1007/978-3-319-25693-1

      Puelles L, Alonso A, Garcia-Calero E, Martinez-de-la-Torre M (2019) Concentric ring topology of mammalian cortical sectors and relevance for patterning studies. J Comp Neurol 527 (10):1731-1752. doi:10.1002/cne.24650

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis). My comments are organized by section below:

      We thank the reviewer for the globally positive evaluation and for the constructive remarks.

      Introduction:

      • Well written and thorough, but the questions presented could use restructuring.

      Again, we thank the reviewer, and we believe that this is coherent with some of the remarks of reviewer 1. We have extensively revised the introduction, toning down the social vs ecological brain issue to focus more on what is the objective of the work (evaluating the relevance of lab based neuro-cognitive concepts for understanding natural behavior in primates).

      Methods:

      • It is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included.

      The details of the model comparison analysis were presented as a table in the supplementary material (#3, details of the model comparison data), but we understand that this was not clear enough. We have provided more explanation both in the main manuscript and in the supplements. All variables were considered a priori; however, we first carried out exploratory analyses, which led us to exclude some variables because of their lack of resolution (not enough categories for qualitative variables) or strong cross-correlations with other quantitative variables. Many more than three variables were included in the models, but the combination of these three (body mass, daily traveled distance and population density) best predicted the size of the brain regions (i.e. had the smallest AIC). We provide additional information about these exploratory analyses in the supplementary material, sections 2 and 3.
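      As a rough illustration of this kind of AIC-based selection, the sketch below compares all subsets of a few candidate predictors on simulated data. It rests on several assumptions: the data are synthetic, the sample size is arbitrary, `diet_quality` is a hypothetical extra predictor, and ordinary least squares is used in place of PGLS (the phylogenetic covariance is ignored). It is not the procedure used in the manuscript.

```python
import itertools
import numpy as np

def aic_ols(X, y):
    """AIC of a Gaussian OLS fit: n*ln(RSS/n) + 2k, up to an additive constant."""
    n = len(y)
    # Intercept-only design when no predictor is selected.
    A = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the residual variance
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(1)
n = 200
names = ["body_mass", "daily_distance", "population_density", "diet_quality"]
X = rng.normal(size=(n, len(names)))
# Hypothetical ground truth: only the first three predictors matter.
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] - 0.7 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Score every subset of predictors, including the intercept-only model.
scores = {}
for r in range(len(names) + 1):
    for subset in itertools.combinations(range(len(names)), r):
        scores[subset] = aic_ols(X[:, list(subset)], y)

best = min(scores, key=scores.get)
best_names = [names[i] for i in best]
print("lowest-AIC model:", best_names)
```

      The model with the smallest AIC is retained; differences of a few AIC units between neighbouring models are usually read as weak evidence, so reporting the full table of scores, as in the supplementary material, is more informative than the winner alone.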

      • Brain size (vs. body size) should be used as a predictor in the models.

      We do not understand the theoretical reason for replacing body size by brain size in the models. Brain size is not a socio-ecological variable. And of course, that would be impossible for modeling brain size itself. Or is the reviewer suggesting that we use brain size as a covariate to evaluate the effects of other variables in the model over and above the effect on brain size? But what is the theoretical basis for this?

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis.

      We thank the Reviewer for this comment; however, standardized coefficients are not unproblematic because their calculation is based on the estimated standard deviations of the variables, which are likely to be affected by sampling (in effect more than the means). We note that the method of standardized coefficients has attracted several criticisms in the literature (see the References section in https://en.wikipedia.org/wiki/Standardized_coefficient). Nevertheless, we now provide a table with these coefficients, which allows an easy comparison for the present study. We also updated tables 1, 2 and 3 to include standardized beta values.
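      For readers unfamiliar with standardized coefficients, the following minimal sketch shows how they relate to raw coefficients. It assumes simulated data with two hypothetical predictors on very different scales and uses ordinary least squares rather than PGLS. A standardized beta is the raw slope multiplied by the ratio of the sample standard deviations, sd(x)/sd(y), which is why it inherits the sampling variability of those standard deviations.

```python
import numpy as np

def ols_slopes(X, y):
    """Slopes of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(2)
n = 500
# Two hypothetical predictors measured on very different scales.
X = np.column_stack([rng.normal(50, 20, n), rng.normal(2, 0.1, n)])
y = 0.05 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Raw slopes depend on the units of each predictor.
raw = ols_slopes(X, y)

# Standardized slopes: rescale by sd(x) / sd(y).
standardized = raw * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Equivalent route: z-score every variable first, then fit.
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)
from_z = ols_slopes(Xz, yz)

print("raw slopes:         ", raw)
print("standardized slopes:", standardized)
```

      In this simulation the second predictor has the larger raw slope but the smaller standardized slope, which is the situation the reviewer warns about when unscaled coefficients are compared directly.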

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Reviewer #2 (Recommendations For The Authors):

      Contemporary developmental biology has shown that the brain of all mammals, including primates, develops out of a bauplan (or blueprint) made of several fundamental morphological units that have invariant topological relations across species (Nieuwenhuys and Puelles 2016).

      At some point in the discussion the authors acknowledge that:

      "Our aim here was clearly not to provide a clear identification of anatomical boundaries across brain regions in individual species, as others have done using much finer neuroanatomical methods. Such a fine neuroanatomical characterization appears impossible to carry on for a sample size of species compatible with PGLS".

      I do not think it would be impossible to carry such neuroanatomical characterization. It would take time and effort, but it is feasible. Such characterization, if performed within the framework of contemporary developmental biology, would allow for well-founded definition and delineation of cortical sectors across primate species, including lissencephalic ones, and would allow for meaningful homologies and interspecies comparisons.

      We do not see how our work would benefit from developmental biology at this point, because it is concerned with evolution, and these are very distinct biological phenomena. We do not understand the reviewer’s focus on lissencephalic species, because they are not very prevalent among primates, and it is unlikely that adding a couple of lissencephalic species would change the conclusions much.

      Minor points:

      • Please, format references according to the instructions of the journal.

      Ok - done

      • The authors could use the same color code across Figures 1, 2, and 3.

      Ok – done

      • The authors say that group hunting "only occurs in a few primate species", but it also occurs in wolves, whales, and other mammalian species.

      We focus on primates here; these other species are irrelevant. Again, this is beside the point.

      Reviewer #3 (Recommendations For The Authors):

      My comments are organized by section below:

      Introduction:

      • Well written and thorough

      • The two questions presented towards the end of the intro are not clear and do not guide the structure of the methods/results sections. I believe it would be more appropriate to ask: 1) if the relative proportions of the FP and DLPFC (relative to ROB) are consistent across primates; and 2) if the relative size of these regions is best predicted by social and/or ecological variables. Then, the results sections could be organized according to these questions (current results section 1 = 1; current results sections 2, 3, 4 = 2.1, 2.2, 2.3)

      As explained above, we agree with the reviewer that the introduction was somewhat misleading and we have edited it extensively. We do not, however, agree with the reviewer regarding the relative (vs absolute) measure. We have discussed this in our response to reviewer 1 regarding the comparison of regional volumes as proxies for number of neurons. The best predictor of the computing capacity of a brain region is its number of neurons, but there is no reason to believe that this capacity should decrease if the rest of the brain increases, as implied by the relative measure that the reviewer proposes. That debate is probably critical in the field of comparative neuroanatomy, and confronting different perspectives would surely be both interesting and insightful, but we feel that it is beyond the scope of the present article.

      Methods:

      • While the methods are straightforward and generally well described, it is unclear which combinations of models were compared or why only population density and distance travelled appear to have been included (in e.g., Fig SI 3.1) even though many more variables were collected.

      We agree that this was not clear enough, and we have tried to improve the description of our model comparison approach, both in the main text and in the supplementary material.

      • Why was body mass rather than ROB used as a predictor in the models? The authors should instead/also include analyses using ROB (so the analysis is of FP and DLPFC size relative to brain size). Using body mass confounds the analyses since they will be impacted by differences in brain size relative to body size.


      Again, we have addressed this issue above. First, body size is a socio-ecological variable (if anything, it especially predicts energetic needs and energy expenditure), but ROB is clearly not. We do not see the theoretical relevance of ROB in a socio-ecological model. Second, from a neurobiological point of view, since within primates the volume of a given brain region is directly related to its number of neurons (again, see work of Herculano-Houzel), which is a good proxy for its computing capacity, we do not see the theoretical reason for considering ROB.

      • It is not appropriate to compare the impact of different predictors using their coefficients if the variables were not scaled prior to analysis. The authors need to implement this in their approach to make such claims.

      We thank the reviewer again for pointing that out. We have addressed this question above.

      • Differences across primates in terms of frontal lobe networks throughout the brain should be acknowledged (e.g., Barrett et al. 2020, J Neurosci).

      We have added that reference to the discussion, together with other references showing that the difference between human and non-human primates is significant, but essentially quantitative, rather than qualitative (the building blocks are relatively well conserved, but their relative weight differs a lot). Thank you for pointing it out.

      I hope the authors find my comments helpful in revising their manuscript.

      We thank the reviewer again for the helpful and constructive comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study identifies the homeodomain transcription factor and suspected autism-candidate gene Meis2 as a transcriptional regulator of maturation and end-organ innervation of low-threshold mechanoreceptors (LTMRs) in the dorsal root ganglia (DRG) of mice. For a few years, the view on autism spectrum disorders (ASD) has shifted from a disorder that exclusively affects the brain to a condition that also includes the peripheral somatosensory system, even though our knowledge about the genes involved is incomplete. The study by Desiderio and colleagues is therefore not only scientifically interesting but may also have clinical relevance. The work is convincing, with appropriate and validated methodology in line with current state-of-the-art and the findings contribute both to understanding and potential application.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work examined transcription factor Meis2 in the development of mouse and chick DRG neurons, using a combination of techniques, such as the generation of a new conditional mutant strain of Meis2, behavioral assays, in situ hybridization, transcriptomic study, immunohistochemistry, and electrophysiological (ex vivo skin-nerve preparation) recordings. The authors found that Meis2 was selectively expressed in A fiber LTMRs and that its disruption affects the A-LTMRs' end-organ innervation, transcriptome, electrophysiological properties, and light touch-sensation.

      Strengths:

      1) The authors utilized a well-designed mouse genetics strategy to generate a mouse model where Meis2 is selectively ablated from pre- and post-mitotic mouse DRG neurons. They used a combination of readouts, such as in situ hybridization, immunohistochemistry, transcriptomic analysis, skin-nerve preparation, electrophysiological recordings, and behavioral assays to determine the role of Meis2 in mouse DRG afferents.

      2) They observed a similar preferential expression of Meis2 in large-diameter DRG neurons during development in chicken, suggesting evolutionarily conserved functions of this transcription factor.

      3) The authors conducted several behavioral assays to probe the reduction of light-touch sensitivity in mouse glabrous and hairy skin. Their behavioral findings support the idea that the function of Meis2 is essential for the development and/or maturation of LTMRs.

      4) RNAseq data provide potential molecular pathways through which Meis2 regulates embryonic target-field innervation.

      5) Well-performed electrophysiological study using skin-nerve preparation and recordings from saphenous and tibial nerves to investigate physiological deficits of Meis2 mutant sensory afferents.

      6) Nice whole-mount IHC of the hairy skin, convincingly showing morphological deficits of Meis2 mutant SA- and RA-LTMRs.

      Overall, this manuscript is well-written. The experimental design and data quality are good, and the conclusion from the experimental results is logical.

      Weaknesses:

      1) Although the authors justify this study for the involvement of Meis2 in Autism and Autism associated disorders, no experiments really investigated Autism-like specific behavior in the Meis2 ablated mice.

      Indeed, in the first version of the manuscript, we used the current understanding of ASD in mouse models and associated sensory defects to frame our introduction and discussion. As noted by Reviewer 1, none of our experiments really investigated ASD. To avoid over-interpretation of the data, we have now removed sentences mentioning ASD and related references throughout the manuscript.

      2) For mechanical force sensing-related behavioral assays, the authors performed VFH and dynamic cotton swabs for the glabrous skin, and sticky tape on the back for the hairy skin. A few additional experiments involving glabrous skin plantar surfaces, such as sticky tape or flow texture discrimination, would make the conclusion stronger.

      We fully agree that performing more behavioral analyses investigating the primary sensory defects in more detail, as well as some ASD-related behaviors, would reinforce our conclusions. Our behavioral analysis clearly showed a loss of sensitivity in response to mechanical stimuli within the light touch range but not for higher range mechanical or noxious thermal stimuli. While the experiments suggested by the reviewer are interesting and would strengthen our conclusions, they are far from trivial and require large cohorts. Given the current laboratory conditions as stated at the outset, these unfortunately are not within reach.

      3) The authors considered von Frey filaments (1 and 1.4 g) as noxious mechanical stimuli (Figure 1E and statement on lines 181-183), which is questionable. Alligator clips or pinpricks are more certain to activate mechanical nociceptors.

      To avoid misinterpretation of the higher von Frey filament tests, we deleted the following two statements on page 7: “In the von Frey test, the thresholds for paw withdrawal were similar between all genotypes when using filaments exerting forces ranging from 1 to 1.4g, which likely reflects the activation of mechanical nociception suggesting that Meis2 gene inactivation did not affect nociceptor function.”. The sentence “… while sparing other somatosensory behaviors” was also deleted.

      4) There are disconnections and inconsistencies among findings from morphological characterization, physiological recordings, and behavior assays. For example, Meis2 mutant SA-LTMRs show a deficiency in Merkel cell innervation in the glabrous skin but not in hairy skin. With no clear justification, the authors pooled recordings of SA-LTMRs from both glabrous and hairy skin and found a significant increase in mean vibration threshold. Will the results be significantly different if the data are analyzed separately? In addition, whole-mount IHC of Meissner's corpuscles showed morphological changes, but electrophysiological recordings didn't find significant alteration of RAI-LTMRs. What does the morphological change mean then? Since the authors found that Meis2 mice are less sensitive to a dynamic cotton swab, which is usually considered as an RA-LTMR mediated behavior, is the SAI-LTMR deficit here responsible for this behavior? Connections among results from different methods are not clear, and the inconsistency should be discussed.

      We thank Reviewer 1 for the careful review of our data and fully agree with the weaknesses identified, which we were ourselves aware of at the time of submission, in particular the lack of stronger connections between histological and electrophysiological data. Electrophysiological studies were conducted on a first cohort of mice in which we mostly focused on WT and Meis2 mutant mice. The goal was to describe differences in the electrophysiological properties of identified mechanoreceptors from these two genotypes. Because substantial differences between WT and Islet1-Cre mice were not expected, only very few mice with this genotype were examined at that time to confirm this assumption. We fully agree with Reviewer 1 that confirming differences in SA-LTMR responses between hairy and glabrous skin at the electrophysiological level would be interesting and worthwhile. It is assumed that the physiological properties of SA-LTMRs are equivalent in glabrous and hairy skin. Indeed, direct comparisons have been made between glabrous and hairy skin SA-LTMRs, revealing that they have equivalent receptor properties (see Walcher et al., J Physiol, quoted in the manuscript). We had not recorded from a sufficient number of hairy and glabrous skin SA-LTMRs to make any statistically meaningful comparison. When we noticed the dramatic differences in the innervation patterns of Merkel cell complexes between glabrous and hairy skin, we immediately planned a second cohort of mice, but as explained at the outset of our response to the Public Review, this cohort was sacrificed due to the pandemic lockdown. However, the obtained dataset clearly shows that in Meis2 mutant mice many SA-LTMRs had vibration thresholds similar to those of wild types.

      For Meissner corpuscles, histological analysis revealed clear morphological differences that could of course be investigated at the level of the dual innervation previously reported by Neubarth et al. It is uncertain whether differences in their electrophysiological responses would be revealed by increasing the number of recorded fibers. For this reason, we clearly stated this limitation in the Results section (page 7): “There was a tendency for RA-LTMRs in Isl1Cre/+::Meis2LoxP/LoxP mutant mice to fire fewer action potentials to sinusoids and to the ramp phase of a series 2 second duration ramp and hold stimuli, but these differences were not statistically significant (Figure 5B).” Nevertheless, it is important to point out that an electrical search strategy revealed that many Aβ-fibers did not have mechanosensitive receptive fields. Thus, by focusing on LTMRs with a mechanosensitive receptive field, we ignore the fact that fewer fibers are mechanosensitive. This is now more extensively discussed in the Discussion section of the manuscript (page 13):

      “Indeed, the electrophysiology methods used here can only identify sensory afferents that have a mechanosensitive receptive field. Primary afferents that have an axon in the skin but no mechanosensitvity can only be identified with a so-called electrical search protocol (45, 46) which was not used here. It is therefore quite likely that many primary afferents that failed to form endings would not be recorded in these experiments e.g. SA-LTMRs and RA-LTMRs that fail to innervate end-organs (Fig.4-6).”

      “From our data, we could not conclude whether SA-LTMR electrophysiological responses are differentially affected in the glabrous versus hairy skin of Meis2 mutant as suggested by histological analysis. Further electrophysiological analysis focused on SA-LTMR selectively innervating the glabrous or hairy skin would be necessary to answer this question. Similarly, the decreased sensitivity of Meis2 mutant mice in the cotton swab assay and the morphological defects of Meissner corpuscles evidenced in histological analysis do not correlate with RA-LTMR electrophysiological responses for which a tendency to decreased responses were however measured. The later might result from an insufficient number of fibers recording, whereas the first may be due of pooling SA-LTMR from both the hairy and glabrous skin.”.

      Reviewer #2 (Public Review):

      Summary:

      Desiderio and colleagues investigated the role of the TALE (three amino acid loop extension) homeodomain transcription factor Meis2 during maturation and target innervation of mechanoreceptors and their sensation to touch. They start with a series of careful in situ hybridizations to examine Meis2 transcript expression in mouse and chick DRGs of different embryonic stages. By this approach, they identify Meis2+ neurons as slowly- and rapidly adapting A-beta LTMRs, respectively. Retrograde tracing experiments in newborn mice confirmed that Meis2-expressing sensory neurons project to the skin, while unilateral limb bud ablations in chick embryos in ovo showed that these neurons require target-derived signals for survival. The authors further generated a conditional knock-out (cKO) mouse model in which Meis2 is selectively lost in Islet1-expressing, postmitotic neurons in the DRG (Islet1Cre/+::Meis2flox/flox, abbreviated below as cKO). WT and Islet1Cre/+ littermates served as controls. cKO mice did not exhibit any obvious alteration in volume or cellular composition of the DRGs but showed significantly reduced sensitivity to touch stimuli and various innervation defects to different end-organ targets. RNA-sequencing experiments of E18.5 DRGs taken from WT, Islet1Cre/+, and cKO mice reveal extensive gene expression differences between cKO cells and the two controls, including synaptic proteins and components of the GABAergic signaling system. Gene expression also differed considerably between WT and heterozygous Islet1Cre/+ mice while several of the other parameters tested did not. These findings suggest that Islet1 heterozygosity affects gene expression in sensory neurons but not sensory neuron functionality. However, only some of the parameters tested were assessed for all three genotypes. Histological analysis and electrophysiological recordings shed light on the physiological defects resulting from the loss of Meis2. By immunohistochemical approaches, the authors describe distinct innervation defects in glabrous and hairy skin (reduced innervation of Merkel cells by SA1-LTMRs in glabrous but not hairy skin, reduced complexity of A-beta RA1-LTMRs innervating Meissner's corpuscles in glabrous skin, reduced branching and innervation of A-beta RA1-LTMRs in hairy skin). Electrophysiological recordings from ex vivo skin nerve preparations found that several, but not all, of these histological defects are matched by altered responses to external stimuli, indicating that compensation may play a considerable role in this system.

      Strengths:

      This is a well-conducted study that combines different experimental approaches to convincingly show that the transcription factor Meis2 plays an important role in the perception of light touch. The authors describe a new mouse model for compromised touch sensation and identify a number of genes whose expression depends on Meis2 in mouse DRGs. Given that dysbalanced MEIS2 expression in humans has been linked to autism and that autism seems to involve an inappropriate response to light touch, the present study makes a novel and important link between this gene and ASD.

      Weaknesses:

      The authors make use of different experimental approaches to investigate the role of Meis2 in touch sensation, but the results obtained by these techniques could be connected better. For instance, the authors identify several genes involved in synapse formation, synaptic transmission, neuronal projections, or axon and dendrite maturation that are up- or downregulated upon targeted Meis2 deletion, but it is unresolved whether these changes can in any way explain the histological, electrophysiological, or behavioral deficits observed in cKO animals. The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO. In addition, Meis2 mutant mice apparently are less responsive to touch, whereas in humans, mutation or genomic deletion involving the MEIS2 gene locus is associated with ASD, a condition that, if anything, is associated with an elevated sensitivity to touch. It would be interesting to know how the authors reconcile these two findings. As a minor weakness, the first manuscript suffers from some ambiguities and errors, but these can be easily corrected.

      We thank the reviewer for the insightful comments and suggestions.

      The use of two different controls (WT and Islet1Cre/+) is unsatisfactory and it is not clear why some parameters were studied in all three genotypes (WT, Islet1Cre/+ and cKO) and others only in WT and cKO.

      First, we identified a labelling mistake in Figures 4D, 5A and 6A, where the controls shown are from Islet1+/Cre mice and not from WT as reported in the first version. We apologize for this mistake, which has now been corrected. This labelling error does not in any way affect our conclusion; on the contrary, it shows that innervation defects are not the consequence of Islet1 heterozygosity.

      The reviewer wonders why both control genotypes are presented for some data and only one for others. It is quite possible that gene expression changes occur due to a synergistic effect of both heterozygous Meis2 deletion and heterozygous Islet1 deletion. However, we found no evidence that this led to defects in target-field innervation or to changes in the physiological properties of sensory neurons.

      Although it is conceivable that the expression of some genes is modified by a synergistic effect of heterozygous Meis2 deletion and heterozygous Islet1 deletion, several lines of evidence support the view that the defects in target-field innervation and electrophysiological responses are exclusively due to Meis2 deletion. Previous work on Islet1-specific deletion in DRG sensory neurons opens the possibility that some of the phenotypes we report here are in part due to an effect of Islet1 heterozygous deletion or to a synergistic effect with Meis2 homozygous deletion.

      1) When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain or at later stages using a tamoxifen-inducible Cre, homozygous pups die a few hours after birth. Early Islet1 deletion results in increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects associated mostly with nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008, now included in the revised version of the manuscript). There was a decrease in the number of Ntrk1+ and Ntrk2+ neurons, whereas the number of Ntrk3+ neurons appeared normal. When Islet1 is inactivated later in development, the numbers of Ntrk1+ and Ntrk2+ neurons were normal and only the expression of nociceptor-specific markers was decreased. Since neither the DRG volume nor the numbers of Ntrk1+, Ntrk2+ and Ntrk3+ neurons are changed in Meis2 cKO using the Islet1-Cre strain, an early significant effect of Islet1 heterozygous deletion is very unlikely.

      2) For distal innervation defects, it is clear from the Wnt1-Cre::Meis2 data (Figure 3E) that the distal innervation phenotype occurs when Meis2 is inactivated, independently of Islet1 expression.

      3) Finally, the lack of differences between WT and Islet1+/Cre mice in behavioral assays and in electrophysiological characterization of RA-LTMRs of the hairy skin (Figure 6C) and SA-LTMRs (Figure 4B and C) argues for a lack of significant consequences of Islet1 heterozygous deletion on these parameters.

      4) For bulk RNAseq studies, all datasets have now been re-analyzed following Reviewer 2's specific comments (see below). To avoid misinterpretation of the data, the results are now presented differently (see pages 8 and 9) and more critically discussed (see pages 14 and 15). In particular, we included and discussed references on Islet1 cKO mice.

      We also agree with Reviewer 2 that our RNAseq study only provides clues about genes whose expression could impact distal innervation and electrophysiological responses. However, proving which of those genes are fully responsible for the morphological and electrophysiological defects would require extensive mouse genetic investigations, such as restoring their normal expression level in a Meis2 mutant context, which is beyond the scope of the present study.

      Finally, the reviewer questioned how we could reconcile the lower touch sensitivity in Meis2 mutant mice with the exacerbated touch sensitivity found in ASD patients and mouse models of ASD. As suggested by Reviewer 1, our study did not really investigate ASD specifically. Therefore, to avoid over-interpretation of the data and to follow Reviewer 1's recommendation, we have removed all references to ASD in the revised version of the manuscript. Indeed, to our knowledge, none of the case reports on Meis2 mutant patients investigated sensory function in general or light touch in particular, maybe because of the severe intellectual disability characterizing these patients.

      Reviewer #1 (Recommendations For The Authors):

      In addition to the aforesaid suggestions in the section 2, there are some minor issues:

      We thank the reviewer for the careful reading and for identifying all these typos. All of them have been corrected in the revised version of the manuscript.

      1) There should not be a full stop mark in the title of the article.

      This has been corrected in the new version of the manuscript.

      2) Figure 1C, 1D, please correct the typo "controlateral" to "contralateral".

      This has been corrected in the new version of the manuscript.

      3) Figure 1D, lower graph, Y-axis, please correct the typo "umber" to "number".

      This has been corrected in the new version of the manuscript.

      4) To make it easy for readers, add the names of the behavioral tests on top of the graphs in Fig 1E-H.

      The name of behavioral tests is now added to the figure.

      5) It would be easier to read the markers' names in IHC and ISH images if they were written outside of image panels. The blue staining color in image 1B could be easily mixed with the background. Suggest change colors.

      Markers for IHC and ISH images are now written outside the image panels, or colors have been changed, in Figures 1 and 2 for better clarity.

      6) The font size of Genes' name in Figure 3B is too small and not readable.

      Figure 3 has now been changed following Reviewer 2 recommendation. The small font size in Figure 3B is no longer present in the figure.

      7) Quantification of Fig 3E (number of fibers innervating each dermal papilla or footpad, for example).

      Unfortunately, we did not keep the Wnt1Cre::Meis2LoxP/LoxP strain, which prevents further analysis (see the outset of our answer to the public review).

      8) In Figure 4, please arrange IHC images and their quantification results adjacent to each other.

      The figure has been reorganized, and changes in the Results section and figure legends were made accordingly.

      9) For consistency, please use either LTMR or LTM (See Figure 4F, 5A, 6C), but not both.

      This has been homogenized throughout the manuscript.

      10) Add arrows/heads to mark the overlaps in Figure 4D.

      Arrows are now added in Figure 4D to point at the overlap between Nefh and CK8 staining.

      11) Figure 5A, 6A, Lines 236, 240, 247, 258, 305, 308, 313, 347, and many more in Figure legends: please check in entire manuscript and make the mouse genotype nomenclature (+/Cre?) consistent. In some places, Cre is written in all upper case (Line 657).

      This has been homogenized throughout the manuscript.

      12) Figure 4G: Histogram color could be darker for better contrast.

      The color of the histograms has been changed in Figures 5 and 6 for better clarity.

      13) Please add the figure number to the Figure 6.

      The figure number is now indicated on the figure.

      14) Figure 6B: Y-axis typo, correct "Nfeh" to "Nefh".

      This typo is now corrected.

      15) Either explain Figure 2B information before that of Figure 2C (In lines 204-207) in the text or change the figure panel sequence to keep the consistent flow of contents.

      The figure has been modified and the panel sequence now follows that of the main text.

      16) Line 213 has a typo: change "form" to "from".

      This typo is now corrected.

      17) Line 423 has a typo. Correct "al" to "all".

      This typo is now corrected.

      18) Line 625 has a typo. Correct "fo" to "of".

      This typo is now corrected.

      19) Line 669 has a typo. Correct "Alexa Fluo" to "Fluor".

      This typo is now corrected.

      20) Line 744: To be consistent in the entire manuscript, write "Nfh" as "Nefh".

      This typo is now corrected.

      21) 740-749: Please add host names for all primary antibodies, as some are given but some are not for the current version.

      We now indicated the host species for all primary antibodies used in the study.

      22) Line 751 has a typo: change "a" to "as".

      This typo is now corrected.

      23) Line 754: what is for 20'?

      This typo is now corrected.

      24) Line 832: change "day test" to "testing day".

      The change has been made.

      25) Please mention for how many seconds the VFH was administered on the plantar surface in the method.

      A new sentence has been added to the “Von Frey withdrawal test” Methods section (page 30): “During each application, bend filament was maintained for approximately four to five seconds”.

      26) For the sticky tape test, in lieu of hind paw attending bouts, wet-dog shake behavior, the authors also found some scratching behaviors. Did they separately quantify these behaviors? It would be interesting to see exactly which behavior significantly reduced after Meis2 inactivation.

      Unfortunately, when we designed the sticky tape test, we did not consider separating the behaviors counted as “positive” reactions. As these experiments were not video recorded, we are not able to extract this kind of information without generating a new cohort of mice and repeating the experiment.

      27) Line 344-345: consider rephrasing the sentence.

      This sentence has been removed.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautiful and well-conducted study with all the strengths listed in the paragraphs above. Nevertheless, there are still some open questions, ambiguities in the presentation, and minor errors that I would recommend addressing.

      Major Points:

      1) The authors performed RNA-seq analysis from E18.5 mouse total DRGs from three different genotypes, WT, Islet1Cre/+ and cKO. Although this approach identified several interesting Meis2-dependent candidate genes, the presentation of the results is confusing, and the publication would gain impact if the RNA-seq results were better connected to the histological, behavioral, and electrophysiological data. Specific concerns:

      1.1) The gene expression profiles of WT and Islet1Cre/+ samples are remarkably divergent. According to Yang Development 2006, Islet1-Cre was generated by knocking in Cre into the endogenous Islet1 locus and replacing the Isl1 ATG, hence resulting in a heterozygous null for Islet1. When purely technical derivations can be excluded, the RNAseq results presented here suggest that heterozygous loss of Islet1 causes considerable gene expression changes in the postnatal DRG. For analysis of the RNAseq results, the authors focus on genes that are differentially expressed between one experimental condition (Islet1Cre/+::Meis2flox/flox) and either one of two controls (WT or Islet1Cre/+). Hence, they pool the genes that are differently expressed between cKO and Islet1Cre/+ with the genes that are different between cKO and WT. This approach mixes gene expression differences that result from two different genetic alterations, heterozygosity of Islet1 and targeted deletion of Meis2, respectively. It seems much more logical to compare the results pairwise.

      We agree with Reviewer 2 that heterozygous deletion of Islet1 causes significant changes in gene expression that seem to correlate very little with any of the phenotypes we investigated in the study. When Islet1 is conditionally deleted in mice using the Wnt1-Cre strain, pups die a few hours after birth and display increased apoptosis in the DRG, a massive loss of DRG sensory neurons and sensory defects associated mostly with nociceptors and some touch neurons, while proprioceptive neurons are spared (Sun et al., 2008, now included in the revised version of the manuscript). There is a decrease in the numbers of Ntrk1+ and Ntrk2+ neurons, whereas the number of Ntrk3+ neurons appears normal. Later Isl1 inactivation does not induce changes in the number of neurons and does not change Ntrk1 and Ntrk2 expression. As explained in the answer to the public reviews, bulk RNAseq data have now been reanalyzed following the reviewer's suggestions and are presented accordingly in the related figures.

      In the study bay Sun et al. they also reported DEGs following Islet1 homozygous deletion, but data on Islet1 heterozygous deletion are not included. However, out of the 60 most dysregulated genes identified in their study, only 6 were differentially expressed in our datasets. Importantly, DEGs in their studies where identified using microarray. In another study, the same group, showed that Brn3a (another transcription factor important for DRG neurons differentiation) and Islet1 exhibit negative epistasis on sensory genes expression (Dykes et al., 2011 now included in the revised version of the manuscript). Thus we cannot rule out that similar rules apply for Islet1 and Meis2. However, given the high diversity of DRG sensory neurons, interpreting our bulk RNAseq analysis in such direction might lead to misinterpretation.

      1.2) Along the same line, gene expression changes in Islet1Cre/+ DRGs seem to have little functional consequences, at least in the cases where all three genotypes were analyzed (target dependency (Fig. 1E), behavior (Fig. 1F), innervation (Fig. 4F, 6C)). Why were some parameters measured in all three genotypes and others only for WT and cKO? The authors probably reason that parameters that do not differ between WT and cKO animals will likely also not differ between WT and Islet1Cre/+. But what about parameters that do differ? Considering that the innervation of Merkel cells (Fig. 4E) and Meissner corpuscles (Fig. 5A) differ profoundly between WT and cKO, it would be interesting to know what this innervation looks like in Islet1Cre/+ DRGs. NEFH staining together with CK8 or S100beta from existing tissue sections should easily answer this question.

      As explained in the answer to the public reviews, there was a mistake in the annotation of the control in Figure 4D and E and in Fig. 5, which has now been corrected. Concerning target dependency, those experiments were conducted in chick embryos and therefore have no associated genotype.

      1.3) Was a minimum cut-off for gene expression applied? The up-and downregulated genes in Fig. 3B list a number of pseudogenes and predicted genes. A quick (and incomplete) check for their expression in Fig2 Supple Table 1 shows that only a few reads were detected for most of them. With such low expression, even small changes will show up as significant differences.

      In our first analysis, a cut-off of 10 reads was applied. As reviewer 2 mentioned, this cut-off included several pseudogenes and predicted genes with low expression for which small changes were significant. We have now re-analyzed the dataset using a cut-off of 100 reads. This excluded most of the previous predicted genes and pseudogenes from the analysis and resulted in a much smaller number of DEGs for each dataset. As recommended by reviewer 2, we have also now performed the DAVID analysis separately. These results are now presented in Figure 3 and the corresponding supplementary figures.
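
      The effect of raising the cut-off can be illustrated with a minimal sketch. The gene names, read counts, and the filtering rule (keep a gene if any condition reaches the cut-off) are all hypothetical assumptions made for illustration; they are not the study's data, nor necessarily the rule used in the actual pipeline.

```python
# Hypothetical raw read counts per gene and condition (illustrative values only,
# not taken from the study's data).
counts = {
    "gene_high": {"control": 1500, "cKO": 600},
    "gene_mid": {"control": 90, "cKO": 240},
    "pseudogene_low": {"control": 4, "cKO": 14},
}


def passes_cutoff(gene_counts, min_reads):
    # Assumed rule: keep the gene if at least one condition reaches min_reads.
    return max(gene_counts.values()) >= min_reads


def filter_genes(all_counts, min_reads):
    # Return only the genes that pass the minimum read cut-off.
    return {gene: c for gene, c in all_counts.items() if passes_cutoff(c, min_reads)}


def fold_change(gene_counts, reference="control", test="cKO"):
    # Simple ratio of test to reference counts (no normalization, for illustration).
    return gene_counts[test] / gene_counts[reference]


kept_10 = filter_genes(counts, 10)
kept_100 = filter_genes(counts, 100)

print(sorted(kept_10))    # genes retained with a cut-off of 10 reads
print(sorted(kept_100))   # genes retained with a cut-off of 100 reads
print(fold_change(counts["pseudogene_low"]))  # large ratio from very few reads
```

      In this toy example the low-count entry shows a large fold change built on a handful of reads, and it is retained at a cut-off of 10 but dropped at 100, which is the behavior the reviewer's concern and the re-analysis describe.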

      1.4) Given that bulk RNAseq from whole embryonic DRGs was performed, it would be interesting to know what cell type(s) express the Meis2-dependent transcripts. To address this question, the authors resort to published scRNAseq data by Usoskin Nat Neurosci 2015. They correlate the expression of all 488 DEGs (different between cKO and either WT or Islet1Cre/+) with the expression of Meis2 in the sensory neuron subtypes that were classified in the Usoskin paper. From that they conclude that many Meis2-dependent genes were expressed in the same sensory neuron classes as Meis2 itself. This is not apparent from Fig. 3 Supplementary 2. Neither do the 488 DEGs seem to be in any way enriched in the MEIS2-expressing cell clusters NF2/3/4/5, nor is cluster PEP1 particularly high in Meis2 expression. Immunostaining for MEIS2 together with a few selected DEGs would be a better way to assess co-expression.

      We agree with reviewer 2 that the correlation between DEGs and the expression of Meis2 in the sensory neuron subtypes was far from striking. In our opinion, the new analysis now shows a more robust correlation. However, it has to be kept in mind that not all DEGs are expected to be direct Meis2 target genes and therefore to be enriched in the same Meis2-expressing population. This also holds true for genes that could be de-repressed or induced following Meis2 inactivation. In addition, the scRNAseq by Usoskin et al. was performed on adult sensory neurons, whereas our bulk RNAseq was performed on E18.5 embryos. Thus, because gene expression in developing sensory neurons is well known to be highly dynamic, the transcriptional signature of sensory neuron subclasses in the E18.5 embryo is not expected to perfectly match the transcriptional signature of adult subclasses. Finally, we agree that immunostaining for Meis2 together with a few selected DEGs would give a better answer on whether they co-localize or not, but our lack of experience with those antibodies, together with the lack of financial support for the proposal, precludes us from addressing this pertinent point.

      1.5) The authors identify Gabra1 and Gabra4 as upregulated and Gabrr1 as downregulated genes in MEIS2 cKO animals. Does this reflect a change in GABA-receptor subunit composition in LTMRs?

      This is an interesting point. First, in our new analysis, increasing the cut-off to 100 reads excluded Gabrr1 from the DEGs. Based on our results, we cannot conclude whether Gabra1 and Gabra4 up-regulation reflects a change in GABA receptor composition. However, in the GEO term associated with GABAergic synapse, Gabra1 and Gabra4 were up-regulated whereas the ionotropic glutamate receptor Grid1 was downregulated, which rather argues for an imbalanced GABA/glutamate transmission. Finally, the increased GABAR expression in the LTMRs might be expected to increase pre-synaptic inhibition on the LTMR synapses onto target neurons in the dorsal horn, thus decreasing synaptic transmission from these neurons into spinal circuits.

      2) The authors assessed SA-LTMR innervating Merkel cells in glabrous and hairy skin by IFC staining for neurofilament H and electrophysiological recordings. Due to the small sample size, they pooled recordings, reasoning that nerves that do not successfully innervate Merkel cells (i.e. cKO glabrous skin) do not evoke electrophysiological responses following a touch stimulus.

      2.1) It is undoubtedly true that non-innervating nerves will likely not show electrophysiological responses. However, by pooling the recordings of SA-LTMRs from glabrous and hairy skin, the data obtained from the 20% successful recordings of SA-LTMRs from glabrous cKO skin (according to Fig. 4E, upper panel) will be overrepresented and hence lead to a systematic bias. How many recordings were made from the glabrous and hairy skin of each genotype? In case the number of recordings from cKO/glabrous skin is the limiting factor, does the observed difference in vibration threshold hold true when only recordings from hairy skin are compared?

      As explained in the text and in our answers to reviewer 1, data for hairy and glabrous SAMs were initially pooled because no differences between them were expected, and the electrophysiological experiments planned next were compromised by the Covid-19 pandemic. We are sorry that, at this point, we cannot provide additional experiments to clarify this important point. In addition, as mention

      3) From the IFC images shown in Fig. 6A, it is not clear how the authors quantified branch points and innervated hair follicles.

      Branch points correspond to each point where a nerve splits into two or more nerves. Innervated follicles correspond to follicles that are entangled by circumferential and/or lanceolate Nefh+ endings.

      4) The quality of the data is very high, but there are several ambiguities and errors in their presentation.

      We apologize for this mistake. Figure 1 Supplementary 1 that reports data from Cat walk analysis is now appropriately included in the files.

      4.2) Fig. 3A is confusing and the figure legend just repeats what is already said in the text. What do yellow, blue, and pink represent?

      Figure 3 is now fully remade. Legend is now better indicated in Figure 3A. We hope it is now more clear.

      4.3) What genotype do the black, grey, and white boxplots in Fig. 6C Fig. 3 Supplementary 1B correspond to?

      The legends were missing for Figure 6C and Figure 3 supplementary 1B. They are now appropriately included.

      4.4) Up- and downregulated genes are assigned differently in Fig. 3 and Fig. 3 Supplementary 2. The figure legend of Fig. 3 Suppl 2 lists panel B as up-regulated genes but the same genes are labeled down-regulated in Fig. 3.

      We apologize for this previous mistake. Figure 3 and corresponding supplementary figures have been redone in the new version.

      4.5) Fig. 3E would benefit from a more detailed description. One can easily appreciate that the neurofilament H staining in the cKO sample is different from that of the WT sample but what exactly can be seen here?

      We added the following sentence in the results section: “In WT newborn mice, numerous Nefh+ sensory fibers surround all dermal papillae of the hairy skin and footpad of the glabrous skin, whereas in Wnt1Cre::Meis2LoxP/LoxP littermates, very few Nefh+ sensory fibers are present and they poorly innervate the dermal papillae and footpads.“.

      4.6) The figure legend to Fig. 4A is unclear. Does the graph show the sum of all recordings performed? From the text, one would guess that the bars correspond to the cKO samples, but this is not specified. Do the controls correspond to WT, Islet1Cre/+ or a mixture of both? In addition, the graph in the lower panel is labeled % Ab fibers, the figure legend reads % of tap units among Ab fibers.

      The graphs show the number of tap units identified among all recorded A-fibers. The numbers indicate the number of tap units over the number of recorded fibers. This has now been reformulated in the latest version of the manuscript.

      4.7) The abbreviation SAM in figure legends 4F, G is not introduced.

      This is now indicated in the figure legend.

      4.8) Readers who are not familiar with the traces above the graphs in 4F and 4G will find a more detailed description helpful.

      This is now indicated in the figure legend.

      4.9) Lines 274-275: Does the statement "Finally, consistent with the lack of neuronal loss in Isl1Cre/+::Meis2LoxP/LoxP, the number of recorded fibers were identical in WT and Isl1Cre/+::Meis2LoxP/LoxP." refer to Fig. 4G? This is not specified in the text.

      These data were not included in the first version of the manuscript as we thought they were not significantly informative. They just indicate the overall number of fibers that were recorded in the electrophysiological experiments. The sentence has now been removed from the latest version of the manuscript to avoid misunderstanding.

      4.10) There is no Fig. 6 supplementary 1.

      The typo is now corrected. The corresponding data were in fact in Figure 5 Supplementary 1.

      Minor points:

      • Gangfuß et al. report that a patient previously diagnosed with a range of neurological deficits including the diagnosis of severe infantile autism is heterozygous mutant for MEIS2. Although this study links MEIS2 gene function to ASD in the wider sense, adding a few additional references will make the link stronger. Examples are Shimojima et al., Hum Genome Var 2017 or Bae et al., Science 2022.

      These two references have been now included in the introduction section of the manuscript.

      • In some figures (e.g. Fig. 4) the numbering of the panels does not follow the order in which the respective data are mentioned in the text.

      Figure 4 is now re-organized so that panels follow the same order as in the results section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Nitrogen metabolism is of fundamental importance to biology. However, the metabolism and biochemistry of guanidine and guanidine-containing compounds, including arginine and homoarginine, have been understudied over the last few decades. Very few guanidine-forming enzymes have been identified. Funck et al define a new type of guanidine-forming enzyme. It was previously known that 2-oxoglutarate oxygenase catalysis in bacteria can produce guanidine via oxidation of arginine. Interestingly, the same enzyme that produces guanidine from arginine also oxidises 2-oxoglutarate to give the plant signalling molecule ethylene. Funck et al show that a mechanistically related oxygenase enzyme from plants can also produce guanidine, but instead of using arginine as a substrate, it uses homoarginine. The work will stimulate interest in the cellular roles of homoarginine, a metabolite present in plants and other organisms including humans and, more generally, in the biochemistry and metabolism of guanidines.

      1) Significance

      Studies on the metabolism and biochemistry of the small nitrogen-rich molecule guanidine and related compounds including arginine have been largely ignored over the last few decades. Very few guanidine-forming enzymes have been identified. Funck et al define a new guanidine-forming enzyme that works by oxidation of homoarginine, a metabolite present in organisms ranging from plants to humans. The new enzyme requires oxygen and 2-oxoglutarate as cosubstrates and is related to, but distinct from, a known enzyme that oxidises arginine to produce guanidine, but which can also oxidise 2-oxoglutarate to produce the plant signalling molecule ethylene.

      Overall, I thought this was an exceptionally well written and interesting manuscript. Although a 2-oxoglutarate-dependent guanidine-forming enzyme is known (EFE), the discovery that a related enzyme oxidises homoarginine is really interesting, especially given the presence of homoarginine in plant seeds. There is more work to be done in terms of functional assignment, but this can be the subject of future studies. I also fully endorse the authors' view that guanidine and related compounds have been massively understudied in recent times. I would like to see the possibility that the new enzyme makes ethylene explored. Congratulations to the authors on a very nice study.

      Response: We thank the reviewer for the positive evaluation of our manuscript. In the revised version, we have emphasized more clearly that we found no evidence for ethylene production by the recombinant enzymes. The other suggestions of the reviewer are also considered in the revised version as detailed below.

      Reviewer #2 (Public Review):

      In this study, Dietmar Funck and colleagues have made a significant breakthrough by identifying three isoforms of plant 2-oxoglutarate-dependent dioxygenases (2-ODD-C23) as homo/arginine-6-hydroxylases, catalyzing the degradation of 6-hydroxyhomoarginine into 2-aminoadipate-6-semialdehyde (AASA) and guanidine. This discovery marks the very first confirmation of plant or eukaryotic enzymes capable of guanidine production.

      The authors selected three plant 2-ODD-C23 enzymes with the highest sequence similarity to bacterial guanidine-producing (EFE) enzymes. They proceeded to clone and express the recombinant enzymes in E coli, demonstrating capacity of all three Arabidopsis isoforms to produce guanidine. Additionally, by precise biochemical experiments, the authors established these three 2-ODD-C23 enzymes as homoarginine-6-hydroxylases (and arginine-hydroxylase for one of them). Furthermore, the authors utilized transgenic plants expressing GFP fusion proteins to show the cytoplasmic localization of all three 2-ODD-C23 enzymes. Most notably, using T-DNA mutant lines and CRISPR/Cas9-generated lines, along with combinations of them, they demonstrate the guanidine-producing capacity of each enzyme isoform in planta. These results provide robust evidence that these three 2-ODD-C23 Arabidopsis isoforms are indeed homoarginine-6-hydroxylases responsible for guanidine generation.

      The findings presented in this manuscript are a significant contribution for our understanding of plant biology, particularly given that this work is the first demonstration of enzymatic guanidine production in eukaryotic cells. However, there are a couple of concerns and potential ways for further investigation that the authors should (consider) incorporate.

      Firstly, the observation of cytoplasmic and nuclear GFP signals in the transgenic plants may also indicate cleaved GFP from the fusion proteins. Thus, the authors should perform Western blot analysis to confirm the correct size of the 2-ODD-C23 fusion proteins in the transgenic protoplasts.

      Secondly, it may be worth measuring pipecolate (and proline?) levels under biotic stress conditions (particularly those that induce transcript changes of these enzymes, Fig S8). Given the results suggesting a potential regulation of the pathway by biotic stress conditions (eg. meJA), these experiments could provide valuable insights into the physiological role of guanidine-producing enzymes in plants. This additional analysis may give a significance of these enzymes in plant defense mechanisms.

      Response: We also thank reviewer 2 for the positive evaluation and useful suggestions. We performed the proposed GFP Western blot, which indeed indicated the presence of both full-length fusion proteins and free GFP, which can explain the partial nuclear localization. We fully agree that further experiments with biotic and abiotic stress will be required to determine the physiological function of the 2-ODD-C23 enzymes. However, the list of potential experiments is long and they are beyond the scope of the present manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Specific points

      Overall, I thought this was a very interesting study, comprising biochemical, cellular, and in vivo studies. Of course more could be done on each of these, and likely will be, but I think the assignment of biochemical function is very strong, across all three approaches. The one new experiment I would like to see is a clear demonstration of whether ethylene is produced - unlikely but should be tested.

      We had mentioned our failure to detect ethylene production by the plant enzymes in the previous version. We have now made this more prominent and reliable by including ethylene production as a positive control in the new supplementary figure S5.

      Abstract

      Delete 'hitherto overlooked' - this is implicit 'but is more likely' to 'is likely'?

      Agreed and modified

      Introduction

      Second sentence - what about relevant small molecule primary metabolites including precursors of proteins/nucleic acids.

      We modified the sentence accordingly.

      Paragraph 2 - maybe also note EFE produces glutamate semi aldehyde, via arginine C-5 oxidation.

      Paragraph 2 has been re-phrased according to your suggestion.

      Overall, I thought the introduction was exceptionally well written.

      Perhaps either in the introduction, or later, note there are other 2OG oxygenases that oxidise arginine/arginine derivatives in various ways, e.g. clavaminate synthase/arginine hydroxylases/desaturases.

      We added a sentence mentioning the arginine hydroxylases VioC and OrfP to the introduction and included VioC into the sequence comparison in supplementary figure 2 to show that these enzymes, as well as NapI, are very different from EFE and the plant hydroxylases.

      Results

      Paragraph 1 - qualify similarity and refer to/give a structurally informed sequence alignment, including EFE

      A new supplemental figure S2 was added with sequence identity values and a structurally informed alignment. The text has been modified accordingly.

      Paragraph 2 - briefly state method of guanidine analysis

      We included a reference to the M&M section and mentioned LC-MS in paragraph 2.

      Figure 1 - trivial point - proteins are not expressed/genes are

      We have modified the legend to figure 1. However, we would like to point out that terms like “recombinant protein expression” are widely used in the field. A quick search with Google Ngram Viewer shows that “protein expression” started to appear in the mid-1980s and its use has stayed constant at 1/8th of that of “gene expression”.

      Define errors clearly in all figure legends, clearly defining biological/technical repeats
      Page 6 - was the His-tag cleared to ensure no issues with Ni contamination?

      We treat individual plants or independent bacterial cultures as biological replicates. Only in the case of enzyme activity assays with NAD(P)H, technical replicates were used and this has been indicated in the legend of figure 6.

      Lower case 'p' in pentafluorobenzyl corrected

      In Figure 2 make clear the hydroxylated intermediates are not observed

      We now use grey color for the intermediates and have put them in brackets. Additionally we state in the figure legend that these intermediates were not detected.

      Pages 6-7 - I may have missed this but it's important to investigate what happens to the 2OG. Is succinate the only product or is ethylene also produced? This possibility should also be considered in the plant studies, i.e. is there any evidence for responses related to perturbed ethylene metabolism. The authors consider a signalling role relating to AASA/P6C, but seem to ignore a potential ethylene connection.

      As stated above, we checked for ethylene production, with a negative result. EFE produced 6 times more guanidine than the plant enzymes under the same conditions, but even 100-fold lower ethylene production would have been clearly detected.

      Page 12 - 'plants have been shown to....' Perhaps note how hydroxy guanidine is made?

      We now mention the canavanine-γ-lyase that cleaves canavanine into hydroxyguanidine and homoserine.

      Overall, I thought the discussion was good, but perhaps a bit long/too speculative on pages 12/13 and this detracted from the biochemical assignment of the enzyme. I'd suggest shortening the discussion somewhat - the precise roles of the enzyme can be the subject of future work. As indicated above, some discussion on potential links to ethylene would be appreciated.

      Since reviewer 2 wanted more (speculative) discussion on the role of the 2-ODD-C23 enzymes and there was no detectable ethylene production, we took the liberty to leave the discussion largely unaltered.

      I'd also like to see some more consideration/metabolic analyses of guanidine related metabolism in the genetically modified plants.

      Such analyses will certainly be included in future experiments once we get an idea about the physiological role of the 2-ODD-C23 enzymes.

      Page 16 - mass spectrometry

      Corrected.

      Please add a structurally informed sequence alignment with EFE and other 2OG oxygenases acting on arginine/derivatives.

      An excerpt of the alignment is now presented in supplementary figure S2.

      Reviewer #2 (Recommendations For The Authors):

      I would like to see more discussion in the manuscript about the possible interconnection/roles between 2-ODD-C23 guanidine-producing, lysine- ALD1-Pipecolate producing, and proline metabolism pathways during both biotic and abiotic stresses.

      Since we were unable to detect pipecolate in any of our plant samples, and since our preliminary results with biotic stress did not produce any evidence for a function of the 2-ODD-C23 enzymes in the tested defense responses, we would like to postpone such an extended discussion until we find a condition where the physiological function of these enzymes is evident.

      Fig. 4: Authors should change colors for Col-0, 0.2 HoArg and ctrl? They look too similar in my pdf file.

      We changed the colors in figure 4 and hope that the enhanced contrast is maintained during the production of the final version of our article.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides a fundamental contribution to the understanding of the role of intrinsically disordered proteins in circadian clocks and the potential involvement of phase separation mechanisms. The authors convincingly report on the structural and biochemical aspects and the molecular interactions of the intrinsically disordered protein FRQ. This paper will be of interest to scientists focusing on circadian clock regulation, liquid-liquid phase separation, and phosphorylation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Phosphorylation, disorder, and phase separation govern the behavior of Frequency in the fungal circadian clock" is a convincing manuscript that delves into the structural and biochemical aspects of FRQ and the FFC under both LLPS and non-LLPS conditions. Circadian clocks serve as adaptations to the daily rhythms of sunlight, providing a reliable internal representation of local time.

      All circadian clocks are composed of positive and negative components. The FFC contributes negative feedback to the Neurospora circadian oscillator. It consists of FRQ, CK1, and FRH. The FFC facilitates close interaction between CK1 and the WCC, with CK1-mediated phosphorylation disrupting WCC:c-box interactions necessary for restarting the circadian cycle.

      Despite the significance of FRQ and the FFC, challenges associated with purifying and stabilizing FRQ have hindered in vitro studies. Here, researchers successfully developed a protocol for purifying recombinant FRQ expressed in E. coli.

      Armed with full-length FRQ, they utilized spin-labeled FRQ, CK1, and FRH to gain structural insights into FRQ and the FFC using ESR. These studies revealed a somewhat ordered core and a disordered periphery in FRQ, consistent with prior investigations using limited proteolysis assays. Additionally, p-FRQ exhibited greater conformational flexibility than np-FRQ, and CK1 and FRH were found in close proximity within the FFC. The study further demonstrated that under LLPS conditions in vitro, FRQ undergoes phase separation, encapsulating FRH and CK1 within LLPS droplets, ultimately diminishing CK1 activity within the FFC. Intriguingly, higher temperatures enhanced LLPS formation, suggesting a potential role of LLPS in the fungal clock's temperature compensation mechanism.

      Biological significance was supported by live imaging of Neurospora, revealing FRQ foci at the periphery of nuclei consistent with LLPS. The amino acid sequence of FRQ conferred LLPS properties, and a comparison of clock repressor protein sequences in other eukaryotes indicated that LLPS formation might be a conserved process within the negative arms of these circadian clocks.

      In summary, this manuscript represents a valuable advancement with solid evidence in the understanding of a circadian clock system that has proven challenging to characterize structurally due to obstacles linked to FRQ purification and stability. The implications of LLPS formation in the negative arm of other eukaryotic clocks and its role in temperature compensation are highly intriguing.

      Strengths:

      The strengths of the manuscript include the scientific rigor of the experiments, the importance of the topic to the field of chronobiology, and new mechanistic insights obtained.

      Weaknesses:

      This reviewer had questions regarding some of the conclusions reached.

      Recommendations For The Authors:

      The reviewer has a few questions for the authors:

      1) Concerning the reduced activity of sequestered CK1 within LLPS droplets with FRQ, to what extent is this decrease attributed to distinct buffer conditions for LLPS formation compared to non-LLPS conditions?

      We don’t believe that these buffer conditions significantly influence the change in FRQ phosphorylation by CK1 observed at elevated temperatures. The pH and ionic strength of the buffer are in keeping with physiological conditions (300 mM NaCl, 50 mM sodium phosphate, 10 mM MgCl2, pH 7.5); CK1 autophosphorylation is robust and generally increases with temperature under these conditions (Figure 7B). However, as LLPS increases, CK1 autophosphorylation remains high, whereas phosphorylation of FRQ dramatically decreases. In fact, we chose to alter temperature specifically to induce changes in phase behavior under constant buffer conditions. In this way LLPS could be increased, and FRQ phosphorylation evaluated, without altering the solution composition. Thus, we believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by being in a phase-separated microenvironment, which could be substantially different from a non-phase-separated buffer environment. For example, previous work on the disordered region of DDX4 [Brady et al. 2017, and Nott et al. 2015] shows that even the water content and the stability of biomolecules such as double-stranded nucleic acids encapsulated within the droplets differ between non-phase-separated and phase-separated DDX4 samples.

      Nott T.J. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell. 2015 57 936-947.

      Brady J.P. et al. Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. PNAS 2017 114 8194-8203.

      In the results section we have clarified the use of temperature to control LLPS, “We compared the phosphorylation of FRQ by CK1 in a buffer that supports phase separation under different temperatures, using the latter as a means to control the degree of LLPS without altering the solution composition.”

      On p.16 of the discussion we have elaborated on the above point, “We believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by the effects of being a phase separate microenvironment, which could be substantially different from non-phase separated buffer environment. For example, previous work done on the disordered region of DDX4 (Brady et al., 2017; Nott et al., 2015) show that even the amount of water content and stability of biomolecules such as double strand nucleic acids encapsulated within the droplets differ between non- and phase separated DDX4 samples. Indeed, the spin-labeling experiments indicate that the dynamics of FRQ have been altered by LLPS (Fig. 7D).”

      2) The DEER technique demonstrated spatial proximity between FRH and CK1 when bound to FRQ in the FFC. Is there evidence suggesting their lack of proximity in the absence of FRQ? Also, how important is this spatial proximity to FFC function?

      We have additional data substantiating that FRH and CK1 do not interact in the absence of FRQ. In the revised paper we have included the results of a SEC-MALS experiment showing that FRH and CK1 elute separately when mixed in equimolar amounts and applied to an analytical S200 column coupled to a MALS detector (Figure 1 below and Fig. S8). The importance of the FRH and CK1 proximity is currently unknown, but there are reasons to believe that it could have functional consequences. For example, CK1, as recruited by FRQ, phosphorylates the White-Collar Complex (WCC) in the repressive arm of the circadian oscillator [e.g. He et al. Genes Dev. 20, 2552 (2006); Wang et al, Mol. Cell 74, 771 (2019)]. Interactions between the WCC and the FFC are mediated at least in part by FRH binding to White Collar-2 [Conrad et al. EMBO J. 35, 1707 (2016)]. Thus, FRH:FRQ may effectively bridge CK1 to the WCC to facilitate the phosphorylation of the latter by the former.

      He et al. CKI and CKII mediate the FREQUENCY-dependent phosphorylation of the WHITE COLLAR complex to close the Neurospora circadian negative feedback loop. Genes Dev. 2006 20, 2552-2565.

      Wang B. et al. The Phospho-Code Determining Circadian Feedback Loop Closure and Output in Neurospora. Mol. Cell 2019 74, 771-784.

      Conrad et al. Structure of the frequency-interacting RNA helicase: a protein interaction hub for the circadian clock. EMBO J. 2016 35, 1707-1719.

      Author response image 1.

      Size-exclusion chromatography-multiangle light scattering (SEC-MALS) of a mixture of purified FRH and CK1. The proteins elute separately as monomers with no evidence of co-migration.

      3) Is there any indication that impairing FRQ's ability to undergo LLPS disrupts clock function?

      We do not currently have direct evidence that LLPS of FRQ is essential for clock function. These experiments are ongoing, but complicated by the fact that changes to FRQ predicted to alter LLPS behavior also have the potential to perturb its many other clock-related functions that include dynamic interactions with partners, dynamic post-translational modification and rates of synthesis and degradation. That said, the intrinsic disorder of FRQ is important for it to act as a protein interaction hub, and large intrinsically disordered regions (IDRs) very often mediate LLPS, as is certainly the case here. In this work, we argue that the ability of FRQ to sequester clock proteins during the TTFL may involve LLPS. Additionally, we show that the phosphorylation state of FRQ, which is a critical factor in clock period determination, depends on LLPS. Given that the conditions under which FRQ phase separates are physiological in nature and that live-cell imaging is consistent with FRQ phase separation in the nucleus, it seems likely that FRQ does phase separate in Neurospora. Furthermore, given that the sequence features of FRQ that mediate phase-separation are conserved not only across FRQ homologs but also in other functionally related clock proteins, it is probable, albeit worthy of further investigation, that LLPS has functional consequences for the clock. See the response to reviewer 3 for more discussion on this topic.

      Minor Points:

      Indeed, we have included a reference to this paper on p. 3: “Emerging studies in plants (Jung, et al., 2020), flies (Xiao, et al., 2021) and cyanobacteria (Cohen, et al., 2014; Pattanayak, et al., 2020) implicate LLPS in circadian clocks, and in Neurospora it has recently been shown that the Period-2 (PRD-2) RNA-binding protein influences frq mRNA localization through a mechanism potentially mediated by LLPS (Bartholomai, et al., 2022).”

      • On page 9, six lines from the top, please insert "of" between "distributions" and "p-FRQ".

      We have corrected this typo.

      Reviewer #2 (Public Review):

      Summary:

      This study presents data from a broad range of methods (biochemical, EPR, SAXS, microscopy, etc.) on the large, disordered protein FRQ relevant to circadian clocks and its interaction partners FRH and CK1, providing novel and fundamental insight into oligomerization state, local dynamics, and overall structure as a function of phosphorylation and association. Liquid-liquid phase separation is observed. These findings have bearings on the mechanistic understanding of circadian clocks, and on functional aspects of disordered proteins in general.

      Strengths:

      This is a thorough work that is well presented. The data are of overall high quality given the difficulty of working with an intrinsically disordered protein, and the conclusions are sufficiently circumspect and qualitative to not overinterpret the mostly low-resolution data.

      Weaknesses:

      None

      Recommendations For The Authors:

      1) Fig.2B: Beyond the SEC part (absorbance vs elution volume), I don't understand this plot, in particular the horizontal lines. They appear to be correlating molecular weight with normalized absorption at 280 nm, but the chromatogram amplitudes are different. Clarify, or modify the plot. There are also some disconnected line segments between 10-11 mL - these seem to be spurious.

      We apologize for the confusion. The horizontal lines are meant to only denote the average molecular weights of the elution peaks and not correlate with the A280 values. The disconnected lines are the light-scattering molecular weight readouts from which the horizontal lines are derived. The problematic nature of the figure is that the full elution traces and MALS traces across the peaks call for different scales to best depict the relevant features of the data. We have reworked the figure and legend to make the key points more clear.

      2) It could be useful to add AF2 secondary structure predictions, pLDDT, and the helical propensity analysis to the sequence ribbon in Fig.1C.

      Thank you for the suggestion. We have updated the figure to incorporate the pLDDT scores into the linear sequence map, as well as the secondary structure predictions.

      3) Fig.3D: It would be better to show the raw data rather than the fits. At the same time, I appreciate the fact that the authors resisted the temptation to show distance distributions.

      Yes, we agree that it is important to show the raw data; it is included in the supplementary section. Depicting the raw data here unfortunately obscures the differences in the traces and we believe that showing the data as a superposition is quite useful to convey the main differences among the sites. However, we have now explicitly stated in the figure legend that the corresponding raw data traces are given in Figures S5-6.

      4) Fig.5: For all distance distributions, error intervals should be added (typically done in terms of shaded bands around the best-fit distribution). As shown, precision is visually overstated. The error analysis shown in the SI is dubious, as it shows some distances have no error whatsoever (e.g. 6nm in 370C-490C), which is not possible.

      We did previously show the error intervals in the SI, but we agree that it is better to include them here as well, and have done so in the new Figure 5. With respect to the error analysis, we are following the methodology described in the following paper:

      Srivastava, M. and Freed J., Singular Value Decomposition Method To Determine Distance Distributions in Pulsed Dipolar Electron Spin Resonance: II. Estimating Uncertainty. J. Phys Chem A (2019) 123:359-370. doi: 10.1021/acs.jpca.8b07673.

      Briefly, the uncertainty we plot shows the "range" of singular values over which the singular value decomposition (SVD) solution remains converged. For most of the data displayed in this paper we used only the first few singular values (SVs), and the solution remained converged for ± 1 or 2 SVs near the optimum solution. For example, if the optimum solution was 4 SVs, then the range in which the solution remained converged is ~3-6 SVs. We plot three lines: the solution with the lowest number of SVs, the solution with the highest number of SVs, and the solution with the optimum number of SVs. In the SI figures the optimum SV solution is shown in black and the region between the converged solutions with the highest and lowest number of SVs is shaded in red. Owing to the point-wise reconstruction of the distance distribution, the SVD method enables localized uncertainty at each distance value. Therefore, some points will have high uncertainty, whereas others will have low uncertainty. A distance that appears to have no uncertainty actually has very low uncertainty, which can be seen on close inspection. In these cases, we observe this "isosbestic" type behavior where the P(r) appears to change little across the acceptable solutions and hence there is only a small range of P(r) values at that particular r. This behavior results from multimodal distributions wherein the change in SVs shifts neighboring peaks to lower and higher distances respectively, producing an apparent cancelation effect. What we believe is most important for the biochemical interpretation, and accurately reflected by this analysis, is the general width of the uncertainty across the distribution and how this impacts the error in both the mean and the overall skewing of the distribution at short or long distances.

      Details of the error treatment as described above have been added to the supplementary methods section.
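
      The range-of-singular-values idea described above can be illustrated with a minimal sketch. It is not the authors' software or the full method of Srivastava and Freed: the kernel is a hypothetical smoothing kernel rather than the DEER dipolar kernel, the data are synthetic, and the function names and the choice of 3, 4, and 6 SVs are assumptions made for illustration only.

```python
import numpy as np

def tsvd_solution(K, s, n_sv):
    """Solve K @ p = s using only the first n_sv singular values."""
    U, sigma, Vt = np.linalg.svd(K, full_matrices=False)
    coeffs = (U[:, :n_sv].T @ s) / sigma[:n_sv]
    return Vt[:n_sv].T @ coeffs

def sv_range_band(K, s, n_low, n_opt, n_high):
    """Return the optimum solution and the point-wise envelope
    (min and max at each grid point) over solutions with n_low..n_high SVs."""
    sols = np.array([tsvd_solution(K, s, n) for n in range(n_low, n_high + 1)])
    return tsvd_solution(K, s, n_opt), sols.min(axis=0), sols.max(axis=0)

# Toy, hypothetical data: a smooth blurring kernel, NOT the DEER dipolar kernel.
rng = np.random.default_rng(0)
r = np.linspace(0.0, 1.0, 60)                      # arbitrary "distance" grid
K = np.exp(-(r[:, None] - r[None, :]) ** 2 / (2 * 0.05 ** 2))
p_true = (np.exp(-(r - 0.35) ** 2 / 0.005)
          + 0.6 * np.exp(-(r - 0.65) ** 2 / 0.005))  # bimodal distribution
s = K @ p_true + 0.01 * rng.standard_normal(r.size)  # noisy synthetic signal

p_opt, p_lo, p_hi = sv_range_band(K, s, n_low=3, n_opt=4, n_high=6)
band_width = p_hi - p_lo   # point-wise spread; it differs from one r to the next
```

      In this toy setting the envelope is computed independently at each grid point, so it can be wide at some distances and narrow at others, which is the qualitative behavior the response describes.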

      5) The Discussion (p.13) states that the SAXS and DEER data show that disorder is greater than in a molten globule and smaller than in a denatured protein. Evidence to support this statement (molten globule DEER/SAXS reference data etc.) should be made explicit.

      We will make the statement more explicit by changing it to the following: “Notably, the shape of the Kratky plots generated from the SAXS data suggest a degree of disorder that is substantially greater than that expected of a molten globule (Kataoka, et al., 1997), but far from that of a completely denatured protein (Kikhney, et al., 2015; Martin, Erik W., et al., 2021). Similarly, the DEER distributions, though non-uniform across the various sites examined, indicate more disorder than that of a molten globule (Selmke et al., 2018) but more order than a completely unfolded protein (van Son et al. 2015).”

      van Son, M., et al. Double Electron−Electron Spin Resonance Tracks Flavodoxin Folding, J. Phys. Chem. B 2015, 119, 13507−13514. doi: 10.1021/acs.jpcb.5b00856.

      Selmke, B. et al. Open and Closed Form of Maltose Binding Protein in Its Native and Molten Globule State As Studied by Electron Paramagnetic Resonance Spectroscopy. Biochemistry 2018, 57, 5507−5512 doi: 10.1021/acs.biochem.8b00322.
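
      As background for the Kratky-plot argument, the following minimal sketch contrasts two textbook limits. It assumes a Guinier-form intensity as a crude stand-in for a compact particle and the Debye form factor for an ideal Gaussian chain; the radius of gyration and q range are hypothetical, and none of this is the authors' SAXS data or analysis.

```python
import numpy as np

def kratky_compact(q, rg):
    """q^2 * I(q) using the Guinier form I(q) = exp(-(q*rg)^2 / 3).
    Crude stand-in for a compact particle; gives a bell-shaped curve."""
    return q ** 2 * np.exp(-(q * rg) ** 2 / 3.0)

def kratky_gaussian_chain(q, rg):
    """q^2 * I(q) using the Debye form factor of an ideal (Gaussian) chain.
    Rises to a plateau instead of returning to baseline."""
    x = (q * rg) ** 2
    debye = 2.0 * (np.expm1(-x) + x) / x ** 2   # 2*(exp(-x) - 1 + x) / x^2
    return q ** 2 * debye

rg = 3.0                            # nm, hypothetical radius of gyration
q = np.linspace(0.01, 3.0, 600)     # 1/nm, hypothetical q range

compact = kratky_compact(q, rg)
chain = kratky_gaussian_chain(q, rg)

q_peak = q[np.argmax(compact)]      # analytic peak position is sqrt(3) / rg
plateau = chain[-1]                 # analytic high-q limit is 2 / rg**2
```

      A protein whose Kratky curve falls between these two shapes, with neither a clean bell nor a flat plateau, is the kind of intermediate case the revised statement refers to.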

      6) Fig. S11B could be promoted to the main paper.

      This comment makes a good point. Figure 8 is now an updated scheme, similar to the previous Fig. S11B. Thank you for the suggestion.

      Minor corrections:

      p.1: "composed from" -> "composed of"

      p.2: TFFLs -> TTFLs

      p.2: "and CK1 via" => "and to CK1 via"

      p.5: "Nickel" -> "nickel"

      p.5: "Size Exclusion Chromatography" -> "Size exclusion chromatography"

      p.5: "Multi Angle Light Scattering" -> "multi-angle light scattering"

      Fig.2 caption: "non-phosphorylated (np-FRQ)" -> "non-phosphorylated FRQ (np-FRQ)"

      Fig. S3: What are the units on the horizontal axis?

      Fig. 5H is too small

      Fig. S8, S9: all distance distribution plots show a spurious "1"

      Fig. 6A has font sizes that are too small to read

      p.11: "cytoplasm facing" -> "cytoplasm-facing"

      p.11: "temperature dependent" -> "temperature-dependent"

      p.12: "substrate-sequestration and product-release" -> "substrate sequestration and product release"

      p.12: "depend highly buffer composition" -> "depend highly on buffer composition"

      We thank the reviewer for finding these errors and their attention to detail. All of these minor points have been addressed in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript from Tariq and Maurici et al. presents important biochemical and biophysical data linking protein phosphorylation to phase separation behavior in the repressive arm of the Neurospora circadian clock. This is an important topic that contributes to what is likely a conceptual shift in the field. While I find the connection to the in vivo physiology of the clock to be still unclear, this can be a topic handled in future studies.

      Strengths:

      The ability to prepare purified versions of unphosphorylated FRQ and P-FRQ phosphorylated by CK-1 is a major advance that allowed the authors to characterize the role of phosphorylation in structural changes in FRQ and its impact on phase separation in vitro.

      Weaknesses:

      The major question that remains unanswered from my perspective is whether phase separation plays a key role in the feedback loop that sustains oscillation (for example by creating a nonlinear dependence on overall FRQ phosphorylation) or whether it has a distinct physiological role that is not required for sustained oscillation.

      The reviewer raises the key question regarding data suggesting LLPS and phase-separated regions in circadian systems. To date, condensates have been seen in cyanobacteria (Cohen et al., 2014; Pattanayak et al., 2020), where there are foci containing KaiA/C during the night; in Drosophila (Xiao et al., 2021), where PER and dCLK colocalize in nuclear foci near the periphery during the repressive phase; and in Neurospora (Bartholomai et al., 2022), where the RNA-binding protein PRD-2 sequesters frq and ck1a transcripts in perinuclear phase-separated regions. Because the proteins responsible for the phase separation in cyanobacteria and Drosophila are not known, it is not possible to seamlessly disrupt the separation to test its biological significance (Yuan et al., 2022), so only in Neurospora has it been possible to associate loss of phase separation with clock effects. There, loss of PRD-2, or mutation of its RNA-binding domains, results in a ~3 hr period lengthening as well as loss of perinuclear localization of frq transcripts. A very recent manuscript (Xie et al., 2024) calls into question both the importance and the very existence of LLPS of clock proteins, at least as regards mammalian cells, noting that it may be an artefact of overexpression in some places where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artefacts resulting from overexpression plainly cannot be a problem for our study or for Xiao et al. (2021), as in both cases the relevant clock protein, FRQ or PER, was labeled at the endogenous locus and expressed under its native promoter. Also, it may be worth noting that although we called attention to enrichment of FRQ[NeonGreen] at the nuclear periphery, there remained abundant FRQ within the core of the nucleus in our live-cell imaging.

      Cohen SE, et al.: Dynamic localization of the cyanobacterial circadian clock proteins. Curr Biol 2014, 24:1836–1844, https://doi.org/10.1016/j.cub.2014.07.036.

      Pattanayak GK, et al.: Daily cycles of reversible protein condensation in cyanobacteria. Cell Rep 2020, 32:108032, https://doi.org/10.1016/j.celrep.2020.108032.

      Xiao Y, Yuan Y, Jimenez M, Soni N, Yadlapalli S: Clock proteins regulate spatiotemporal organization of clock genes to control circadian rhythms. Proc Natl Acad Sci U S A 2021, 118, https://doi.org/10.1073/pnas.2019756118.

      Bartholomai BM, Gladfelter AS, Loros JJ, Dunlap JC. 2022 PRD-2 mediates clock-regulated perinuclear localization of clock gene RNAs within the circadian cycle of Neurospora. Proc Natl Acad Sci U S A. 119(31):e2203078119. doi: 10.1073/pnas.2203078119.

      Yuan et al., Curr Opin Cell Biol 78: 102129, 2022. https://doi.org/10.1016/j.ceb.2022.102129

      Pancheng Xie, Xiaowen Xie, Congrong Ye, Kevin M. Dean, Isara Laothamatas, S K Tahajjul T Taufique, Joseph Takahashi, Shin Yamazaki, Ying Xu, and Yi Liu (2024). Mammalian circadian clock proteins form dynamic interacting microbodies distinct from phase separation. Proc. Nat. Acad. Sci. USA. In press.

      We have updated the discussion on p. 15 accordingly:

      “Live cell imaging of fluorescently-tagged FRQ proteins is consistent with FRQ phase separation in N. crassa nuclei. FRQ is plainly not homogeneously dispersed within nuclei, and the concentrated foci observed at specific positions in the nuclei indicate condensate behavior similar to that observed for other phase separating proteins (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019; Xiao, et al., 2021). While ongoing experiments are exploring more deeply the spatiotemporal dynamics of FRQ condensates in nuclei, the small size of fungal nuclei as well as their rapid movement with cytoplasmic bulk flow through the hyphal syncytium makes these experiments difficult. Of particular interest is drawing comparisons between FRQ and the Drosophila Period protein, which has been observed in similar foci that change in size and subnuclear localization throughout the circadian cycle (Meyer, et al., 2006; Xiao, et al., 2021), although it must be noted that the foci we observed are considerably more dynamic in size and shape than those reported for PER in Drosophila (Xiao, et al., 2021). A very recent manuscript (Xie, et al., 2024) calls into question the importance and very existence of LLPS of clock proteins, at least with regard to mammalian cells, noting that it may be an artifact of overexpression in some instances where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artifacts resulting from overexpression are unlikely to be a problem for our study and that of Xiao et al., as in both cases clock proteins were tagged at their endogenous locus and expressed from their native promoters. Although we noted enrichment of FRQmNeonGreen near the nuclear envelope in our live-cell imaging, there remained abundant FRQ within the core of the nucleus.”

      Recommendations For The Authors:

      The data in Fig 6 showing microscopy of Neurospora is suggestive but needs more information/controls. Does the strain that expresses FRQ-mNeonGreen have normal circadian rhythms? How were the cultures handled (in terms of circadian entrainment etc.) for imaging? Do samples taken at different clock times appear different in terms of punctate structures in microscopy? The authors cite the Xiao 2021 paper in Drosophila, but would be good to see if the in vivo picture is fundamentally similar in Neurospora.

      All of the live-cell images we report were from cells grown in constant light; in the dark, strains bearing FRQ[NeonGreen] have normally robust rhythms with a slightly elongated period length as measured by a frq Cbox-luc reporter. Although we are of course interested in whether, and if so how, the punctate structures change as a function of circadian time, this is work in progress and beyond the scope of the present study. That said, it is plain to see from the movie included as a Supplemental file here that the puncta we see are moving and fusing/splitting on a scale of seconds, whereas those reported in Drosophila by Xiao et al. (Xiao et al., 2021, above) were stable for many minutes; thus the FRQ foci seen in Neurospora are quite a bit more dynamic than those in Drosophila.

      We have updated the results section on p. 11 to provide this information more clearly: “FRQ thus tagged and driven by its own promoter is expressed at physiologically normal levels, and strains bearing FRQmNeonGreen as the only source of FRQ are robustly rhythmic with a slightly longer than normal period length. Live-cell imaging in Neurospora crassa offers atypical challenges because the mycelia grow as syncytia, with continuous rapid nuclear motion during the time of imaging. This constant movement of nuclei is compounded by the very low intranuclear abundance of FRQ and the small size of fungal nuclei, so that visualization of intranuclear droplet fission/fusion cycles, or intranuclear fluorescence recovery after photobleaching (FRAP) experiments that could report on liquid-like properties, is not readily feasible. Nonetheless, bright and dynamic foci-like spots were observed well inside the nucleus and near the nuclear periphery, which is delineated by the cytoplasm-facing nucleoporin Son-1 tagged with mApple at its C-terminus (Fig. 6D,E, Movie S1). Such foci are characteristic of phase separated IDPs (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019) and share similar patterning to that seen for clock proteins in Drosophila (Meyer, et al., 2006; Xiao, et al., 2021), although the foci we observed are substantially more dynamic than those reported in Drosophila.”

      Another issue where some commentary would be helpful: Fig 7 shows that phase separation behavior is strongly temperature dependent (not biophysically surprising). Is that at odds with the known temperature compensation of the circadian rhythm if LLPS indeed plays a key role in the oscillator?

      We believe that the dependence of CK1-mediated FRQ phosphorylation on temperature, as manifested by FRQ phase separation, is consistent with temperature compensation within the Neurospora circadian oscillator. The phenomenon of temperature compensation by circadian clocks involves the resistance of the oscillator period to temperature change. Stability of period with temperature change would not necessarily be expected of a generic chemical oscillator, which would run faster (shorter period) at higher temperature owing to Arrhenius behavior of the underlying chemical reactions. Circadian phosphorylation of FRQ is one such chemical process that contributes to the oscillation of FRQ abundance on which the clock is based. Reduced CK1 phosphorylation of FRQ causes both longer periods [Mehra et al., 2009] and loss of temperature compensation (manifested as a reduction of period length at higher temperature) [Liu et al., Nat Comm, 10, 4352 (2019); Hu et al., mBio, 12, e01425 (2021)]. Thus, the ability of increased LLPS formation at elevated temperature to reduce FRQ phosphorylation by CK1 (but not intrinsic CK1 autophosphorylation) would be a means to counter a decreasing period length that would otherwise manifest in an undercompensated system. As further negative feedback on the system, LLPS is also promoted by FRQ phosphorylation itself, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature will couple to increased LLPS and mitigate period shortening through reduction of CK1 activity.

      Mehra et al. A Role for Casein Kinase 2 in the Mechanism Underlying Circadian Temperature Compensation. Cell 2009, 137, 749–760.

      Liu et al. FRQ-CK1 interaction determines the period of circadian rhythms in Neurospora. Nat Comm. 2019, 10, 4352.

      Hu et al. FRQ-CK1 Interaction Underlies Temperature Compensation of the Neurospora Circadian Clock. mBio 2021, 12, e01425.
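
      The Arrhenius argument above can be made concrete with a small worked calculation. This is a toy sketch under stated assumptions: the activation energy is a hypothetical round number, the period is assumed to scale inversely with a single reaction rate, and the "offset" is simple arithmetic rather than a model of LLPS or of CK1 kinetics.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def arrhenius_rate_ratio(ea_j_per_mol, t1_c, t2_c):
    """Ratio k(T2)/k(T1) for an Arrhenius process; temperatures in Celsius."""
    t1 = t1_c + 273.15
    t2 = t2_c + 273.15
    return math.exp(-ea_j_per_mol / R * (1.0 / t2 - 1.0 / t1))

def offset_needed(rate_ratio):
    """Fractional reduction in effective activity at the higher temperature
    that would hold the net rate constant (toy compensation)."""
    return 1.0 - 1.0 / rate_ratio

ea = 50_000.0  # J/mol, hypothetical activation energy
speedup = arrhenius_rate_ratio(ea, 20.0, 30.0)   # rate increase from 20 to 30 C
period_ratio = 1.0 / speedup                     # uncompensated toy: period ~ 1/rate
reduction = offset_needed(speedup)               # activity drop that cancels the speedup

print(f"rate ratio 30C/20C: {speedup:.2f}")
print(f"uncompensated period ratio: {period_ratio:.2f}")
print(f"activity reduction needed to cancel it: {reduction:.0%}")
```

      With these assumed numbers, an uncompensated reaction roughly doubles in rate over 10 °C, so a process that lowers effective kinase activity as temperature rises is the kind of opposing term that could offset the speed-up.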

      We have added Figure 8 to clarify the interpretation of the temperature compensation implications of our work, the legend of which reads:

      “Figure 8: LLPS may play a role in temperature compensation of the clock through modulation of FRQ phosphorylation. Reduced CK1 phosphorylation of FRQ causes both longer periods (Mehra, et al., 2009) and loss of temperature compensation (manifested as a shortening of period at higher temperature) (Hu, et al., 2021; Liu, X., et al., 2019). Thus, the ability of increased LLPS at elevated temperature (larger grey circle) to reduce FRQ phosphorylation by CK1 will counter a shortening period that would otherwise manifest in an undercompensated system. As further negative feedback, LLPS is also promoted by increased FRQ phosphorylation, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature favor LLPS and reduction of CK1 activity.”

      One minor comment: The chemical structures in Fig 3A have some issues where the "N" and "S" are flipped. Would be good to remake these figures to fix this problem.

      We apologize for the error; the figure has been replaced with an improved version.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      The single-mutant and double-mutant crp/rpoB strains were made by co-transduction with a nearby gene deletion (kanR-marked). I couldn't tell from the methods section whether these mutants, e.g., crp-H22N delta-chiA, were compared to wild-type cells or deletion mutants, e.g., delta chiA, in the proteomics experiments. I encourage the authors to explain this more clearly in the methods section, and to briefly mention in the Results section and relevant figure legends that the crp/rpoB mutant strains (and possibly the "wild-type" strains) also have gene deletions. If the comparison "wild-type" strains are fully wild-type (i.e., not deleted for chiA/yjaH), it is especially important to mention this in the Results section and the figure legends, since the phenotypic changes could be due to the gene deletions rather than the mutations in crp/rpoB.

      We appreciate and agree with the editor's suggestion to clarify this point.

      Accordingly, we have made the following changes to the text:

      p11 L30-34 in the main text:

      "The second experiment similarly compared an engineered BW25113 (BW) strain, containing the two regulatory mutations from the compact set (i.e., crp H22N and rpoB A1245V) together with the deletions used to insert them (see methods and DataS1 file), to a “wild type” BW strain (a corresponding knockout strain without the mutations, see methods)."

      p28 under Chemostat proteomics experiment L13-16 in methods:

      "The starting volume of each bioreactor was 150 ml M9 media supplemented with either 30 mM and 10mM D-xylose for the evolved and ancestor samples or only 10mM D-xylose for BW including compact set mutations and/or the deletions used for their insertions (DataS1 file). The minimal media also included trace elements and vitamin B1 was omitted."

    2. Author Response

      The following is the authors’ response to the original reviews.

      Author response:

      Reviewer #1:

      The main objective of this study is to achieve the development of a synthetic autotroph using adaptive laboratory evolution. To accomplish this, the authors conducted chemostat cultivation of engineered E. coli strains under xylose-limiting conditions and identified autotrophic growth and the causative mutations. Additionally, the mutational mechanisms underlying these causative mutations were also explored with drill down assays. Overall, the authors demonstrated that only a small number of genetic changes were sufficient (i.e., 3) to construct an autotrophic E. coli when additional heterologous genes were added. While natural autotrophic microorganisms typically exhibit low genetic tractability, numerous studies have focused on constructing synthetic autotrophs using platform microorganisms such as E. coli. Consequently, this research will be of interest to synthetic biologists and systems biologists working on the development of synthetic autotrophic microorganisms. The conclusions of this paper are mostly well supported by appropriate experimental methods and logical reasoning. However, further experimental validation of the mutational mechanisms involving rpoB and crp would enhance readers' understanding and provide clearer insights, despite acknowledgement that these genes impact a broad set of additional genes. Additionally, a similar study, 10.1371/journal.pgen.1001186, where pgi was deleted from the E. coli genome and evolved to reveal an rpoB mutation is relevant to this work and should be placed in the context of the presented findings.

      We thank the reviewer for pointing this study out. It is very interesting that a mutation in a similar region in RpoB was observed in a related context of Pgi loss of activity. We have added a reference to this study in our text (Page 11, line 21).

      The authors addressed rpoB and crp as one unit and performed validation. They cultivated the mutant strain and wild type in a minimal xylose medium with or without formate, comparing their growth and NADH levels. The authors argued that the increased NADH level in the mutant strain might facilitate autotrophic growth. Although these phenotypes appear to be closely related, their relationship cannot be definitively concluded based on the findings presented in this paper alone. Therefore, one recommendation is to investigate the transcriptomic changes induced by the rpoB and crp mutations. Otherwise, conducting experimental verification to determine whether the NADH level directly causes autotrophic growth would provide further support for the authors' claim.

      We appreciate the valuable comment and agree that the work was lacking such an analysis. Due to various reasons we have opted to use a proteomic approach which we feel fulfills the same purpose as the transcriptomics suggestion. We found interesting evidence in up-regulation of the fdoGH operon (comprising the native formate dehydrogenase O enzyme complex) which could indicate why there is an increase in NADH/NAD+ levels. We also hypothesize that this upregulation might be important more generally by drawing comparisons to natural chemo-autotrophs.

      Further experimental work (which we were not able to include in the current study) could help validate this link by deleting fdoGH and observing a loss of phenotype and, on the flip side, directly overexpressing the fdoGH operon and observing an increase in the NADH/NAD+ ratio. Indeed, if this overexpression were to prove sufficient for achieving an autotrophic phenotype without the mutations in the global transcription regulators, it would be a much more transparent design.

      We have added a section titled "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes" to the Results based on this analysis.

      • It would be beneficial to provide a more detailed explanation of the genetic background before the evolution stage, specifically regarding the ∆pfk and ∆zwf mutations. Furthermore, it is suggested to include a figure that provides a comprehensive depiction of the reductive pentose phosphate pathway and the bypass pathway. These will help readers grasp the concept of the "metabolic scaffold" as proposed by the authors.

      We agree with the reviewer that this could be helpful and we added a reference to the original paper Gleizer et al. 2019 that reported this design and also includes the relevant figure. We feel that the figure should not be added to the current manuscript as we continue to show that this design is not relevant in the context of the three reported mutations and such a figure could distract the attention of the reader from the main takeaways of the current study.

      • Despite the essentiality of the rpoB mutation (A1245V) to the autotrophic phenotype in the final strain, the inclusion of this mutation in step C1 does not appear to be justified. According to line 37 on page 3, the authors chose to retain the unintended mutation in rpoB based on its essentiality to the phenotype observed in other evolved strains. However, it should be noted that the mutations found in the evolved strain I, II, and III (P552T or D866E) were entirely different from the unintended mutation (A1245V) during genetic engineering. This aspect should be revised to avoid confusion among readers.

      Thank you for pointing this issue out, we added a clarification in the text (page 4 line 7) to avoid such confusion. We believe this point is much clearer now.

      The rpoB mutation which was shown to be essential in the study is indeed known to be common in ALE experiments in E. coli. Thus, I searched the different rpoB mutations in ALEdb in E. coli and I was able to find a similar mutation in a study where pgi was knocked out and then evolved. https://doi.org/10.1371/journal.pgen.1001186 This study seems very relevant given that pgi was a key mutation in the compact set of this work and the section "Modulation of a metabolic branch-point activity increased the concentration of rPP metabolites" informs that loss of function mutations in pgi were also found. The findings of this study should thus be put in the context of the previous related ALE study. I would recommend a similar analysis of crp mutations from studies in ALEdb to see if there are similar mutations in this gene as well or if this is a unique mutation.

      We thank the reviewer for bringing this publication to our attention. We have addressed this observation in the main text (page 11, line 21). We agree that it could have some connection to the pgi mutation, yet we would not want to overspeculate about this role, as we also found the exact same mutation (A1245V) as an adaptation to higher temperature in another E. coli study (Tenaillon et al. 2012). We would like to bring forward the fact that the two reported rpoB mutations are always accompanied by another mutation with pleiotropic effects, either in the transcription factor Crp or in another RNA polymerase subunit (e.g., RpoC). As such, many epistatic effects could occur, one of which we also report here on page 13, line 18. In conclusion, although there could be a connection between the rpoB and pgi mutations, it could be a mere coincidence and the two mutations could exhibit two distinct roles in two distinct phenotypes.

      We would also like to thank the reviewer for suggesting a similar analysis for crp. We found another mutation at a nearby residue with strong adaptive effects and mention it in our main text.

      Can the typical number of mutations found in a given ALE experiment be directly compared to those found in this study? It seems like a retrospective analysis of other ALE studies to show how many mutations typically occur in an ALE study and sets which were found to be causal to reproduce the phenotype of interest (through similar reverse engineering in the starting strain) should be presented. Again, the authors cite ALEdb which should provide direct numbers of mutations found in similar ALE studies with E. coli and one could then examine them to find sets of clearly causal mutations which recreate phenotypes of interest. Such an analysis would go a long way in supporting the main finding of "small number" of mutations.

      Discussion, page 12, line 42. "This could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts". There is an entire body of work around growth-coupled production which can be predicted and evolved with a genome-scale metabolic model and ALE. Thus, if this statement is going to be made, relevant studies should be cited and placed in context.

      The reviewer raises an important point which could indeed yield an interesting perspective. However, it would be difficult to perform this comparison in practice since many of the studies published on ALEdb have not isolated essential mutations from other mutation incidents nor have they determined the role of each mutation in the reported phenotypes. For example, many ALE trajectories include a hypermutator that greatly increases the number of irrelevant mutations and it is nearly impossible to sieve through them to find an essential set.

      Moreover, it is hard to compare the “level of difficulty” of achieving one phenotype over another. We therefore feel that even though such an analysis would be insightful, it requires an amount of work which is outside the scope of this study.

      Finally, we would like to highlight our iterative approach: isolating the relevant consensus mutations and repeating this process until no evolution process is required. We are not aware of prior studies that used this approach.

      We have now clarified what we mean by "promising strategy" in the discussion in order to avoid any false claims about novelty (page 16 line 32): "Using metabolic growth-coupling as a temporary 'metabolic scaffold' that can be removed, could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts."

      Reviewer #2:

      Synthetic autotrophy of biotechnologically relevant microorganisms offers exciting chances for CO2 neutral or even CO2 negative production of goods. The authors' lab has recently published an engineered and evolved Escherichia coli strain that can grow on CO2 as its only carbon source. Lab evolution was necessary to achieve growth. Evolved strains displayed tens of mutations, of which likely not all are necessary for the desired phenotype.

      In the present paper the authors identify the mutations that are necessary and sufficient to enable autotrophic growth of engineered E. coli. Three mutations were identified, and their phenotypic role in enhancing growth via the introduced Calvin-Benson-Bassham cycle were characterized. It was demonstrated that these mutations allow autotrophic growth of E. coli with the introduced CBB cycle without any further metabolic intervention. Autotrophic growth is demonstrated by 13C labelling with 13C CO2, measured in proteinogenic amino acids. In Figures 2B and S1, the labeling data are shown, with an interval of the "predicted range under 13CO2".

      Here, the authors should describe how this interval was derived.

      The methodology is clearly described and appropriate.

      The present results will allow other labs to engineer E. coli and other microorganisms further to assimilate CO2 efficiently into biomass and metabolic products. The importance is evident in the opportunity to employ such strains in CO2-based biotech processes for the production of food and feed protein or chemicals, to reduce atmospheric CO2 levels and the consumption of fossil resources.

      Please describe in the methodology how the interval of the predicted range of 13C labeling was derived for Figures 2B and S1. Was it calculated by the dilution factor during 4 generations, or did you predict the label incorporation individually with a metabolic model?

      The text needs careful editing, some sentences are incomplete and there are frequent inconsistencies in writing metabolites and enzymes.

      P2L6: unclear sentence (incomplete?)

      P2L19: pastoris with lower case "p"

      P2L40: incomplete sentence

      P2L42: here, and at many other places, the writing of RuBisCO needs to be aligned. It is an abbreviation and should begin with a capital letter. Most commonly it is written as RuBisCO which I would suggest - please unify throughout the text.

      P3L3: formate dehydrogenase ... metabolites and enzymes with lower case letter. And, no hyphen here.

      P5L4: delete the : after unintentionally

      P6L16: carboxylation of RuBP (it is not CO2 that is carboxylated - if any, CO2 is carboxylating)

      P7L25: phosphoglucoisomerase (lower case)

      P8L5: in line

      P8L9: part of glycolysis/ ...

      P10L4: pentose phosphates (lower case, no hyphen).

      P10L4: all metabolites lower case

      P12L28: incomplete sentence

      P18L4: Escherichia coli in italics

      P18L15: Pseudomonas sp. in italics

      P18L16: ... promoter and with a strong ...

      P20, chapter Metabolomics: put the numbers of 12C and 13C in superscript

      P23L9: pentose phosphates; all metabolites in lower case (as above)

      P23: all 12C and 13C with superscript numbers.

      Response to reviewer #2:

      We thank the reviewer for their comments, and for pointing out the need to clarify how we derived the predicted range of 13C labeling. We edited the text accordingly, and added the relevant calculation to the methods section (under “13C Isotopic labeling experiment”). We would also like to thank the reviewer for the suggested text improvements, which were implemented.
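
      As a rough illustration of the "dilution factor" reasoning raised by the reviewer, the sketch below assumes that each generation halves the share of the original unlabeled biomass and that all newly synthesized biomass carries the label. This is a simplifying assumption for illustration only, not the calculation reported in the methods; the function name and the `label_purity` parameter are hypothetical.

```python
def expected_labeled_fraction(generations, label_purity=1.0, initial_labeled=0.0):
    """Estimate the 13C-labeled fraction of biomass after a number of doublings.

    Assumption (hypothetical, for illustration): every doubling halves the
    share of the starting biomass, and all new biomass is labeled at
    `label_purity` (1.0 means fully labeled 13CO2).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    old_share = 0.5 ** generations          # share of the original biomass left
    new_share = 1.0 - old_share             # share made from labeled carbon
    return initial_labeled * old_share + label_purity * new_share


if __name__ == "__main__":
    # Print the expected labeled fraction for 0 to 5 generations.
    for n in range(6):
        print(n, round(expected_labeled_fraction(n), 4))
```

      Under these assumptions, an upper bound on labeling follows from the number of generations alone; a lower label purity or residual unlabeled carbon would shift the expected range downward.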

      Reviewer #3:

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      We thank the reviewer for their comments and constructive feedback.

      We agree that there is probably a broader effect caused by the rpoB and crp mutations, besides the change in the NADH/NAD+ ratio. Hence, we performed a proteomics analysis, comparing the rpoB and crp mutations on a WT background to an autotrophic E. coli, searching for a mutual change in both strains compared to their "ancestors". We found up-regulation of rPP cycle and formate-associated genes, and a down-regulation of catabolic genes. We added a section dedicated to this matter under the title "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes".

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      We appreciate this recommendation and indeed tried to delete pgi, but the genetic manipulation caused a knockout of other genes along with pgi (pepE, rluF, yjbD, lysC) so in the time available to us we cannot confidently determine whether the deletion alone is sufficient and can replace the mutation.

      Regarding crp, we do not think there is a reason to believe the mutation is a loss-of-function. In any case, the proteomics-based characterization of the crp mutation is now included in the SI.

      1. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      Indeed, the fact that the proteins are interacting in the cell does not necessarily mean that the mutations are functionally connected. We therefore added as further justification in the new section:

      "As far as we know, the mutations in the Crp and RpoB genes affect the binding of the RNA polymerase complex to DNA and/or its transcription rates. Depending on the transcribed gene target, the effect of the two mutations might be additive, antagonistic, or synergistic. Since each one of these mutations individually (in combination with the pgi mutation) is not sufficient to achieve autotrophic growth, it is reasonable to assume that only the target genes whose levels of expression change significantly in the double-mutant are the ones relevant for the autotrophic phenotype”.

      In our proteomics analysis we considered each mutation separately. We found that in some cases the two mutations together have an additive effect, but in other cases the two mutations together affect the proteome differently from each mutation alone. Since both mutations are essential to the phenotype, we decided to address the two mutations as one unit for the physiological and metabolic experiments.

      1. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      The mutation is located in “Activating Region 2”, which interacts with RNA polymerase. We tried an in vivo assay to determine the CRP H22N activity and got inconclusive results. We believe the proteomics analysis serves as a good method for understanding the global effect of the mutation.

      1. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      We added a supplementary figure regarding the structural location of the two mutations, where it is demonstrated that crp H22N is located in a region interacting with the RNA polymerase and rpoB A1245V is located in proximity to regions interacting with the DNA.

      1. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

      We agree that an omics approach to infer the global effect of these mutations is beneficial. We opted to use a proteomics approach and think it serves the purpose of clarifying the final, downstream effect on the cell.

      1. Page 2, lines 40-45, the authors should more clearly explain that the deletion of pfkA, pfkB and zwf was part of the experimental evolution strategy in their earlier work (Gleizer et al., 2019), and not a new strategy in the current study.

      We thank the reviewer for pointing this out and have edited the text accordingly.

      1. Page 3, line 27. Why did the authors compare the newly acquired mutants to only two mutants from the earlier work, not all 6?

      The 6 clones that were isolated in Gleizer et al. (2019) had 2 distinct mutation profiles. During the isolation process the lineage split into two groups. Three out of the 6 clones (clones 1, 2, 6) came from the same ancestor, and the other three (clones 3, 4, 5) came from another ancestor. Hence, the clones within each group shared almost all of their mutations (see Venn diagram). For our comparison, we decided to use the representative with the highest number of mutations from each group (clones 5 and 6).

      Author response image 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Su et al propose the existence of two mechanisms repressing SBF activity during entry into meiosis in budding yeast. First, a decrease in Swi4 protein levels by a LUTI-dependent mechanism where Ime1 would act closing a negative feedback loop. Second, the sustained presence of Whi5 would contribute to maintaining SBF inhibited under sporulation conditions. The article is clearly written and the experimental approaches used are adequate to the aims of this work. The results obtained are in line with the conclusions reached by the authors but, in my view, they could also be explained by the existing literature and, hence, would not represent a major advance in the field of meiosis regulation.

      We respectfully disagree with the reviewer about their comment that this work can be explained by the existing literature. First, while SWI4LUTI has been previously identified in meiotic cells along with ~ 380 LUTIs, the biological purpose of these alternative mRNA isoforms and their effect on cellular physiology still remain largely unknown. Our manuscript clarifies this gap in understanding for SWI4LUTI. Loss of SWI4LUTI contributes to dysregulation of meiotic entry and does so by failing to properly repress the known inhibitors of meiotic entry, the CLNs. Furthermore, even though Cln1 and Cln2 have been previously shown to antagonize meiosis, the mechanisms that restrict their activity were unclear prior to our study.

      We recognize work done by others demonstrating Whi5-dependent repression of SBF during mitotic G1/S transition (De Bruin et al., 2004; Costanzo et al., 2004). We further examined Whi5’s involvement during meiotic entry and found that it acts in conjunction with the LUTI-based mechanism to restrict SBF activity. Combined loss of both mechanisms results in the increased expression of G1 cyclins, decreased expression of early meiotic genes, and a delay in meiotic entry (Figure 6). Neither mechanism was previously known to regulate meiotic entry. Our study not only adds to our broader understanding of gene regulation during meiosis but also raises additional questions regarding how LUTIs regulate gene expression and function.

      Regarding the first mechanism, Fig 1 shows that Swi4 decreases very little after 1-2h in sporulation medium, whereas G1-cyclin expression is strongly repressed very rapidly under these conditions (panel D and work by others). This fact dampens the functional relevance of Swi4 downregulation as a causal agent of G1 cyclin repression.

      Reviewer 1 expresses concern for the observation that by 2 h in sporulation media there is a 32% decrease in Swi4-3V5 protein abundance compared to 0 h in SPO. This is consistent with the range of protein level decrease typically accomplished by LUTI-based gene regulation (Chen et al., 2017; Chia et al., 2017; Tresenrider et al., 2021), and while it is a modest reduction, it is consistent across replicates. Furthermore, we don’t make the argument that reduction in Swi4 levels alone is the sole regulator of G1 cyclin levels. In fact, we report that in addition to Swi4 downregulation, Whi5 also functions to restrict SBF activity during meiotic entry, thereby ensuring G1 cyclin repression.

      In addition, the LUTI-deficient SWI4 mutant does not cause any noticeable relief in CLN2 repression, arguing against the relevance of this mechanism in the repression of G1-cyclin transcription during entry into meiosis. The authors propose a second mechanism where Whi5 would maintain SBF inactive under sporulation conditions. The role of Whi5 as a negative regulator of the SBF regulon is well known. On the other hand, the double WHI5-AA SWI4-dLUTI mutant does not upregulate CLN2, the G1 cyclin with the strongest negative effect on sporulation, raising serious doubts on the functional relevance of this backup mechanism during entry into meiosis.

      Due to replicate variance, CLN2 did not make the cut by our mRNA-seq data analysis as a significant hit. To address reviewer 1’s final point we opted for the “gold standard” of reverse transcription coupled with qPCR to measure CLN2 transcript levels in the double mutant ∆LUTI; WHI5-AA and the wild-type control. This revealed that CLN2 levels were significantly increased in the double mutant compared to wild type at 2 h in SPO (Author Response Image 1, *, p = 0.0288, two-tailed t-test).

      Author response image 1.

      Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Transcript abundance was quantified using primer sets specific for each respective gene from three technical replicates for each biological replicate. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change. Experiments were performed twice using biological replicates, mean value plotted with range. Differences in wild type versus ∆LUTI;WHI5-AA transcript levels were compared with a two-tailed t-test (*, p = 0.0288).
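
      To make the normalization described in this legend concrete, the sketch below shows one common way to compute a fold change relative to a reference gene and a control sample, the 2^-ΔΔCt method. It is an assumption that the quantification followed this scheme and that amplification efficiency is close to 100%; the function name is hypothetical and the Ct values in the usage lines are invented placeholders, not data from the study.

```python
from statistics import mean


def fold_change_ddct(target_ct_sample, ref_ct_sample,
                     target_ct_control, ref_ct_control):
    """Fold change of a target transcript via the 2^-ddCt method.

    Each argument is a list of Ct values from technical replicates.
    Assumes ~100% amplification efficiency (a doubling per cycle).
    """
    # Normalize the target gene to the reference gene within each sample.
    d_ct_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    # Normalize the sample to the control (for example, wild type).
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Placeholder Ct values for illustration only.
    mutant_target = [24.0, 24.0, 24.0]
    mutant_ref = [20.0, 20.0, 20.0]
    wt_target = [25.0, 25.0, 25.0]
    wt_ref = [20.0, 20.0, 20.0]
    print(fold_change_ddct(mutant_target, mutant_ref, wt_target, wt_ref))
```

      With this scheme the wild-type control has a fold change of 1 by construction, so values above 1 indicate higher transcript abundance in the mutant.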

      Reviewer #2 (Public Review):

      Summary:

      The manuscript highlights a mechanistic insight into meiotic initiation in budding yeast. In this study, the authors addressed a genetic link between mitotic cell cycle regulator SBF (the Swi4-Swi6 complex) and a meiosis-inducing regulator Ime1 in the context of meiotic initiation. The authors' comprehensive analyses with cytology, imaging, RNA-seq using mutant strains lead the authors to conclude that Swi4 levels regulate Ime1-Ume6 interaction to activate expression of early meiosis genes for meiotic initiation. The major findings in this paper are that (1) the higher level of Swi4, a subunit of SBF transcription factor for mitotic cell cycle regulation, is the limiting factor for mitosis-to-meiosis transition; (2) G1 cyclins (Cln1, Cln2), which are expressed under SBF, inhibit Ime1-Ume6 interaction under overexpression of SWI4, which consequently leads to downregulation of early meiosis genes; (3) expression of SWI4 is regulated by LUTI-based transcription in the SWI4 locus that impedes expression of canonical SWI4 transcripts; (4) expression of SWI4 LUTI is likely negatively regulated by Ime1; (5) Action of Swi4 is negatively regulated by Whi5 (homologous to Rb)-mediated inhibition of SBF, which is required for meiotic initiation. Thus, the authors proposed that meiotic initiation is regulated under the balance of mitotic cell cycle regulator SBF and meiosis-specific transcription factor Ime1.

      Strengths:

      The most significant implication in their paper is that meiotic initiation is regulated under the balance of mitotic cell cycle regulator and meiosis-specific transcription factor. This finding will provide a mechanistic insight into initiation of meiosis not only in budding yeast but also in mammals. The manuscript is overall well written, logically presented and raises several insights into meiotic initiation in budding yeast. Therefore, the manuscript should be open for the field. I would like to raise the following concerns, though they are not mandatory to address. However, it would strengthen their claims if the authors could technically address and revise the manuscript by putting more comprehensive discussion.

      Weaknesses:

      The authors showed increased expression of the SBF targets and a reciprocal decrease in expression of meiotic genes upon SWI4 overexpression at 2 h in SPO (Figure 2F). However, IME1 was not found as a DEG in Supplemental Table 1. Meanwhile, IME1 transcript level was decreased at 2 h SPO condition in pATG8-CLN2 cells in Fig S4C.

      This reviewer still wonders whether expression of IME1 transcripts per se is directly or indirectly suppressed under the SBF-activated gene expression program at 2 h SPO in pATG8-SWI4 and pATG8-CLN2 cells. This reviewer also wonders how the Fig S4C data reconcile with the model summarized in Fig 6F.

      One interpretation could be that persistent overexpression of G1 cyclin caused active mitotic cell cycle, and consequently delayed exit from mitotic cell cycle, which may have given rise to an apparent reduction of cell population that was expressing IME1. For readers to better understand, it would be better to explain comprehensively this issue in the main text.

      We believe there was an oversight here. In supplemental table 1, IME1 expression is reported as significantly decreased. The volcano plot shown below also highlights this change (Author response image 2).

      Author response image 2.

      Volcano plot of DESeq2 analysis for ∆LUTI;WHI5-AA versus wild type. Dashed line indicates padj (adjusted p value) = 0.05. Analysis was performed using mRNA-seq from two biological replicates. Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected at 2 h in SPO. SBF targets (pink) defined by Iyer et al. (2001) and early meiotic genes (blue) defined by Brar et al. (2012). Darker pink or darker blue labeled dots are well-studied targets in either gene set list.
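
      The significance cutoff marked by the dashed line can be expressed as a simple rule. The sketch below assumes that a gene is called differentially expressed when its adjusted p value falls below 0.05 and that the vertical axis shows -log10 of the adjusted p value, as is common for volcano plots; the function names are hypothetical and this is not the authors' analysis code.

```python
import math


def classify_gene(log2_fold_change, padj, alpha=0.05):
    """Label a gene as 'up', 'down', 'unchanged', or 'not significant'.

    `padj` is the multiple-testing-adjusted p value; None is treated as
    not significant (some pipelines leave padj undefined for filtered genes).
    """
    if padj is None or padj >= alpha:
        return "not significant"
    if log2_fold_change > 0:
        return "up"
    if log2_fold_change < 0:
        return "down"
    return "unchanged"


def volcano_y(padj):
    """Vertical coordinate commonly used in volcano plots: -log10(padj)."""
    return -math.log10(padj)


if __name__ == "__main__":
    # Height of a dashed line drawn at padj = 0.05.
    print(round(volcano_y(0.05), 3))
    # A hypothetical gene with reduced expression and a small adjusted p value.
    print(classify_gene(-1.5, 0.001))
```

      Genes plotted above the dashed line pass the cutoff, and the sign of the log2 fold change then indicates whether expression is higher or lower than in wild type.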

      The % of cells with nuclear Ime1 was much reduced in pATG8-CLN2 cells (Fig 2B) than in pATG8-SWI4 cells (Fig 4C). Is the Ime1 protein level comparable or different between pATG8-CLN2 strain and pATG8-SWI4 strain? Since it is difficult to compare the quantifications of Ime1 levels in Fig S1D and Fig S4B, it would be better to comparably show the Ime1 protein levels in pATG8-CLN2 and pATG8-SWI4 strains.

      Further, it is uncertain how pATG8-CLN2 cells mimics the phenotype of pATG8-SWI4 cells in terms of meiotic entry. It would be nice if the authors could show RNA-seq of pATG8-CLN2/WT and/or quantification of the % of cells that enter meiosis in pATG8-CLN2.

      Analyzing bulk Ime1 protein levels across a population of cells (Author response image 3) reveals that overexpression of CLN2 causes a more severe decrease in Ime1 levels than overexpression of SWI4. This is consistent with our observation that pATG8-CLN2 has a more severe impact on meiotic entry than pATG8-SWI4. The higher CLN2 levels (Author response image 4) likely account for the observed difference in severity of phenotype between the two mutants.

      Author response image 3.

      Samples from wild type (UB22199), pATG8-SWI4 (UB2226), and pATG8-CLN2 (UB25959) strains were collected between 0-4 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      Author response image 4.

      Wild type (UB22199), pATG8-SWI4 (UB2226), and pATG8-CLN2 (UB25959) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change.

      The authors stated that reduced Ime1-Ume6 interaction is a primary cause of meiotic entry defect by CLN2 overexpression (Line 320-322, Fig 4J-L). This data is convincing. However, the authors also showed that GFP-Ime1 protein level was decreased compared to WT in pATG8-CLN2 cells by WB (Fig S4A).

      Compared to wild type, pATG8-CLN2 cells have lower levels of Ime1. Consequently, reviewer 2 suggests that this reduction may be responsible for the observed meiotic defect. However, we tested this possibility and found it not to be the primary cause of the meiotic defect in pATG8-CLN2 cells. As shown in Figure S4A, when IME1 was overexpressed from the pCUP1 promoter, Ime1 protein levels were similar between wild-type and pATG8-CLN2 cells. Despite this similarity, we still observed a decrease in nuclear Ime1 (Figure 4F) and no rescue in sporulation (Figure 4A). Therefore, the reduction in Ime1 protein levels alone cannot explain the meiotic defect caused by CLN2 overexpression.

      Further, GFP-Ime1 signals were overall undetectable through nuclei and cytosol in pATG8-CLN2 cells (Fig 4B), and accordingly cells with nuclear Ime1 were reduced (Fig 4C). Although the authors raised a possibility that the meiotic entry defect in the pATG8-CLN2 mutant arises from downregulation of IME1 expression (Line 282-283), causal relationship between meiotic entry defect and CLN2 overexpression is still not clear.

      As reviewer 2 comments, we initially considered the possibility that meiotic entry defect induced by CLN2 overexpression could be attributed to decreased IME1 expression. However, in the following paragraph in the manuscript, we demonstrate equalizing IME1 transcript levels using the pCUP1-IME1 allele does not rescue the meiotic defect caused by CLN2 overexpression. Consequently, we conclude that the decrease in IME1 transcript levels alone cannot explain the meiotic defect caused by increased CLN2 levels.

      Is the Ime1 protein level reduced in the pATG8-CLN2;UME6-⍺GFP strain compared to WT? It would be better to comparably show the Ime1 protein levels in the pATG8-CLN2 strain and the pATG8-CLN2;UME6-⍺GFP strain by WB. Also, it would be nice if the authors could show quantification of the % of cells that enter meiosis in the pATG8-CLN2;UME6-⍺GFP strain to see how and whether artificial tethering of Ime1 to Ume6 rescued normal meiosis program rather than simply showing % sporulation in Fig4A.

      We do not agree with the suggestion to compare the pATG8-CLN2;UME6-⍺GFP with wild type as the kinetics of meiosis is rather different. The more appropriate comparison is UME6-⍺GFP and pATG8-CLN2;UME6-⍺GFP which shows GFP-Ime1 bulk protein levels are slightly lower (Author response image 5). However, when we use a more sensitive measurement of meiotic entry through the nuclear accumulation of Ime1 in single cells, as illustrated in Figure 4L, it becomes evident that the Ume6-Ime1 tether is capable of restoring nuclear Ime1 levels, even in the presence of CLN2 overexpression. Given that these cells exhibited wild type levels of nuclear Ime1 and underwent sporulation after 24 hours, we make the fair assumption that they have successfully initiated the meiotic program.

      Author response image 5.

      Wild type (UB22199), pATG8-SWI4 (UB35106), UME6-⍺GFP (UB35300), and UME6-⍺GFP; pATG8-CLN2 (UB35177) cells were collected between 0-3 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      The authors showed Ume6 binding at the SWI4LUTI promoter (Figure 5K). However, since Ume6 forms a repressive form with Rpd3 and Sin3a and binds to target genes independently of Ime1, Ume6 binding at the SWI4LUTI promoter does not necessarily represent Ime1-Ume6 binding there. Instead, it would be better to show Ime1 ChIP-seq at the SWI4LUTI promoter.

      We agree with reviewer 2 that Ime1 ChIP would be the ideal measurement. Unfortunately, this has proved to be technically challenging. To address this limitation, we utilized a published Ume6 ChIP-seq dataset along with a published UME6-T99N RNA-seq dataset. Cells carrying the UME6-T99N allele are unable to induce the expression of early meiotic transcripts due to lack of Ime1 binding to Ume6 (Bowdish et al., 1995). Accordingly, RNA-seq analysis should reveal whether or not the LUTIs identified by Ume6 ChIP are indeed regulated by Ime1-Ume6 during meiosis. For SWI4LUTI, this is exactly what we observe. Not only is there Ume6 binding at the SWI4LUTI promoter (Figure 5K), but there is also a significant decrease in SWI4LUTI expression in UME6-T99N cells under meiotic conditions (Figure S5). Based on these data, we conclude that the Ime1-Ume6 complex is responsible for regulating SWI4LUTI expression during meiosis.

      The authors showed that the ∆LUTI mutant and the WHI5-AA mutant did not significantly change the expression of SBF targets or early meiotic genes relative to wild type (Figure 6A, C). Accordingly, they concluded that LUTI- or Whi5-based repression of SBF alone was not sufficient to cause a delay in meiotic entry (Line 451-452), and perturbation of both pathways led to a significant delay in meiotic entry (Figure 6E). This reviewer wonders whether Ime1 expression level and nuclear localization of Ime1 were normal in the ∆LUTI mutant and the WHI5-AA mutant.

      Based on our observations in Figure 4, Ime1 protein and expression levels were not reliable indicators of meiotic entry. Consequently, we opted for a more downstream and functionally relevant measure of meiotic entry, which involved time-lapse fluorescence imaging of Rec8, an Ime1 target.

      Reviewer #1 (Recommendations For The Authors):

      The authors would like to mention previous work showing that G1-cyclin overexpression decreases the expression and nuclear accumulation of Ime1 (Colomina et al 1999 EMBO J 18:320). In this work, the interaction between Ime1 and Ume6 had been found to be resistant to G1-cyclin expression, arguing against a direct effect on the recruitment of Ime1 at meiotic promoters. Alternatively, differences in the experimental approaches used could be discussed to explain this apparent discrepancy.

      To clarify, in the paper that reviewer 1 is referring to (Colomina et al., 1999), the authors determine that the interaction between Ime1 and Ume6 is regulated by the presence of a non-fermentable carbon source. Additional work by others reveals that Ime1 undergoes phosphorylation by the protein kinases Rim11 and Rim15, promoting its nuclear localization and enabling interaction with Ume6 (Vidan and Mitchell, 1997; Pnueli et al., 2004; Malathi et al., 1999, 1997). Furthermore, both Rim11 and Rim15 kinase activities are inhibited by the presence of glucose via the PKA pathway (Pedruzzi et al., 2003; Rubin-Bejerano et al., 2004; Vidan and Mitchell, 1997). Accordingly, the elimination of cyclins in the presence of a fermentable carbon source (glucose) in (Colomina et al., 1999) is unlikely to result in an interaction between Ime1 and Ume6, as Rim11 and Rim15 remain repressed. Removal of cyclins in acetate does not further increase Ime1-Ume6 interaction, leading the authors to conclude that G1 cyclins do not block Ime1 function through its interaction with Ume6. This work, however, uses loss of function (removal of G1 cyclins) to study the G1 cyclins’ effect on Ime1-Ume6 interaction while using timepoints that are well beyond meiotic entry. Additionally, Ime1-Ume6 interaction is being tested using yeast two-hybrid analysis with just the proposed interaction domain of Ime1 (amino acids 270-360). Therefore, the interpretation that G1 cyclins are dispensable for regulating the interaction between Ime1 and Ume6 is unclear from this work alone.

      There are many differences that can explain the discrepancy between our work and (Colomina et al., 1999). Our work uses increased expression of cyclins during meiotic entry. Additionally, in our study, we collected timepoints to measure meiotic entry (2 h in SPO) and sporulation (gamete formation) efficiency (24 h in SPO). Finally, we are using the endogenous, full length Ime1. These differences could very well explain the discrepancy with previous work. Lastly, in our discussion we acknowledge the lack of CDK consensus phosphorylation sites on Ime1. Therefore, it is most likely that G1 cyclins are not directly phosphorylating Ime1 and that other factors like Rim11 and Rim15 could be direct targets of the G1 cyclins, considering their involvement in the phosphorylation of Ime1-Ume6, as well as their role in regulating Ime1 localization and its interaction with Ume6. We have included these points in the revised manuscript (lines 547-551).

      Reviewer #2 (Recommendations For The Authors):

      This reviewer thinks that the findings in this paper are of general interest to the meiosis field and help in understanding the mechanism of meiotic initiation in mammals. The current manuscript seems to be written for a limited audience of budding yeast scientists, and its interest should not be limited to that audience. Thus, it would be better to discuss more of what is known about the mechanism of meiotic initiation, not only in budding yeast but also in other species, to share the findings with a broader group of scientists working on other organisms.

      We appreciate reviewer 2’s comment and have added more discussion about the parallels between yeast and mammalian systems in meiotic initiation (lines 613-624).

      Reviewer #3 (Recommendations For The Authors):

      The effect of overexpression of Swi4 is tested for MI and MII (Fig1F): this is a very indirect readout of meiotic entry. The authors could present Rec8 localization (Fig2I) at this stage. However, this is still a superficial description of the meiotic phenotype: is the phenotype only a delay, or is the meiotic prophase altered? It is particularly important to analyse this in more detail to answer whether the overexpression of Swi4 leads to a phenotype identical to that of CLN2. Also, the comparison between overexpression of Swi4 and Cln2 is difficult to evaluate: what is the level of CLN2 when Swi4 is overexpressed compared to CLN2 overexpression? The percentage of nuclear Ime1 is 50% vs 5% when Swi4 or Cln2 is overexpressed. What is the interpretation? What are the levels of Ime1? (Y axis of quantifications not comparable, see also comment for Fig5F,H)

      CLN2 is expressed at a much higher level in pATG8-CLN2 cells relative to pATG8-SWI4 (Author Response Image 4). Therefore, we don’t expect identical phenotypes, but rather a more severe deficiency in meiotic entry upon CLN2 overexpression. The key experiment that establishes causality between SWI4 and CLNs is reported in Figure 3, where deletion of either CLN1 or CLN2 rescues the meiotic entry delay exerted by SWI4 overexpression.

      Fig3EF: What is the phenotype of Cln1 and Cln2 without overexpression of Swi4?

      Meiotic entry is not faster in cln1∆ or cln2∆ cells compared to wild-type. We included these data in Supplemental Figure 3 and made the relevant changes in the manuscript (lines 257-261).

      Fig4F: Need a control with CLN2 overexpression only.

      A control with only CLN2 overexpression (pATG8-CLN2) is not appropriate since these meiotic time course experiments are synchronized using the pCUP1-IME1 allele. It would be a misleading comparison since the two meioses would have different kinetics. Figure 4F reports that despite similar IME1 transcript levels and Ime1 protein levels, CLN2 overexpressing cells still have reduced nuclear Ime1. Since side-by-side comparison of pATG8-CLN2 and pCUP1-IME1 is not possible, we chose to measure sporulation efficiency at 24 h in Figure 4A. These data together suggest that elevated IME1 transcript and protein levels cannot rescue the defects associated with increased CLN2 expression.

      Fig5E: in wild type, by Northern blot, Swi4canon level is increasing during meiosis, not decreasing?, whereas protein level is decreasing, what is the interpretation?

      Northern data are less quantitative than smFISH, which shows that SWI4canon transcript levels are significantly lower in meiosis compared to vegetative cells (Figure 5D). We also note that the Northern blot data were acquired from unsynchronized meiotic cells and could have additional limitations based on the population-based nature of the assay. Finally, additional analysis of a transcript leader sequencing (TL-seq) dataset from synchronized cells (Tresenrider et al., 2021) further confirms the decrease in SWI4canon transcript levels upon meiotic entry (Author response image 6).

      Author response image 6.

      TL-seq data from (Tresenrider et al. 2021) visualized on IGV at the SWI4 locus. Two timepoints are plotted including premeiotic before IME1 induction (pink) and meiotic prophase or after IME1 induction (blue).

      Fig5F, H. This quantification needs duplicates for validation.

      Replicates for every blot in this paper have been submitted to eLife. They can be found in the Dropbox folder shared with the editors (named Raw-blots-for-eLIFE).

      Fig5F, H. Why are the wild type values so different?

      The immunoblots in Figure 5F and Figure 5H are separate blots and therefore should not be compared. Additionally, these values are not absolute measurements of wild-type Swi4-3V5 levels, and therefore we should not expect them to be the same. Any comparisons of relative amounts of Swi4-3V5 are always made on the same blot and normalized to a loading control, hexokinase.

      FigS5: What is the effect of the Ume6-T99N on Swi4 protein level and on meiotic entry? Is the backup mechanism proposed active?

      We haven’t measured Swi4 protein levels in the UME6-T99N background but given that this mutation is known to disrupt the interaction between Ime1 and Ume6, we expect a similar trend to that reported in Figure 5I (pCUP1-IME1 uninduced).

      What is the evidence that Swi4/6 is a E2F homolog? What is the homology at the protein level?

      While there is no sequence homology between SBF and E2F, there is remarkable similarity between metazoans and yeast in terms of the regulation of the G1/S transition (reviewed in Bertoli et al., 2013). E2F and SBF are both repressed before the G1/S transition by the inhibitors Rb and Whi5, respectively (Costanzo et al., 2004; De Bruin et al., 2004; Hasan et al., 2014). During the G1/S transition, a cyclin-dependent kinase phosphorylates and inactivates these inhibitors. We have carefully edited our language in the manuscript to “functional homology” instead of just “homology”.

      FigS3 is missing

      Each supplemental figure was matched to its corresponding main figure. In the original submission, we didn’t have Figure S3. However, the revised manuscript now contains FigS3.

      Bertoli, C., J.M. Skotheim, and R.A.M. De Bruin. 2013. Control of cell cycle transcription during G1 and S phases. Nat. Rev. Mol. Cell Biol. 14:518–528. doi:10.1038/nrm3629.

      Bowdish, K.S., H.E. Yuan, and A.P. Mitchell. 1995. Positive control of yeast meiotic genes by the negative regulator UME6. Mol. Cell. Biol. 15:2955–2961. doi:10.1128/mcb.15.6.2955.

      Brar, G.A., M. Yassour, N. Friedman, A. Regev, N.T. Ingolia, and J.S. Weissman. 2012. High-Resolution View of the Yeast Meiotic Program Revealed by Ribosome Profiling. Science. 335:552–558. doi:10.1126/science.1215110.

      De Bruin, R.A.M., W.H. McDonald, T.I. Kalashnikova, J. Yates, and C. Wittenberg. 2004. Cln3 activates G1-specific transcription via phosphorylation of the SBF bound repressor Whi5. Cell. 117:887–898. doi:10.1016/j.cell.2004.05.025.

      Chen, J., A. Tresenrider, M. Chia, D.T. McSwiggen, G. Spedale, V. Jorgensen, H. Liao, F.J. Van Werven, and E. Ünal. 2017. Kinetochore inactivation by expression of a repressive mRNA. Elife. 6:1–31. doi:10.7554/eLife.27417.

      Chia, M., A. Tresenrider, J. Chen, G. Spedale, V. Jorgensen, E. Ünal, and F.J. van Werven. 2017. Transcription of a 5’ extended mRNA isoform directs dynamic chromatin changes and interference of a downstream promoter. Elife. 6:1–23. doi:10.7554/eLife.27420.

      Colomina, N., E. Garí, C. Gallego, E. Herrero, and M. Aldea. 1999. G1 cyclins block the Ime1 pathway to make mitosis and meiosis incompatible in budding yeast. EMBO J. 18:320–329. doi:10.1093/emboj/18.2.320.

      Costanzo, M., J.L. Nishikawa, X. Tang, J.S. Millman, O. Schub, K. Breitkreuz, D. Dewar, I. Rupes, B. Andrews, and M. Tyers. 2004. CDK activity antagonizes Whi5, an inhibitor of G1/S transcription in yeast. Cell. 117:899–913. doi:10.1016/j.cell.2004.05.024.

      Hasan, M., S. Brocca, E. Sacco, M. Spinelli, P. Elena, L. Matteo, A. Lilia, and M. Vanoni. 2014. A comparative study of Whi5 and retinoblastoma proteins: from sequence and structure analysis to intracellular networks. Front. Physiol. 4:1–24. doi:10.3389/fphys.2013.00315.

      Iyer, V.R., C.E. Horak, P.O. Brown, D. Botstein, V.R. Iyer, M. Snyder, and C.S. Scafe. 2001. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature. 409:533–538. doi:10.1038/35054095.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1997. Interaction of yeast repressor-activator protein Ume6p with glycogen synthase kinase 3 homolog Rim11p. Mol. Cell. Biol. 17:7230–7236. doi:10.1128/mcb.17.12.7230.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1999. Catalytic roles of yeast GSK3β/shaggy homolog Rim11p in meiotic activation. Genetics. 153:1145–1152. doi:10.1093/genetics/153.3.1145.

      Pedruzzi, I., F. Dubouloz, E. Cameroni, V. Wanke, J. Roosen, J. Winderickx, and C. De Virgilio. 2003. TOR and PKA Signaling Pathways Converge on the Protein Kinase Rim15 to Control Entry into G0. Mol. Cell. 12:1607–1613. doi:10.1016/S1097-2765(03)00485-4.

      Pnueli, L., I. Edry, M. Cohen, and Y. Kassir. 2004. Glucose and Nitrogen Regulate the Switch from Histone Deacetylation to Acetylation for Expression of Early Meiosis-Specific Genes in Budding Yeast. Mol. Cell. Biol. 24:5197–5208. doi:10.1128/mcb.24.12.5197-5208.2004.

      Rubin-Bejerano, I., S. Sagee, O. Friedman, L. Pnueli, and Y. Kassir. 2004. The In Vivo Activity of Ime1, the Key Transcriptional Activator of Meiosis-Specific Genes in Saccharomyces cerevisiae, Is Inhibited by the Cyclic AMP/Protein Kinase A Signal Pathway through the Glycogen Synthase Kinase 3-β Homolog Rim11. Mol. Cell. Biol. 24:6967–6979. doi:10.1128/mcb.24.16.6967-6979.2004.

      Tresenrider, A., K. Morse, V. Jorgensen, M. Chia, H. Liao, F.J. van Werven, and E. Ünal. 2021. Integrated genomic analysis reveals key features of long undecoded transcript isoform-based gene repression. Mol. Cell. 81:2231-2245.e11. doi:10.1016/j.molcel.2021.03.013.

      Vidan, S., and A.P. Mitchell. 1997. Stimulation of yeast meiotic gene expression by the glucose-repressible protein kinase Rim15p. Mol. Cell. Biol. 17:2688–2697. doi:10.1128/mcb.17.5.2688.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sender et al describe a model to estimate what fraction of DNA becomes cell-free DNA in plasma. This is of great interest to the community, as the amount of DNA from a certain tissue (for example, a tumor) that becomes available for detection in the blood has important implications for disease detection.

      However, the authors' methods do not consider important variables related to cell-free DNA shedding and storage, and their results may thus be inaccurate. At this stage of the paper, the methods section lacks important detail. Thus, it is difficult to fully assess the manuscript and its results.

      Strengths:

      The question asked by the authors has potentially important implications for disease diagnosis. Understanding how genomic DNA degrades in the human circulation can guide towards ways to enrich for DNA of interest or may lead to unexpected methods of conserving cell-free DNA. Thus, the question "how much genomic DNA becomes cfDNA" is of great interest to the scientific and medical community. Once the weaknesses of the manuscript are addressed, I believe this manuscript has the potential to be a widely used resource.

      Weaknesses:

      There are two major weaknesses in how the analysis is presented. First, the methods lack detail. Second, the analysis does not consider key variables in their model.

      Issues pertaining to the methods section.

      The current manuscript builds a flux model, mostly taking values and results from three previous studies: 1) The amount of cellular turnover by cell type, taken from Sender & Milo, 2021

      2) The fractions of various tissues that contribute DNA to the plasma, taken from Moss et al, 2018 and Loyfer et al, 2023

      My expertise lies in cell-free DNA, and so I will limit my comments to the manuscripts in (2). Paper by Loyfer et al (additional context):

      Loyfer et al is a recent landmark paper that presents a computational method for deconvoluting tissues of origin based on methylation profiles of flow-sorted cell types. Thus, the manuscript provides a well-curated methylation dataset of sorted cell-types. The majority of this manuscript describes the methylation patterns and features of the reference methylomes (bulk, sorted cell types), with a smaller portion devoted to cell-free DNA tissue of origin deconvolution.

      I believe the data the authors are retrieving from the Loyfer study are from the 23 healthy plasma cfDNA methylomes analyzed in the study, and not the re-analysis of the 52 COVID-19 samples from Cheng et al (MED 2021).

      Paper by Moss et al (additional context):

      Moss et al is another landmark paper that predates the Loyfer et al manuscript. The technology used in this study (methylation arrays) is outdated but is an incredible resource for the community. This paper evaluates cfDNA tissues of origin in health and different disease scenarios. Again, I assume the current manuscript only pulled data from healthy patients, although I cannot be sure as it is not described in the methods section.

      This manuscript:

      The current manuscript takes (I think) the total cfDNA concentration from males and females from the Moss et al manuscript (pooled cfDNA; 2 young male groups, 2 old male groups, 2 young female groups, 2 old female groups, Supplementary Dataset; "total_cfDNA_conc" tab). I believe this is the data used as total cfDNA concentration. It would be beneficial for all readers if the authors clarified this point.

      The tissues of origin, in the supplemental dataset ("fraction" tab), presents the data from 8 cell types (erythrocytes, monocytes/macrophages, megakaryocytes, granulocytes, hepatocytes, endothelial cells, lymphocytes, other). The fractions in the spreadsheet do not match the Loyfer or Moss manuscripts for healthy individuals. Thus, I do not know what values the supplementary dataset represents. I also don't know what the deconvolution values are used for the flux model.

      The integration of these two methods lacks detail. Are the authors here using yields (i.e., cfDNA concentrations) from Moss et al, and tissue fractions from Loyfer et al? If so, why? There are more samples in the Loyfer manuscript, so why are the samples from Moss et al. being used? The authors are also selectively ignoring cell types that are present in healthy individuals (Neurons from Moss et al, 2018). Why?

      Appraisal:

      At this stage of the manuscript, I think additional evidence and analysis are required to confirm the results in the manuscript.

      Impact:

      Once the authors present additional analysis to substantiate their results, this manuscript will be highly impactful on the community. The field of liquid biopsies (non-invasive diagnostics) has the potential to revolutionize the medical field (and has already in certain areas, such as prenatal diagnostics). Yet, there is a lack of basic science questions in the field. This manuscript is an important step forward in asking more "basic science" questions that seek to answer a fundamental biological question.

      We thank the reviewer for the valuable comments on our analysis. In response to the feedback, we have updated the analysis to address all critical points as described below and revised the text to enhance the clarity of our methodology. One notable improvement to our analysis involved ensuring better alignment between the cohort data for cfDNA plasma concentration and cell turnover estimates. To achieve this, we utilized the total plasma concentration of cfDNA from a study conducted by Meddeb et al. 2019, taking into account the influence of age and sex on these concentrations and specifically focusing on a cohort of relatively young and healthy individuals. Additionally, we considered expected variations related to sex, age, and other pertinent factors, as outlined in the studies by Meddeb et al. 2019 and Madsen et al. 2019.

      In addition, we have addressed concerns regarding the technical aspects of cfDNA analysis, providing detailed explanations of their limited impact on our analysis and the resulting conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Cell-free DNA (cfDNA) are short DNA fragments released into the circulation when cells die. Plasma cfDNA level is thought to reflect the degree of cell-death or tissue injury. Indeed, plasma cfDNA is a reliable diagnostic biomarker for multiple diseases, providing insights into disease severity and outcomes. In this manuscript, Dr. Sender and colleagues address a fundamental question: What fraction of DNA released from cell death is detectable as plasma cfDNA? The authors use public data to estimate the amount of DNA produced from dying cells. They also utilize public data to estimate plasma cfDNA levels. Their calculations showed that <10% of DNA released is detectable as plasma cfDNA, the fraction of detectable cfDNA varying by tissue sources. The study demonstrates new and fundamental principles that could improve disease diagnosis and treatment via cfDNA.

      Strengths:

      1) The experimental approach is resource-mindful taking advantage of publicly available data to estimate the fraction of detectable cfDNA in physiological states. The authors did not assess if the fraction of detectable cfDNA changes in disease conditions. Nonetheless, their pioneering study lays the foundation and provides the methods needed for a similar assessment in disease states.

      2) The findings of this study potentially explain discrepancies in measured versus expected tissue-specific cfDNA from some tissues. For example, the gastrointestinal tract is subject to high cell turnover and release of DNA. Yet, only a small fraction of that DNA ends up in plasma as gastrointestinal cfDNA.

      3) The study proposes potential mechanisms that could account for the low fraction of detectable cfDNA in plasma relative to DNA released. This includes intracellular or tissue machinery that could "chew up" DNA released from dying cells, allowing only a small fraction to escape into plasma as cfDNA. Could this explain why the gastrointestinal tract, with an elaborate phagosome machinery, contributes a small fraction of plasma cfDNA? Given the role of cfDNA as a damage-associated molecular pattern in some diseases, targeting such machinery may provide novel therapeutic opportunities.

      Weaknesses:

      In vitro and in vivo studies are needed to validate these findings and define the tissue machinery that contributes to cfDNA production. The validation studies should address the following limitations of the study design:

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogeneous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of a more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition varies with disease state. For example, cfDNA GC content, the fraction of short fragments, and the composition of some genomic elements increase in heart transplant rejection compared to the no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA to genomic DNA should also be considered.

      We appreciate the reviewer's comments on our analysis. In response to the feedback, we have updated the analysis to address the key points and revised the text accordingly.

      1) We have incorporated several enhancements to improve the coherence of our analysis. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To ensure the cohort's alignment, we focus on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      There was no specific estimate for a cohort of young males in either Meddeb et al. or Loyfer et al.; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as described in the literature (Meddeb et al. 2019; Madsen et al. 2019). We thereby demonstrate that sex and age have a small effect on cfDNA concentrations and are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph that replaces the “Tissue-specific cfDNA concentration” subsection of the Methods, and in the fourth paragraph added to the Discussion.

      2) In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) We know of only a few specific cases in which DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is, in fact, mitochondrial, this would mean that the genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.
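
      The direction of the two arguments above (NETosis and mitochondrial DNA) can be illustrated with a minimal sketch. It is not the authors' model: the function name, the parameter names, and the numbers are hypothetical, and it assumes that the measured plasma cfDNA and the DNA flux from dying cells are expressed in the same units. The only point is that any share of measured cfDNA that does not come from the genomes of dying cells can only lower the estimated ratio.

```python
def dying_cell_ratio(measured_cfdna: float,
                     dna_flux_from_dying_cells: float,
                     other_source_fraction: float = 0.0) -> float:
    """Fraction of DNA from dying cells that appears as plasma cfDNA.

    other_source_fraction is the (hypothetical) share of measured cfDNA
    that does not come from genomes of dying cells, for example DNA
    released by live neutrophils or mitochondrial DNA.
    """
    if not 0.0 <= other_source_fraction <= 1.0:
        raise ValueError("other_source_fraction must lie in [0, 1]")
    if dna_flux_from_dying_cells <= 0.0:
        raise ValueError("dna_flux_from_dying_cells must be positive")
    # Only the remaining share of measured cfDNA is attributed to dying cells.
    from_dying_cells = measured_cfdna * (1.0 - other_source_fraction)
    return from_dying_cells / dna_flux_from_dying_cells


# Hypothetical inputs in arbitrary but matching units.
baseline = dying_cell_ratio(1.0, 100.0)
with_other_sources = dying_cell_ratio(1.0, 100.0, other_source_fraction=0.2)

# Any non-zero share from other sources lowers the ratio for dying cells.
print(with_other_sources < baseline)
```

      Under these assumptions the ratio falls linearly with the share attributed to other sources, so neglecting such sources makes the reported fraction an upper bound.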

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      I think readers would appreciate the authors commenting or addressing the following points, in addition to addressing the concerns I raised about the methods section in the public review:

      What variables and considerations did the authors omit in this study?

      1) Cell-free DNA is found in virtually every biofluid.

      Thus, the fact that cell-free DNA is not present in the plasma does not mean it cannot be detected elsewhere. This also implies that phagocytosis may not be the only factor related to cfDNA not being present in the blood. One example (of many, many others) is neutrophil-derived cell-free DNA, which is present in the urine.

      Indeed, dying cells and their DNA can be consumed locally, released into the blood, or shed outside the body. The latter is a function of tissue topology. For example, intestinal epithelial cell turnover releases material to the lumen of the gut (i.e., stool); kidney and bladder cell turnover releases material to urine; and lung epithelium releases material to the air spaces. In these cases, the absence of cfDNA in plasma is expected. However, in cases where tissue topology dictates release to blood, low representation in cfDNA indicates local consumption or a related mechanism. In Figure 1 of the manuscript, we distinguish between tissues according to their topology, denoting organs that shed material to the outside with open circles.

      Neutrophil-derived DNA in urine likely represents a local process in the kidney (neutrophils that penetrate the epithelium and fall into the urine). Neutrophils that die elsewhere in the body must release cfDNA to the blood before it can reach the urine. Hence, quantifying plasma cfDNA is a legitimate approach for assessing the relationship between cell death and cfDNA. The revised text clarifies this point. We made revisions to the initial paragraph in the results section and a paragraph within the discussion to provide clarity on this topic:

      “Based on atlases of human cell type-specific methylation signatures, Moss et al. and Loyfer et al. analyzed the main cell types contributing to plasma cfDNA. They found the primary sources of plasma cfDNA to be blood cells: granulocytes, megakaryocytes, macrophages, and/or monocytes (the signature could not differentiate between the last two), lymphocytes, and erythrocyte progenitors. Other cells that had detectable contributions are endothelial cells and hepatocytes. Qualitatively, these cells represent most of the leading cell types in cellular turnover, as shown in Sender & Milo 2021 (Sender and Milo 2021). Epithelial cells of the gastrointestinal tract, lung, kidney, bladder, and skin are other cell types that significantly contribute to cellular turnover. Dying cells in these tissues are shed into the gut lumen, the air spaces, the urine, or out of the skin (note that while DNA from gut, lung, and kidney epithelial cells can be found in stool, bronchoalveolar lavage, and urine, the fate of DNA from skin cells is not known). This arrangement may explain why DNA from these cell types is not represented in plasma cfDNA in healthy conditions. Therefore, it appears that cells with high cfDNA plasma levels are those with relatively high turnover that are not being shed out of the body.”

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3×10⁴) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”

      2) Effect of biofluid storage.

      Cell-free DNA continues to degrade after it is extracted via blood draw. This is not expected to change tissue of origin predictions (although that remains to be shown in the literature), but definitely affects extraction yield. This is not accounted for (or even discussed) in the manuscript. It would be important to understand how this was done for the data presented here.

      The paper integrates data from multiple recent studies that adhered to state-of-the-art procedures requiring rapid processing of blood samples. In fact, earlier studies that were not careful to isolate plasma quickly typically reported very high concentrations due to the lysis of leukocytes and artifactual release of genomic DNA. Rapid plasma isolation and DNA extraction typically yield 5ng/ml in healthy donors, as stated in the paper (last paragraph of Results).

      3) Batch effects

      Batch effects are not discussed here and can affect cfDNA yields.

      Our analysis relies on data reported by multiple studies from different groups, which independently arrive at similar key findings (total concentration of cfDNA and the relative contribution of different tissues). Thus, batch effects are unlikely to affect the calculations markedly.

      4) Cell-free DNA extraction kits

      Different kits and methods extract cell-free DNA in different quantities. Importantly, much recent research has shown that most kits are not sensitive to ultrashort cell-free DNA (of lengths ~50bp). This may represent most of the DNA present in plasma. This raises an important question: are the yields that are being used in Moss et al (where I presume the total concentration is taken from) accurate? Is there more cell-free DNA that was missed? While the importance of this ultrashort cfDNA has yet to be shown, it is in the blood. Thus, the authors' model may underestimate ratios by not accounting for this. This is mentioned in the discussion, but it is not evident why it was not added into the model.

      The Qiagen cfDNA extraction kit can detect 50bp fragments. As shown in the specification sheets of the kit (https://www.qiagen.com/us/products/diagnostics-and-clinical-research/solutions-for-laboratory-developed-tests/qiasymphony-dsp-circulating-dna-kit), urine DNA contains abundant DNA fragments that peak at 50bp. In contrast, plasma cfDNA does not contain such fragments at appreciable concentrations. This suggests that small fragments, 50-150bp long, are not a major component of cfDNA, and thus, our measurements of the total concentration of cfDNA are not dramatically underestimated.

      The convention regarding the size distribution of cfDNA fragments is based on extensive evidence using multiple approaches. For example, a study that profiled the DNA released by multiple cell lines in vitro (Aucamp et al. 2017) used another kit for DNA isolation – the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany). This kit does extract fragments that are 50bp long (nucleospin-gel-and-pcr-clean-up-mini). Indeed, the DNA released from cultured cells did contain a peak at 50bp, but it was minor compared with the nucleosome-size peak.

      More recently, several studies did suggest the presence of ultra-short cfDNA fragments, 50 bp long on average, and concluded that such fragments might be present at a molar concentration that is comparable to that of nucleosome-protected DNA (for example, (Hisano et al. 2021)).

      Thus, our model estimates can be off by up to 2-fold (that is, the cfDNA concentration measured in most studies overlooks the small fragments and thus may underestimate the actual concentration of cfDNA by up to 2-fold). This is incorporated into the revised manuscript.

      We note that we cannot exclude the presence of abundant ultra-short DNA fragments (e.g., 10bp long). However, such fragments are not measurable in cfDNA analysis. Thus, we can refine our conclusion and state that only a small fraction of DNA of dying cells appears as measured cfDNA. We included a section in the methods detailing the integration of a potential factor for the short fragments and revised the discussion:

      “The overall plasma cfDNA concentration was multiplied by a factor of 1.5 to accommodate for the presence of small fragments of approximately 50 base pairs of cfDNA in the plasma. These fragments are suggested to contribute comparable molar concentrations (Hisano, Ito, and Miura 2021). Despite having approximately one-third of the mass, it is reasonable to presume that these fragments represent a similar number of genomes. This assumption is based on the idea that their source is a broken nucleosome unit, and the fragments represent the portion that was not degraded. Given the restricted data and its interpretation, we consider factors spanning the range of 1 (negligible effect) and 2 (doubling of the amount). The chosen factor, 1.5, is selected as the midpoint within this range of uncertainty.”
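
      The factor described in the quoted Methods passage amounts to simple arithmetic, which might be sketched as follows. The function names and the example input of 5 (used only as an illustrative value of the order quoted earlier in this response for healthy donors, in ng/ml) are assumptions for illustration and are not the authors' code.

```python
# Illustrative sketch only: names and the example input are assumptions.

def short_fragment_factor(low: float = 1.0, high: float = 2.0) -> float:
    """Midpoint of the plausible range for the short-fragment correction.

    low = 1 means ultrashort fragments add nothing; high = 2 means they
    double the amount. The quoted passage selects the midpoint of this range.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return (low + high) / 2.0


def corrected_concentration(measured: float, factor: float) -> float:
    """Scale a measured cfDNA concentration by the correction factor."""
    return measured * factor


factor = short_fragment_factor()
measured = 5.0  # illustrative value only, in ng/ml
print(factor, corrected_concentration(measured, factor))
```

      Keeping the bounds as parameters makes it easy to rerun the comparison at either end of the stated range (1 or 2) and check that the conclusion does not depend on the midpoint.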

      “In this study, we report a surprising, dramatic discrepancy between the measured levels of cfDNA in the plasma and the potential DNA flux from dying cells. One hypothetical explanation for that discrepancy is the limited sensitivity of typical cfDNA assays to short DNA fragments, which may contribute a significant fraction of the overall cfDNA mass. Regular cfDNA analysis shows a size distribution concentrated around a length of 165 base pairs (bp). The sizes in ctDNA vary more, but most are longer than 100 bp (Alcaide et al. 2020; Udomruk et al. 2021). Recent studies suggested a significant fraction of single-strand ultrashort fragments (length of 25-60 bp) (Cheng et al. 2022; Hisano, Ito, and Miura 2021). However, the total amount of DNA contained in these fragments is less than or comparable to that of the longer “regular” nucleosome-protected cfDNA fragments (Cheng et al. 2022; Hisano, Ito, and Miura 2021), arguing against ultrashort fragments as a dominant explanation for the “missing” cfDNA material. We integrated the estimate provided by Hisano et al. into our analysis as a modifying factor for both the total concentration and uncertainty of plasma cfDNA. Importantly, this incorporation did not alter the overall conclusions, as the discrepancy between the cfDNA plasma concentration and potential DNA flux remains on the same order of magnitude. We note that we cannot exclude the presence of abundant DNA fragments that are even shorter (e.g., 10bp long) and are not measurable in cfDNA analysis. Thus, our formal conclusion is that only a small fraction of the DNA of dying cells appears as measurable cfDNA.”

      5) Health status of samples analyzed.

      Health, sex and physical activity affect cfDNA yields. This is not accounted for or discussed in the manuscript.

      We incorporated several enhancements to improve our analysis in response to the provided feedback. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To align the cohorts, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      Furthermore, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in the works of (Meddeb et al. 2019; Madsen et al. 2019). Our intent in doing so was to demonstrate that these factors are unlikely to alter our conclusions substantially when considering a healthy population. We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the method, and the fourth paragraph added to the discussion:

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”
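
      The quoted passages refer to combining several sources of uncertainty into a single factor, but the combination rule itself is not reproduced in this response. One common way to combine independent multiplicative (fold) uncertainties is to add them in quadrature in log space; the sketch below assumes that rule, and the individual factor values in it are hypothetical placeholders rather than numbers from the manuscript.

```python
import math

# Illustrative sketch only. The quadrature-in-log-space rule is an assumed
# convention, not necessarily the approach used in the manuscript, and the
# example factor values are hypothetical.

def combine_fold_uncertainties(factors):
    """Combine independent fold uncertainties (each >= 1) into one fold factor."""
    factors = list(factors)
    if any(f < 1.0 for f in factors):
        raise ValueError("fold uncertainty factors are expected to be >= 1")
    # Sum the squared log-uncertainties, then map back to a fold factor.
    log_variance = sum(math.log(f) ** 2 for f in factors)
    return math.exp(math.sqrt(log_variance))


# Hypothetical fold factors for three independent sources of variation.
hypothetical_sources = {
    "confidence interval of the median": 1.3,
    "individual-specific and analytic variation": 1.5,
    "short-fragment correction": 1.5,
}
total = combine_fold_uncertainties(hypothetical_sources.values())
print(f"combined fold uncertainty: {total:.2f}")
```

      Under this assumed rule, a handful of moderate fold factors combines to a total well below an order of magnitude, which is the sense in which such uncertainty would leave a discrepancy of several orders of magnitude intact.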

      Reviewer #2 (Recommendations For The Authors):

      1) Align the cohorts to estimate DNA production and plasma cfDNA levels. Cellular turnover rate and plasma cfDNA levels vary with age, sex, circadian clock, and other factors (Madsen AT et al, EBioMedicine, 2019). This study estimated DNA production using data abstracted from a homogenous group of healthy control males (Sender & Milo, Nat Med 2021). On the other hand, plasma cfDNA levels were obtained from datasets of a more diverse cohort of healthy males and females with a wide range of ages (Loyfer et al. Nature, 2023 and Moss et al., Nat Commun, 2018).

      We have incorporated several enhancements to improve the coherence of our analysis. In our revised examination, we drew upon the total plasma concentration of cfDNA, as documented in a study conducted by (Meddeb et al. 2019), while considering the influence of age and sex on these concentrations. To align the cohorts, we focused on relatively young and healthy individuals, specifically those below the age of 47. This approach allowed for a more meaningful comparison with the estimated DNA flux from a reference male human aged between 20 and 30 years.

      There was no specific estimate for a cohort of young males in both Meddeb et al. and Loyfer et al.; however, we factored in the expected variations stemming from sex, age, and other relevant factors, as elucidated in literature (Meddeb et al. 2019; Madsen et al. 2019). Thus, we demonstrate that sex and age have a small effect on the cfDNA concentrations and thus are unlikely to alter our conclusions substantially when considering a healthy population.

      We summarize the changes in the first paragraph, replacing the “Tissue-specific cfDNA concentration” subsection of the method, and the fourth paragraph added to the discussion.

      “Our estimates for total plasma cfDNA concentration were derived from the median concentration observed in individuals below 47 years of age (n=52), as reported by (Meddeb et al. 2019). To complement this, we integrated our total concentration estimates with data on the proportion of cfDNA originating from specific cell types, leveraging a plasma methylome deconvolution method described by (Loyfer et al. 2023), which did not provide absolute quantities of cfDNA. To quantify the uncertainty associated with our cfDNA concentration estimates, we employed a methodology that considered several sources of variation. First, we incorporated the confidence interval of the median concentration reported by Meddeb et al. as a measure of uncertainty. Additionally, we accounted for individual-specific and analytic variations based on the study by (Madsen et al. 2019), encompassing factors such as the precise timing of measurements and assay precision. These sources of uncertainty were combined using the approach outlined below.”

      “Our current analysis focused on estimating plasma cfDNA concentration and cellular turnover in a cohort of healthy, relatively young individuals. The total plasma cfDNA concentrations were sourced from healthy individuals below 47 years, as reported by (Meddeb et al. 2019). We use data analyzed based on plasma samples from healthy individuals to estimate the proportion of cfDNA originating from specific cell types (Loyfer et al. 2023). These values were then compared to the potential DNA flux resulting from homeostatic cellular turnover, estimated for reference healthy males aged between 20 and 30 (Sender and Milo 2021). In our analysis, we considered various sources of uncertainty, including inter-individual variation, variability in the timing of sample collection, and analytical precision (Madsen et al. 2019; Meddeb et al. 2019). These factors collectively contributed to an uncertainty factor of less than 3. Importantly, this level of uncertainty does not alter our conclusion regarding the relatively small fraction of DNA present in plasma as cfDNA. Furthermore, we acknowledge that age and sex can impact total cfDNA concentration, as demonstrated by (Meddeb et al. 2019), with potential variations of up to 30%. However, as the results of our analysis present a much larger difference, these effects do not change the conclusions drawn from our analysis. Nevertheless, age and health status may influence the proportion of cfDNA originating from specific cell types and their corresponding cellular turnover rates. Consequently, the ratios themselves may vary in the elderly population or individuals with underlying health conditions.”

      2) "cfDNA fragments are not created equal". Recent studies demonstrate that cfDNA composition varies with disease state. For example, cfDNA GC content, fraction of short fragments, and composition of some genomic elements increase in heart transplant rejection compared to no-rejection state (Agbor-Enoh, Circulation, 2021). The genomic location and disease state may therefore be important factors to consider in these analyses.

      In this study, we addressed the total amount of cfDNA in healthy individuals without regard to GC content, representation of different genomic regions, or fragment length, as the goal was to understand if cell death rates are fully accounted for by cfDNA concentration. We agree that it will be interesting to study the relative representation of the genome in cfDNA and the processes that determine cfDNA concentration in pathologies beyond the rate of cell death. These topics for future research fall beyond this study's scope.

      3) Alternative sources of DNA production should be considered. Aside from cell death, DNA can be released from cells via active secretion. This and other additional sources of DNA should be considered in future studies. The distinct characteristics of mitochondrial DNA to genomic DNA should also be considered.

      We know of only a few specific cases whereby DNA is released from cells that are not dying. These include the release of DNA from erythroblasts and megakaryocytes to generate anucleated erythrocytes and platelets (Moss et al. 2022, cited in our paper) and the release of NETs from neutrophils.

      The presence of cfDNA fragments originating from megakaryocytes and erythroblasts indicates the elimination of megakaryocytes and erythroblasts and the birth of erythrocytes and platelets. However, the considerations in the rest of the paper still apply: the concentration of cfDNA from these sources is far lower than expected from the cell turnover rate.

      Concerning NETosis: the presence of cfDNA originating in neutrophils that have not died would reduce the concentration of cfDNA from dying neutrophils and thus further increase the discrepancy, which is the topic of our study (under-representation of DNA from dying cells in plasma).

      We updated a paragraph in the discussion regarding this issue:

      “A comparison between the different types of cells shows a trend in which less DNA flux from cells with higher turnover gets to the bloodstream. In particular, a tiny fraction (1 in 3x10^4) of DNA from erythroid progenitors arrives at the plasma, indicating an extreme efficiency of the DNA recovery mechanism. Erythroid progenitors are arranged in erythroblastic islands. Up to a few tens of erythroid progenitors surround a single macrophage that collects the nuclei extruded during the erythrocyte maturation process (pyrenocytes) (Chasis and Mohandas 2008). The amount of DNA discarded through the maturation of over 200 billion erythrocytes per day (Sender and Milo 2021) exceeds all other sources of homeostatic discarded DNA. Our findings indicate that the organization of dedicated erythroblastic islands functions highly efficiently regarding DNA utilization. Neutrophils are another high-turnover cell type with a low level of cfDNA. When contemplating the process of NETosis (Vorobjeva and Chernyak 2020), the existence of cfDNA originating from live neutrophils would potentially diminish the concentration of cfDNA released by dying neutrophils, thereby amplifying the observed ratio for this particular cell type. The overall trend of higher turnover resulting in a lower cfDNA to DNA flux ratio may indicate similar design principles, in which the utilization of DNA is better in tissues with higher turnover. However, our analysis is limited to only several cell types (due to cfDNA test and deconvolution sensitivities), and extrapolation to cells with lower cell turnover is problematic.”

      We neglected mitochondrial DNA, as it is not measured in methylation cell-of-origin analysis. Similarly to the argument above, if some of the total DNA measured in plasma is in fact mitochondrial, this would mean that genomic cfDNA concentration is actually lower than the estimates, meaning that an even smaller fraction of DNA from dying cells is measured in plasma.
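
      The direction of the NETosis and mitochondrial arguments above can be illustrated with a small calculation. This is a hedged sketch: the function name, the parameter `non_death_fraction`, and all numbers are hypothetical, and the two input quantities are assumed to have already been converted to the same units.

```python
# Illustrative sketch only: names and numbers are hypothetical.

def cfdna_to_flux_ratio(measured_cfdna: float,
                        dna_flux: float,
                        non_death_fraction: float = 0.0) -> float:
    """Fraction of the DNA flux from dying cells that appears as measured cfDNA.

    non_death_fraction is the share of the measured cfDNA that does not come
    from genomic DNA of dying cells (for example mitochondrial DNA, or DNA
    released by live neutrophils during NETosis).
    """
    if dna_flux <= 0:
        raise ValueError("dna_flux must be positive")
    if not 0.0 <= non_death_fraction <= 1.0:
        raise ValueError("non_death_fraction must lie between 0 and 1")
    return measured_cfdna * (1.0 - non_death_fraction) / dna_flux


# Hypothetical inputs chosen so the baseline ratio is 1 in 3x10^4.
baseline = cfdna_to_flux_ratio(1.0, 3e4)
adjusted = cfdna_to_flux_ratio(1.0, 3e4, non_death_fraction=0.2)
print(baseline, adjusted)
```

      Any positive value of `non_death_fraction` lowers the ratio, which is why both arguments widen the discrepancy instead of explaining it.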

    1. Author Response

      The following is the authors’ response to the current reviews.

      We would firstly like to thank all reviewers for their comments and support of this manuscript.

      Reviewer #1 (Recommendations For The Authors):

      No further recommendations.

      Reviewer #2 (Recommendations For The Authors):

      All of my comments have been sufficiently addressed.

      Reviewer #3 (Recommendations For The Authors):

      Thanks for responding to my former recommendations constructively. I believe these points have been fully addressed in this new version.

      However, I have not seen any comments on the points I raised in my former public review concerning the I-2 dependence of the FonSIX4 cell death. Do you know whether FonSIX4 would trigger cell death in tissues not expressing any I-2?

      We are a little confused concerning this comment. I-2 is a different class of resistance protein (NLR) that recognises Avr2 and this is likely to be intracellular. From the previous public review, we believe reviewer 3 may have been asking us to clarify the dependence of I (MM or M82) on FonSIX4 cell death. We have performed these controls by expressing FonSIX4 and associated FonSIX4/Avr1 chimeras in N. benthamiana (with the PR-1 signal peptide for efficient secretion of effectors) and it does not cause cell death in the absence of the I receptor – see S11F Fig. This was not explicitly conveyed in text so we have included the following in text: “Using the N. benthamiana assay we show FonSIX4 is recognised by I receptors from both cultivars (IM82 and iMoneymaker) and cell death is dependent on the presence of IM82 or iMoneymaker (Fig 5B, S11 Fig).”

      I still recommend discussing whether the Avr1 residues crucial for Avr activity are in the same structural regions of the C-terminal domain where previous work has identified residues under diversifying selection in symbiotic fungal FOLD proteins.

      The region important for recognition does encompass some residues within the structural region previously reported to be under diversifying selection in FOLD effectors from Rhizophagus irregularis (two residues within one beta-strand). However, we also see residues that do not overlap with this area. We also note that the mycFOLD proteins analysed in symbiotic fungi are heavily skewed towards strong structural similarity with FolSIX6 (similar cysteine spacing within both N and C-domains and structural orientation of the N and C-domains) rather than Avr1. We are under the impression that Avr1 was not included in the analysis of diversifying selection in symbiotic fungal FOLD proteins, and it is unclear to us whether close Avr1 homologues are present. With this in mind, and considering our already lengthy discussion (as previously highlighted during review), we have decided not to include further discussion concerning this point.


      The following is the authors’ response to the original reviews.

      We would like to thank the editor(s) and reviewers for their work concerning our manuscript. Most of the suggested changes were related to text changes which we have incorporated into the revised version. Please find our response to reviewers below.

      Reviewer #1 (Recommendations For The Authors):

      I only have very minor suggestions for the authors. The first one comes from reading the manuscript and finding it very dense with so many acronyms. This will limit the audience that will read the study and appreciate its impact. This is more noticeable in the Results, with many passages that I would suggest moving to Methodology.

      We thank reviewer 1 for their very positive review. We understand that due to the nature of this study, which includes many protein alleles/mutations that were expressed with different boundaries etc., it is difficult to achieve this. Reviewer 2 asked for more details to be provided. We hope we have achieved a nice balance in the revised manuscript.

      Something else that would facilitate the reading of the manuscript is the effectors name. The authors use the SIX name or the Avr name for some effectors and it makes it difficult to follow up.

      We have tried to make this consistent for Avr1 (SIX4), Avr2 (SIX3) and Avr3 (SIX1). Other SIX effectors are not known Avrs so the SIX names were used.

      Reading the manuscript and seeing how in most of the sections the authors used a computational approach followed by an experimental approach, I wonder why Alphafold2-multimer was not used to investigate the interaction between the effector and the receptor?

      This is a great suggestion, and we have certainly investigated this. However, to date there is no experimental evidence to directly support a direct interaction between I and Avr1. Post review, we spent some time trying to capture an interaction using a co-immunoprecipitation approach; however, to date we have not been able to obtain robust data that support this. We are currently looking to study this utilising protein biophysics/biochemistry, but this work will take some time.

      Reviewer #2 (Recommendations For The Authors):

      We thank reviewer 2 for the very thorough editing and recommendations. We have incorporated all minor text edits below into the manuscript.

      Line 43: perhaps "Effector recognition" instead of "Effector detection", to be consistent with line 51?

      Line 60: Change to "leads".

      Line 79: Italicise Avr2.

      Line 94: Add the acronym ETI in parentheses after "effector-triggered immunity".

      Line 106: "(Leptosphaeria Avirulence-Supressing)" should be "(Leptosphaeria Avirulence and Supressing)".

      Line 112: Change "defined" to "define".

      Line 119: Spell out the species name on first use.

      Line 205: Glomeromycota is a division rather than a genus. Consistent with Fig 2, it also does not need to be italicized.

      Line 207: Change "basidiomycete" to "Division Basidiomycota", consistent with Fig 2.

      Line 214: Change "alignment of Avr1, Avr3, SIX6 and SIX13" to "alignment of the mature Avr1, Avr3, SIX6 and SIX13 sequences".

      Line 324: Change "solved structures" to "solved protein structures".

      Line 335: Spell out acronyms like "MS" on first use in figure legends. Also dpi in other figure legends.

      Line 341: replace "effector-triggered immunity (ETI)" with "(ETI)" - see comment on Line 94.

      Line 370: Change "domains" to "domain".

      Line 374: In the title, change "C-terminus" to "C-domain", consistent with the rest of the figure legend.

      Line 404: Change "(basidiomycetes and ascomycetes)" to "(Basidiomycota and Ascomycota fungi)", consistent with Fig 2C.

      Line 416: Change "in" to "by".

      Line 427: un-italicize the parentheses.

      Line 519: First mention of NLR. Spell out the acronym on first use in main text. S5 and S11 figure titles should be bolded.

      Line 852: Replace "@" with "at".

      S4 Table: Gene names should be italicised.

      S5 Table: Needs to be indicated that the primer sequences are in the 5´-3´ orientation.

      With regards to the Agrobacterium tumefaciens-mediated transient expression assays involving co-expression of the Avr1 effector and I immune receptor, the authors need to make clear how many biological replicates were performed as this information is only provided for the ion leakage assay.

      We have added these data to the figure legend.

      Line 57: For me, the text "Fol secretes a limited number of structurally related effectors" reads as Fol secretes structurally related effectors, but very few of them are structurally related. Perhaps it would be better to say that the effector repertoire of Fol is made up of proteins that adopt a limited number of structural folds, or that the effector repertoire can be classified into a reduced set of structural families?

      This edit has been incorporated.

      Lines 66-67: Subtle re-wording required for "The best-characterized pathosystem is F. oxysporum f. sp. lycopersici (Fol)", as a pathosystem is made up of a pathogen and its host. Perhaps "The best-characterized pathosystem involves F. oxysporum f. sp. lycopersici (Fol) and tomato".

      Sentence has been reworded.

      Line 113 and throughout: Stick with one of "resistance protein", "receptor", "immune receptor" and "immunity receptor" throughout the manuscript.

      We have decided to use both receptor and immunity receptor as not all receptors investigated in the manuscript provide immunity.

      Lines 149-150: The title does not fully represent what is shown in the figure. The text "that is unique among fungal effectors" can be deleted as there is nothing in Fig 1 that shows that the fold is unique to fungal effectors.

      Figure title has been changed.

      Line 173: The RMSD of Avr3 is stated as being 3.7 Å, but in S3 Fig it is stated as being 3.6 Å.

      This was a mistake in the main text and has been corrected.

      Lines 202-204: This sentence needs to be reworded, as the way that it is written implies that the Diversispora and Rhizophagus genera are in the Ascomycota division. Also, "Ascomycetes" should be changed to "Ascomycota fungi", consistent with Fig 2.

      Sentence has been reworded.

      Line 233: "Scores above 8". What type of scores? Z-scores?

      These are Z-scores. This has been added in text.

      Lines 242-246: It is stated that SIX9 and SIX11 share structural similarity to various RNA-binding proteins, but no scores used to make these assessments are given. The scores should be provided in the text.

      Z-scores have been added.

      Fig 4A: SIX3 should be Avr2, consistent with line 292. The gene names should be italicised in Fig 4A.

      SIX3 was changed to Avr2. Gene names have been italicised.

      Line 356: Subtle rewording required, as "co-infiltrated with both IM82 and iMoneymaker" implies that you infiltrated with protein rather than Agrobacterium strains.

      Sentence has been reworded.

      Fig 5A, Fig 5C and Line 380: Light blue is used, but this looks grey. Perhaps change colour, as grey is already used to show the pro-domain in Fig 5A (or simply change the colour used to highlight the pro-domain)?

      Colour depicting the C-domain was changed.

      Lines 530-531: This text is no longer correct. Rlm4 and Rlm3 are now known to be alleles of Rlm9. See: Haddadi, P., Larkan, N. J., Van deWouw, A., Zhang, Y., Neik, T. X., Beynon, E., ... & Borhan, M. H. (2022). Brassica napus genes Rlm4 and Rlm7, conferring resistance to Leptosphaeria maculans, are alleles of the Rlm9 wall‐associated kinase‐like resistance locus. Plant Biotechnology Journal, 20(7), 1229.

      We thank the reviewer for picking this up. This text has been updated.

      Line 553: Provide more information on what the PR1 signal peptide is.

      More information about the PR1 signal peptide has been added.

      Lines 767-781: Descriptions and naming conventions of proteins throughout the figure legend need to be consistent and better reflect their makeup. For example, I think it would be best to put the sequence range after each protein mentioned - e.g. Avr1^18-242 or Avr1^59-242 instead of Avr1, PSL1_C37S^18-111 instead of PSL1_C37S, etc. Furthermore, it is often stated that a protein is full-length when it lacks a signal peptide - my thought is that if a protein lacks its signal peptide, it is not full-length. The acronym "PD" also needs to be spelled out as "pro-domain (PD)" in the figure legend.

      We have incorporated sequence range for proteins that were produced upon first use. Sequence ranges that were modelled in AlphaFold2 were not added in text because they can be found in Supplementary Table 3.

      Lines 853-845: It is stated the sizes of proteins are indicated above the chromatogram in S10 Fig, but this is not the case. It is also not clear from S10B Fig that the faint peaks correspond to the peaks in the Fig 4B chromatogram. In S10D Fig, the stick of C58S is difficult to see. Perhaps change the colour or use an arrow/asterisk?

      Protein size estimates have been added above the chromatogram. Added text to indicate that the faint peaks correspond to peaks in Fig 4B. Added an asterisk in S10D Fig to identify the location of C58.

      S14 Fig is not mentioned/referenced in the main text of the manuscript.

      This was a mistake and has been added.

      The reference list needs to be updated to accommodate those referenced bioRxiv preprints that have now been published in peer-reviewed journals.

      The reference list has been updated.

      Reviewer #3 (Recommendations For The Authors):

      It would be good to discuss whether the pro-domains affect virulence or avirulence activity.

      Kex2, the protease that cleaves the pro-domain, functions in the Golgi. We therefore suspect that the pro-domain is removed prior to secretion. For recombinant protein production in E. coli we find that these pro-domains are necessary to obtain soluble protein (doi: 10.1111/nph.17516). As we require the pro-domain for protein production and cannot completely remove it from our preps, we cannot perform experiments to test this or comment further. In a paper that identified SIX effectors in tomato utilising a proteomics approach (https://bsppjournals.onlinelibrary.wiley.com/doi/10.1111/j.1364-3703.2007.00384.x), it appears that the pro-domains were not captured in the analysis. This supports the conclusion that they are not associated with the mature/secreted protein.

      The authors stated that the C-terminal domain of SIX6 has a single disulfide bond unique to SIX6. Please clarify in which context is it unique: in Fusarium or across all FOLD proteins?

      This is in direct comparison to Avr1 and Avr3. The disulfide in the C-domain of SIX6 is unique compared to Avr1 and Avr3. This has been made clear in text.

      The structural similarity of FOLD proteins to other known structures have been discussed (lines 460ff), but it is not clear whether all structures and models identified in this work would yield cysteine inhibitor and tumor necrosis factors as best structural matches in the database or whether this is specific to a single FOLD protein. Please consider discussing recently published findings by others (Teulet et al. 2023, New Phytologist) on this aspect.

      This analysis was performed for Avr1; we obtained relatively low similarity hits for Avr3/Six6. We have updated this text accordingly… “Unfortunately, the FOLD effectors share little overall structural similarity with known structures in the PDB outside of the similarity with each other. At a domain level, the N-domain of the FOLD effector Avr1 has some structural similarities with cystatin cysteine protease inhibitors (PDB code: 4N6V, PDB code: 5ZC1) [60, 61], and the C-domain with tumour necrosis factors (PDB code: 6X83) [62] and carbohydrate-binding lectins (PDB code: 2WQ4) [63]. Relatively weak hits were observed for Avr3/Six6.”

      It might be useful to clearly point out that the ToxA fold and the C-terminus of the FOLD fold are different.

      We have secondary structural topology maps of the FOLD and ToxA-like families in S8 Fig which highlight the differences in topology between these two families.

      Please add information to Fig.S8 listing the approach to generate the secondary structure topology maps.

      We have added this information in the figure caption.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors found that nifuroxazide has the potential to augment the efficacy of radiotherapy in HCC by reducing PD-L1 expression. This effect may be attributed to increased degradation of PD-L1 through the ubiquitination-proteasome pathway. The paper provides new ideas and insights to improve treatment effectiveness; however, there are additional points that could be addressed.

      • The paper highlights that the combination of nifuroxazide increases tumor cell apoptosis. A discussion regarding the potential crosstalk or regulatory mechanisms between apoptotic pathways and PD-L1 expression would be valuable.

      Response: Thank you very much for your suggestion. Research has shown that regulating the STAT3/PD-L1 pathway can effectively increase apoptosis in lung cancer cells (1). Our study confirmed that nifuroxazide can effectively inhibit the expression of p-STAT3 and PD-L1 in liver cancer cells, which may be the reason for the increased apoptosis of these cells. We have added relevant descriptions in the discussion.

      • The benefits and advantages of nifuroxazide combination could be compared to the current clinical treatment options.

      Response: Thank you greatly for your insightful feedback. The primary objective of this study is to explore whether nifuroxazide can effectively enhance the degradation of PD-L1, thereby increasing the radiosensitivity of HCC. Our research reveals that compared to radiation therapy alone, combination therapy involving nifuroxazide and radiation significantly inhibits tumor growth in mice and boosts the anti-tumor immune response. This finding could potentially provide a valuable strategy for patients who exhibit resistance to radiation therapy in clinical practice. Moreover, clinical trial investigations have demonstrated that nivolumab, a PD-1 monoclonal antibody, when combined with radiation therapy for HCC, exhibits promising safety and efficacy (2). This evidence supports the future application of nifuroxazide in the treatment of HCC. However, to reach this objective, we must continue to conduct extensive research, including comparing nifuroxazide with existing therapies in clinical practice. We believe that nifuroxazide not only significantly inhibits the expression of PD-L1 protein in HCC cells but also functions as a PD-L1 inhibitor. Furthermore, it effectively curbs the proliferation and migration of HCC cells, induces tumor cell apoptosis, and may exhibit enhanced anti-tumor effects, making it a promising candidate for clinical use. We have incorporated relevant discussion content in the article to address these points.

      Reviewer #2 (Public Review):

      Summary:

      Zhao et al. aimed to explore an important question - how to overcome the resistance of hepatocellular carcinoma cells to radiotherapy? Given that the immune-suppressive microenvironment is a major mechanism underlying resistance to radiotherapy, they reasoned that a drug that blocks the PD-1/PD-L1 pathway could improve the efficacy of radiation therapy and chose to investigate the effect of Nifuroxazide, an inhibitor of stat3 activation, on radiotherapy efficacy in treating hepatocellular carcinoma cells. From in vitro experiments, they find combination treatment (Nifuroxazide+ radiotherapy) increases apoptosis and reduces proliferation and migration, in comparison to radiotherapy alone. From in vivo experiments, they demonstrate that combined treatment reduces the size and weight of tumors in vivo and enhances mice survival. These data indicate a better efficacy of combination therapy compared to radiotherapy alone. Moreover, they also determined the effect of combination therapy on tumor microenvironment as well as peripheral immune response. They find that combination therapy increases infiltration of CD4+ and CD8+ cells as well as M1 macrophages in the tumor microenvironment. Interestingly, they find that the ratio of Treg cells in spleen is increased by radiotherapy but decreased by Nifuroxazide. Considering the immune-suppressive role of Treg cells, this finding is consistent with reduced tumor growth by combination therapy. However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not. The most intriguing part of the study is the determination of the effect of Nifuroxazide on PD-L1 expression in the context of radiotherapy. Considering Nifuroxazide is a stat3 activation inhibitor and stat3 inhibition leads to reduced expression of PD-L1, one would expect Nifuroxazide decreases PD-L1 expression through stat3. However, they found that the effect of Nifuroxazide on PD-L1 is dependent on GSK3 mediated Proteasome pathways and independent of stat3, in the given experimental context. To determine the relevance to human hepatocellular carcinoma, they also measured the PD-L1 expression in human tumor tissues of HCC patients pre- and post-radiotherapy. The increased PD-L1 expression level in HCC after radiotherapy is impressive. However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not.

      Overall, the data are convincing and support the conclusions.

      Strengths:

      1) Novel finding: Identified novel mechanism underlying the effect of Nifuroxazide on PD-L1 expression in hepatocellular carcinoma cells in the context of radiotherapy.

      2) Comprehensive experimental approaches: using different approaches to prove the same finding. For example, in Fig 4, both IHC and WB were used. In Fig 5, both IF and WB were used.

      3) Human disease relevance: Compared observations in mice with human tumor samples.

      The question in the summary, “However, it is unclear whether the combined therapy affects the ratio of Treg cells in the tumors or not”.

      Response: Thank you very much for your valuable feedback. We have included additional flow cytometry results regarding the expression of relevant Treg cells (CD4+CD25+Foxp3+ T lymphocytes) in tumor tissues (Supplementary Fig 2). Our findings indicate that the number of Treg cells in tumor tissues significantly decreased following combination therapy with nifuroxazide and radiotherapy.

      The question in the summary, “However, it is unclear whether the patients being selected in the study had resistant disease to radiotherapy or not”.

      Response: Thank you very much for your valuable feedback. All the HCC patients selected in this study experienced recurrence after radiation treatment.

      Weaknesses:

      1) It is hard to tell whether the observed phenotype and mechanism are generic or specific to the limited cell lines used in the study. The in vitro experiments were performed in one human cell line and the in vivo experiments were performed in one mouse cell line.

      Response: Thank you very much for your feedback. We have included additional experimental data from another human cell line Huh7 (Supplementary Fig 3).

      2) The study did not distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments.

      Response: Thank you greatly for your insightful feedback. In this study, we primarily compared the antitumor effects of nifuroxazide combined with radiotherapy versus either nifuroxazide or radiotherapy alone, and confirmed that the combined treatment demonstrated a more potent anti-hepatocellular carcinoma effect compared to single therapy. Furthermore, to achieve the goal of utilizing nifuroxazide for the treatment of clinical hepatocellular carcinoma, additional research is necessary, including comparisons with other clinically established therapies. We have also incorporated relevant discussions in our analysis.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors embarked on an exploration of how nifuroxazide could enhance the responsiveness to radiotherapy by employing both an in vitro cell culture system and an in vivo mouse tumor model.

      Strengths:

      The researchers conducted an array of experiments aimed at revealing the function of nifuroxazide in aiding the radiotherapy-induced reduction of proliferation, migration, and invasion of HepG2 cells.

      Weaknesses:

      The authors did not provide the molecular mechanism through which nifuroxazide collaborates with radiotherapy to effectively curtail the proliferation, migration, and invasion of HCC cells. Moreover, the evidence supporting the assertion that nifuroxazide contributes to the degradation of radiotherapy-induced upregulation of PD-L1 via the ubiquitin-proteasome pathway appears to be insufficient. Importantly, further validation of this discovery should involve the utilization of an additional syngeneic mouse HCC tumor model or an orthotopic HCC tumor model.

      Response: Thank you very much for your insightful comments. Nifuroxazide has been demonstrated to inhibit the expression of p-STAT3, thereby suppressing tumor cell proliferation and migration (3, 4). In our study, we observed that after 48 hours of treatment with Nifuroxazide, the expression of p-STAT3 in irradiated cells was significantly inhibited. Furthermore, compared to radiation alone, combined Nifuroxazide and radiotherapy resulted in a more pronounced decrease in PCNA expression. Simultaneously, we performed additional detection of migration-related protein MMP2 expression (revised Fig 2B), confirming that combined Nifuroxazide and radiotherapy led to a more significant inhibition of MMP2 expression. These findings suggest that the combined treatment may be responsible for the synergistic suppression of HCC cell proliferation and migration. We have included relevant discussions in our manuscript.

      Our initial results indicate that Nifuroxazide inhibits the expression of PD-L1 at the protein level, but does not affect its mRNA level. Interestingly, upon treatment with a proteasome inhibitor MG132, the inhibitory effect of Nifuroxazide on PD-L1 was eliminated, suggesting that Nifuroxazide may enhance the degradation of PD-L1 protein. Our experiments have demonstrated the inhibitory effect of Nifuroxazide on PD-L1 in both human and mouse cell lines. However, to translate these findings into clinical application for the treatment of hepatocellular carcinoma, additional research is necessary, including validation in genetically engineered mouse models of HCC. We have addressed these points in the discussion section of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) Please improve the quality of Figure 3E. It is hard to figure out the bar and details.

      Response: Thank you for your valuable feedback. We have meticulously revised the figures to enhance their clarity and presentation (revised Fig 3E).

      2) In Figure 7E, please describe the method used to calculate the PD-L1 mRNA level. Please also adjust the picture angle and label the marker size on the left.

      Response: Thank you for your feedback. We have incorporated a method for calculating PD-L1 mRNA levels and revised the corresponding figures accordingly (revised Fig 7E).

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      1) What is the advantage of using a combination of nifuroxazide and radiotherapy in comparison to using a combination of anti-PD1/PDL1 and radiotherapy?

      Response: Thank you very much for your insightful comments. We believe that the advantage of nifuroxazide over PD-1 or PD-L1 antibodies lies in its ability not only to effectively inhibit PD-L1 expression but also to suppress tumor cell proliferation and migration and to promote apoptosis (Supplementary Fig 1). We have also expanded on these aspects in the discussion section of the manuscript.

      2) For the characterization of tumor microenvironment and immune cells in the spleen, were the same cell populations being investigated? What about NK and Treg cells in tumors? What about M1 macrophages in spleen?

      Response: Thank you very much for your insightful suggestion. We have measured the infiltration of NK and Treg cells in tumor tissues (Supplementary Fig 2), as well as the abundance of M1 macrophages (revised Fig 6) in the spleen, and provided additional relevant data to strengthen our study.

      Other comments:

      1) The data in Fig 1 is solid. However, it is hard to distinguish the effect of increased radiosensitivity by nifuroxazide from combined anti-tumor effects by two different treatments. The anti-tumor role of Nifuroxazide has been reported in melanoma, colorectal carcinoma, and hepatocellular carcinoma previously (PMID: 26830149; 28055016, 26154152). Therefore, the increased apoptosis and decreased proliferation and migration could be caused by nifuroxazide and not related to the sensitivity of cells to radiation therapy.

      Response: Thank you very much for your constructive feedback. As you suggested, the anti-tumor role of nifuroxazide has been reported. However, the innovation of our study does not lie in confirming its antitumor effects but rather in demonstrating how nifuroxazide can enhance radiotherapy's efficacy in treating hepatocellular carcinoma by inhibiting PD-L1 levels.

      We compared the efficacy of combined therapy versus radiotherapy and found that compared to radiation alone, combined therapy more significantly inhibited hepatocellular carcinoma cell proliferation and migration. In our animal model, we compared the therapeutic effects of combined therapy, nifuroxazide, and radiotherapy on hepatocellular carcinoma-bearing mice. We observed that compared to individual treatment groups, combined therapy more profoundly suppressed tumor growth and enhanced the antitumor effects in the mice.

      In response to your feedback, we have expanded the discussion on the impact of combined therapy versus nifuroxazide or radiotherapy on hepatocellular carcinoma cell proliferation, migration, and apoptosis (Supplementary Fig 1). The data show that compared to either individual therapy, combined therapy further inhibited cell proliferation and migration while promoting apoptosis.

      2) There is no direct evidence to show the improved efficacy of radiation therapy by nifuroxazide through the degradation of PD-L1.

      Response: Thank you very much for your valuable suggestions. In our cell experiments, we found that nifuroxazide inhibits the increased expression of PD-L1 in cells induced by radiation therapy, and this inhibitory effect is counteracted when using the proteasome inhibitor MG132. Therefore, we speculate that nifuroxazide may inhibit PD-L1 expression through a proteasome-dependent mechanism. To better reflect this, we have revised the title of our manuscript to "Nifuroxazide Suppresses PD-L1 Expression and Enhances the Efficacy of Radiotherapy in Hepatocellular Carcinoma."

      3) "The oncogene Stat3.....was effectively inhibited by radiotherapy in cells" - this sentence may be rephrased to make the point clear. The authors might mean to say "activation of the oncogene stat3...."

      "The results demonstrated that the combination therapy increased the expression of PARP," the authors might mean to say "expression of c-PARP"

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      4) "histomorphology significantly improved after the treatment with nifuroxazide and radiation therapy (Fig 3E)." How to define "improved histomorphology"? The authors may want to provide more details to clarify "improved".

      Response: Thank you very much for your feedback. We have revised the relevant sentence descriptions to improve clarity and accuracy.

      5) In addition to normalizing protein expression by tubulin, the authors may consider normalizing p-stat3 expression level by stat3.

      Response: Thank you very much for your feedback. We have conducted a quantitative analysis of the expression levels of p-STAT3 and STAT3 (revised Fig 2A).

      6) Figure 3C and D, using a different color to represent each group might help the readers to better differentiate each group.

      Response: Thank you very much for your feedback. Following your suggestion, we have revised the figures accordingly (revised Fig 3C and 3D).

      Reviewer #3 (Recommendations For The Authors):

      In this study, the authors revealed the pivotal role of nifuroxazide in augmenting the efficacy of radiotherapy. This was evidenced by its synergistic effect in suppressing the proliferation and migratory capabilities of HCC cells, alongside its capacity to induce apoptosis in these cells. Furthermore, their findings underscored the substantial synergy between nifuroxazide and radiotherapy in retarding tumor growth, thereby extending survival in a tumor-bearing murine model. Moreover, the authors observed that nifuroxazide combined with radiotherapy significantly increases tumor-infiltrating CD4+ T cells, CD8+ T cells, and M1 macrophages. Finally, the authors found that nifuroxazide countered the radiotherapy-induced upregulation of PD-L1 through the ubiquitin-proteasome pathway. However, the main claims are only partially supported by the evidence. The following are my concerns and suggestions.

      1) In Figures 1 and 2, the authors convincingly demonstrate the synergistic impact of nifuroxazide and radiotherapy on curtailing the proliferation, colony formation, and migratory capabilities of HCC cells, while also instigating apoptosis in these cells. However, the underlying molecular mechanism remains elusive. A recent study highlighted nifuroxazide's potential to impede the proliferation of glioblastoma cells and induce apoptosis via the MAP3K1/JAK2/STAT3 pathway (Wang X., et al., Int Immunopharmacol. 2023 May;118:109987. doi: 10.1016/j.intimp.2023.109987). It would be valuable for the authors to investigate whether nifuroxazide employs a similar molecular mechanism to regulate proliferation and apoptosis in the context of HCC. This could offer deeper insights into the mechanisms at play in their observed effects.

      Response: Thank you very much for your insightful comments. As you pointed out, previous studies have reported that nifuroxazide exerts antitumor effects by inhibiting the STAT3 pathway. However, in our experiments, we observed that radiation therapy significantly increased the expression of PD-L1, but showed a trend of decreased p-STAT3 expression. Therefore, we believe that nifuroxazide does not inhibit PD-L1 expression through the STAT3 pathway. Subsequently, our further research revealed that the inhibitory effect of nifuroxazide on PD-L1 can be counteracted by a proteasome inhibitor. Thus, we propose that nifuroxazide inhibits PD-L1 expression through a proteasome-dependent mechanism, thereby enhancing the efficacy of radiation therapy in hepatocellular carcinoma.

      2) Figures 1 and 2 solely rely on the HepG2 cell line to establish their conclusions. To validate these findings robustly, it is recommended that another HCC cell line be included in the study. This additional cell line will contribute to the generalizability and reliability of the results, enhancing the overall credibility of the study's conclusions.

      Response: Thank you very much for your suggestion. We have included additional experimental results with the relevant cell line Huh7 (Supplementary Fig 3).

      3) Figure 3 demonstrates the use of only one syngeneic mouse H22 tumor model. To ensure the robustness and validity of this finding, it would be advisable to incorporate at least one more syngeneic mouse HCC tumor model or even an orthotopic mouse tumor model. The inclusion of additional models would bolster the significance and reliability of the observed results, contributing to a more comprehensive understanding of the phenomenon under investigation.

      Response: Thank you for your valuable suggestion. In the H22 mouse tumor model, we assessed survival rate and tumor growth. The results confirm that the combination of nifuroxazide and radiation therapy exhibits a promising synergistic antitumor effect. However, before nifuroxazide combined with radiation therapy can be applied to the clinical treatment of hepatocellular carcinoma, extensive further research is needed, including validation in additional syngeneic mouse HCC tumor models. We have added a discussion of this point to the manuscript.

      4) In Figure 5, employing an alternative method, such as the flow cytometry assay, to analyze and corroborate the tumor-infiltrating immune cell profiling following various treatments would enhance the rigor of the study. This additional approach would provide a complementary perspective and validate the findings, strengthening the overall reliability and impact of the results presented.

      Response: Thank you for your insightful suggestion. We have included additional experimental data to strengthen our study (Supplementary Fig 2).

      5) In Figure 7, the conclusion drawn regarding nifuroxazide's impact on PD-L1 expression through ubiquitination-proteasome mechanisms seems to lack the robust evidence needed to firmly establish nifuroxazide's role in regulating PD-L1 ubiquitination. To reinforce this aspect of the study, the authors may conduct comprehensive in vitro and in vivo ubiquitination assays. Performing these assays would offer direct insights into whether nifuroxazide genuinely influences PD-L1 ubiquitination, thus fortifying the credibility and importance of the reported findings.

      Response: Thank you for your valuable feedback. Our initial findings suggest that nifuroxazide reduces PD-L1 protein levels but does not affect PD-L1 mRNA levels. Moreover, upon treatment with the proteasome inhibitor MG132, the inhibitory effect of nifuroxazide on PD-L1 was abolished. We also observed that nifuroxazide significantly enhances GSK-3β expression in both cell and animal experiments. Consequently, we propose that nifuroxazide augments the degradation of PD-L1 protein.

      6) Statistical methods should be included in the captions of all the figures with statistical graphs. The size of the scale should be supplemented with a description in the captions.

      Response: Thank you for your valuable suggestion. We have made the appropriate modifications to our study based on your recommendations.

      7) Considering the outcomes presented in the study, it appears that the title "Nifuroxazide enhances radiotherapy efficacy against hepatocellular carcinoma by upregulating PD-L1 degradation via the ubiquitin-proteasome pathway" may not accurately reflect the findings.

      Response: Thank you for your insightful feedback. We have revised the title to read, "Inhibitory Effects of Nifuroxazide on PD-L1 Expression and Enhanced Radiotherapy Efficacy in Hepatocellular Carcinoma".

      References

      1) Xie C, Zhou X, Liang C, Li X, Ge M, Chen Y, et al. Apatinib triggers autophagic and apoptotic cell death via VEGFR2/STAT3/PD-L1 and ROS/Nrf2/p62 signaling in lung cancer. Journal of experimental & clinical cancer research : CR. 2021;40(1):266. doi: 10.1186/s13046-021-02069-4.

      2) de la Torre-Alaez M, Matilla A, Varela M, Inarrairaegui M, Reig M, Lledo JL, et al. Nivolumab after selective internal radiation therapy for the treatment of hepatocellular carcinoma: a phase 2, single-arm study. Journal for immunotherapy of cancer. 2022;10(11). doi: 10.1136/jitc-2022-005457.

      3) Yang F, Hu M, Lei Q, Xia Y, Zhu Y, Song X, et al. Nifuroxazide induces apoptosis and impairs pulmonary metastasis in breast cancer model. Cell Death Dis. 2015;6(3):e1701. doi: 10.1038/cddis.2015.63.

      4) Nelson EA, Walker SR, Kepich A, Gashin LB, Hideshima T, Ikeda H, et al. Nifuroxazide inhibits survival of multiple myeloma cells by directly inhibiting STAT3. Blood. 2008;112(13):5095-102. doi: 10.1182/blood-2007-12-129718.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents H3-OPT, a deep learning method that effectively combines existing techniques for the prediction of antibody structure. This work is important because the method can aid the design of antibodies, which are key tools in many research and industrial applications. The experiments for validation are solid.

      Comments to Author:

      Several points remain partially unclear, such as:

      1) Which examples constitute proper validation;

      Thank you for your kind reminder. We have modified the description of the validation experiments to identify which examples constitute proper validation. We have changed “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022” to “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority (six of seven) of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022” and added the following sentence to the experimental validation section of our revised manuscript: “AlphaFold2 outperformed IgFold on these targets”.

      2) What the relevance of the molecular dynamics calculations as performed is;

      Thank you for your comment, and we apologize for any confusion. The goal of our molecular dynamics calculations is to compare binding affinities, an important issue in antibody engineering, between AlphaFold2-predicted complexes and H3-OPT-predicted complexes. Molecular dynamics simulations enable the investigation of the dynamic behaviors and interactions of these complexes over time. Unlike other tools for predicting binding free energy, MM/PBSA or MM/GBSA calculations provide dynamic properties of complexes by sampling conformational space, which helps in obtaining more accurate estimates of binding free energy. In summary, our molecular dynamics calculations demonstrated that the binding free energies of H3-OPT-predicted complexes are closer to those of native complexes. We have included the following sentence in our manuscript to provide an explanation of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”.

      3) The statistics for some of the comparisons;

      Thank you for the comment. We have incorporated statistics for some of the comparisons in the revised version of our manuscript and added the following sentence in the Methods section: “We conducted two-sided t-test analyses to assess the statistical significance of differences between the various groups. Statistical significance was considered when the p-values were less than 0.05. These statistical analyses were carried out using Python 3.10 with the Scipy library (version 1.10.1).”.
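
      As an illustration of the kind of test described in that sentence, the sketch below computes a two-sided, two-sample t-test using only the Python standard library. It is not the authors' analysis code (the response states that SciPy was used). The choice of an independent-samples test with pooled variance, the function names, and the integration step count are all assumptions made here for illustration.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * (mass of the density on [0, |t|]).

    The integral is evaluated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_mass)


def students_t_test(a, b):
    """Independent two-sample t-test with pooled variance (assumed variant).

    Returns (t statistic, two-sided p-value).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)
```

      Under the threshold quoted above, a difference between two groups would be called significant when the returned p-value is below 0.05. If the compared values are per-target measurements of the same targets, a paired test may be the more appropriate choice; the response does not say which variant was used.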

      4) The lack of comparison with other existing methods.

      We appreciate your valuable comments and suggestions. Conducting comparisons with a broader set of existing methods can further facilitate discussions on the strengths and weaknesses of each method, as well as the accuracy of our method. In our study, we conducted a comparison of H3-OPT with many existing methods, including AlphaFold2, HelixFold-Single, ESMFold, and IgFold. We demonstrated that several protein structure prediction methods, such as ESMFold and HelixFold-Single, do not match the accuracy of AlphaFold2 in CDR-H3 prediction. Additionally, we performed a detailed comparison between H3-OPT, AlphaFold2, and IgFold (the latest antibody structure prediction method) for each target.

      In response to this comment, we have also added a comparison with OmegaFold. The results have been incorporated into the relevant sections (Fig 4a-b) of the revised manuscript.

      Author response image 1.

      Public Reviews

      Comments to Author:

      Reviewer #1 (Public Review):

      Summary:

      The authors developed a deep learning method called H3-OPT, which combines the strength of AF2 and PLM to reach better prediction accuracy of antibody CDR-H3 loops than AF2 and IgFold. These improvements will have an impact on antibody structure prediction and design.

      Strengths:

      The training data are carefully selected and clustered, the network design is simple and effective.

      The improvements include smaller average Cα RMSD, backbone RMSD, side chain RMSD, more accurate surface residues and/or SASA, and more accurate H3 loop-antigen contacts.

      The performance is validated from multiple angles.
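
      For readers unfamiliar with the Cα RMSD metric mentioned above, the following minimal sketch shows how it can be computed for one loop. It assumes the predicted and reference structures have already been superposed (for example on the framework residues) and that the two coordinate lists are in the same residue order; the function name and the input format are hypothetical and are not taken from the H3-OPT code.

```python
import math


def ca_rmsd(predicted, reference):
    """Root-mean-square deviation between paired C-alpha coordinates.

    Both arguments are sequences of (x, y, z) tuples in angstroms, one per
    residue, in the same order. No superposition is performed here; the
    structures are assumed to be aligned beforehand.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for atom_pred, atom_ref in zip(predicted, reference):
        squared_sum += sum((p - r) ** 2 for p, r in zip(atom_pred, atom_ref))
    return math.sqrt(squared_sum / len(predicted))
```

      Averaging this value over all targets in a test set gives the kind of "average Cα RMSD" figure the review refers to. Backbone and side-chain RMSDs follow the same formula with a different atom selection.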

      Weaknesses:

      1) There are very limited prediction-then-validation cases, basically just one case.

      Thanks for pointing out this issue. The number of prediction-then-validation cases is helpful to show the generalization ability of our model. However, obtaining experimental structures is both costly and labor-intensive. Furthermore, experimental validation cases only capture a limited portion of the sequence space in comparison to the broader diversity of antibody sequences.

      To address this challenge, we have collected different datasets to serve as benchmarks for evaluating the performance of H3-OPT, including our non-redundant test set and the CAMEO dataset. The introduction of these datasets allows for effective assessments of H3-OPT’s performance without biases and tackles the obstacle of limited prediction-then-validation cases.

      Reviewer #2 (Public Review):

      This work provides a new tool (H3-Opt) for the prediction of antibody and nanobody structures, based on the combination of AlphaFold2 and a pre-trained protein language model, with a focus on predicting the challenging CDR-H3 loops with higher accuracy than previously developed approaches. This task is of high value for the development of new therapeutic antibodies. The paper provides an external validation consisting of 131 sequences, with further analysis of the results by segregating the test sets into three subsets of varying difficulty and comparison with other available methods. Furthermore, the approach was validated by comparing three experimentally solved 3D structures of anti-VEGF nanobodies with the H3-Opt predictions.

      Strengths:

      The experimental design to train and validate the new approach has been clearly described, including the dataset compilation and its representative sampling into training, validation and test sets, and structure preparation. The results of the in-silico validation are quite convincing and support the authors' conclusions.

      The datasets used to train and validate the tool and the code are made available by the authors, which ensures transparency and reproducibility, and allows future benchmarking exercises with incoming new tools.

      Compared to AlphaFold2, the authors' optimization seems to produce better results for the most challenging subsets of the test set.

      Weaknesses:

      1) The scope of the binding affinity prediction using molecular dynamics is not that clearly justified in the paper.

      We sincerely appreciate your valuable comment. We have added the following sentence in our manuscript to justify the scope of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”.

      2) Some parts of the manuscript should be clarified, particularly the ones that relate to the experimental validation of the predictions made by the reported method. It is not absolutely clear whether the experimental validation is truly a prospective validation. Since the methodological aspects of the experimental determination are not provided here, it seems that this may not be the case. This is a key aspect of the manuscript that should be described more clearly.

      Thank you for the reminder about experimental validation of our predictions. The sequence identities of the wild-type nanobody VH domain and H3 loop, when compared with the best template, are 0.816 and 0.647, respectively. As a result, these mutants exhibited low sequence similarity to our dataset, indicating the absence of prediction bias for these targets. Thus, H3-OPT outperformed IgFold on these mutants, demonstrating our model's strong generalization ability. In summary, the experimental validation actually serves as a prospective validation.

      Thanks for your comments, we have added the following sentence to provide the methodological aspects of the experimental determination: “The protein expression, purification and crystallization experiments were described previously. The proteins used in the crystallization experiments were unlabeled. Upon thawing the frozen protein on ice, we performed a centrifugation step to eliminate any potential crystal nucleus and precipitants. Subsequently, we mixed the protein at a 1:1 ratio with commercial crystal condition kits using the sitting-drop vapor diffusion method facilitated by the Protein Crystallization Screening System (TTP LabTech, mosquito). After several days of optimization, single crystals were successfully cultivated at 21°C and promptly flash-frozen in liquid nitrogen. The diffraction data from various crystals were collected at the Shanghai Synchrotron Research Facility and subsequently processed using the aquarium pipeline.”

      3) Some Figures would benefit from a clearer presentation.

      We sincerely thank you for your careful reading. Following your comments, we have extensively revised the figures to make the presentation clearer and more convincing (Fig 2c-f).

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces a new computational framework for choosing 'the best method' according to the case for getting the best possible structural prediction for the CDR-H3 loop. The authors show their strategy improves on average the accuracy of the predictions on datasets of increasing difficulty in comparison to several state-of-the-art methods. They also show the benefits of improving the structural predictions of the CDR-H3 in the evaluation of different properties that may be relevant for drug discovery and therapeutic design.

      Strengths:

      The authors introduce a novel framework, which can be easily adapted and improved. The authors use a well-defined dataset to test their new method. A modest average accuracy gain is obtained in comparison to other state-of-the art methods for the same task while avoiding testing different prediction approaches.

      Weaknesses:

      1) The accuracy gain is mainly ascribed to easy cases, while the accuracy and precision for moderate to challenging cases are comparable to other PLM methods (see Fig. 4b and Extended Data Fig. 2). That raises the question: how likely is it to be in a moderate or challenging scenario? For example, it is not clear whether the comparison to the solved X-ray structures of anti-VEGF nanobodies represents an easy or challenging case for H3-OPT. The mutant nanobodies seem not to provide any further validation as the single mutations are very far away from the CDR-H3 loop and they do not disrupt the structure in any way. Indeed, RMSD values follow the same trend in H3-OPT and IgFold predictions (Fig. 4c). A more challenging test and interesting application could be solving the structure of a designed or mutated CDR-H3 loop.

      Thank you for your rigorous consideration. When the experimental structure is unavailable, it is difficult to determine directly whether a target is easy to predict or challenging. We constructed our non-redundant test set so that the number of easy-to-predict targets is comparable to that of the other two groups. Due to the limited availability of experimental antibody structures, especially nanobody structures, accurately predicting CDR-H3 remains a challenge. In our manuscript, we discuss the strengths and weaknesses of AlphaFold2 and other PLM-based methods, and we introduce H3-OPT as a comprehensive solution for antibody CDR3 modeling.

      We also appreciate your comment on experimental structures. We fully agree with your opinion and attempted to solve the experimental structures of seven mutants, including two mutants (Y95F and Q118N) that are close to the CDR-H3 loop. Unfortunately, although we tried seven different reagent kits with a total of 672 crystallization conditions, we were unable to obtain crystals for these mutants. Although the mutants we successfully solved may not have significantly disrupted the structures of the CDR-H3 loops, they still provide valuable insights into the differences between MSA-based methods and MSA-free methods (such as IgFold) for antibody structure modeling.

      We have further conducted a benchmarking study using two examples, PDBID 5U15 and 5U0R, both consisting of 18 residues in CDR-H3, to evaluate H3-OPT's performance in predicting mutated H3 loops. In the first case (target 5U15), AlphaFold2 failed to provide an accurate prediction of the extended orientation of the H3 loop, resulting in a less accurate prediction (Cα-RMSD = 10.25 Å) compared to H3-OPT (Cα-RMSD = 5.56 Å). In the second case (target 5U0R, a mutant of 5U15 in CDR3 loop), AlphaFold2 and H3-OPT achieved Cα-RMSDs of 6.10 Å and 4.25 Å, respectively. Additionally, the Cα-RMSDs of OmegaFold predictions were 8.05 Å and 9.84 Å, respectively. These findings suggest that both AlphaFold2 and OmegaFold effectively captured the mutation effects on conformations but achieved lower accuracy in predicting long CDR3 loops when compared to H3-OPT.

      2) The proposed method lacks a confidence score or a warning to help guide the users in moderate to challenging cases.

      We appreciate your suggestions and we have trained a separate module to predict confidence scores. We used the MSE loss for confidence prediction, where the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100.
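
      As a minimal illustration of the regression target described above (not the authors' code), the sketch below computes per-residue Cα deviations between a predicted and a native loop and the MSE between a module's predicted errors and those labels. It assumes the two loops have already been superposed; the response does not say how the alignment was done or how the output is mapped to the 0 to 100 range, so neither step is shown. All coordinates and predicted values are hypothetical.

```python
import math


def ca_deviations(pred, ref):
    """Per-residue Calpha distance (in Å) between two already-superposed loops."""
    if len(pred) != len(ref):
        raise ValueError("loops must have the same number of residues")
    return [math.dist(p, r) for p, r in zip(pred, ref)]


def mse(predicted_errors, label_errors):
    """Mean squared error between predicted and true per-residue deviations."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_errors, label_errors)]
    return sum(squared) / len(squared)


# Hypothetical, already-aligned Calpha coordinates for a 3-residue stretch.
predicted_loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native_loop = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]

labels = ca_deviations(predicted_loop, native_loop)  # regression targets
model_output = [0.5, 1.0, 1.5]  # hypothetical per-residue error predictions
loss = mse(model_output, labels)
```

      In this toy case the labels are 0, 1 and 2 Å, so a module that predicts 0.5, 1.0 and 1.5 Å is penalized only at the first and last residues.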

      3) The fact that AF2 outperforms H3-OPT in some particular cases (e.g. Fig. 2c and Extended Data Fig. 3) raises the question: is there still room for improvements? It is not clear how sensitive H3-OPT is to the defined parameters. In the same line, bench-marking against other available prediction algorithms, such as OmegaFold, could shed light on the actual accuracy limit.

      We totally understand your concern. Many papers have suggested that PLM-based models are computationally efficient but may have unsatisfactory accuracy when high-resolution templates and MSA are available (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). However, the accuracy of AF2 decreases substantially when the MSA information is limited. Therefore, we directly retained high-confidence structures of AF2 and introduced a PSPM to improve the accuracy of the targets with long CDR-H3 loops and few sequence homologs. The improvement in mean Cα-RMSD demonstrated that there is room for predicting CDR-H3 loops more accurately.

      We also appreciate your kind comment on defined parameters. In fact, once a benchmark dataset is established, determining an optimal cutoff value through parameter searching can indeed further improve the performance of H3-OPT in CDR3 structure prediction. However, it is important to note that this optimal cutoff value heavily depends on the testing dataset being used. Therefore, we provide a recommended cutoff value and offer a program interface for users who wish to manually define the cutoff value based on their specific requirements. Here, we showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.

      Author response table 1.
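
      The dependence of the mean Cα-RMSD on the confidence cutoff can be sketched as follows. This is only an illustration under stated assumptions: each target is assumed to have an AF2 confidence on a 0 to 100 scale, an AF2 Cα-RMSD, and a Cα-RMSD for an alternative (for example PSPM-optimized) loop, and the AF2 loop is kept whenever its confidence reaches the cutoff. The function name, field names and all numbers are hypothetical and are not taken from the H3-OPT code or results.

```python
def mean_rmsd_at_cutoff(targets, cutoff):
    """Keep the AF2 loop when its confidence reaches the cutoff, else the alternative."""
    chosen = [
        t["af2_rmsd"] if t["af2_conf"] >= cutoff else t["alt_rmsd"]
        for t in targets
    ]
    return sum(chosen) / len(chosen)


# Hypothetical per-target values (confidence on a 0-100 scale, RMSD in Å).
targets = [
    {"af2_conf": 92.0, "af2_rmsd": 1.0, "alt_rmsd": 1.5},
    {"af2_conf": 78.0, "af2_rmsd": 2.0, "alt_rmsd": 3.0},
    {"af2_conf": 60.0, "af2_rmsd": 6.0, "alt_rmsd": 4.0},
]

# Mean RMSD of the selected loops for a few candidate cutoffs.
sweep = {cutoff: mean_rmsd_at_cutoff(targets, cutoff) for cutoff in (50, 70, 80, 90)}
best_cutoff = min(sweep, key=sweep.get)
```

      With these made-up numbers the lowest mean is reached at a cutoff of 70 rather than 80, because the second target has a good AF2 loop with a confidence just under 80. A different set of targets would favor a different cutoff, which is the dataset dependence described above.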

      We also appreciate your reminder, and we have conducted a benchmark against OmegaFold. The results have been included in the manuscript (Fig 4a-b).

      Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      1) In Fig 3a, please also compare IgFold and H3-OPT (merge Fig. S2 into Fig 3a)

      In Fig 3b, please separate Sub2 and Sub3, and add IgFold's performance.

      Thank you very much for your professional advice. We have made revisions to the figures based on your suggestions.

      Author response image 4.

      2) For the three experimentally solved structures of anti-VEGF nanobodies, what are the sequence identities of the VH domain and H3 loop, compared to the best available template? What is the length of the H3 loop? Which category (Sub1/2/3) do the targets belong to? What is the performance of AF2 or AF2-Multimer on the three targets?

      We apologize for the confusion. The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, compared with the best template. The CDR-H3 loops of these nanobodies are all 17 residues long. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2-predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM.

      3) Is AF2-Multimer better than AF2, when using the sequences of antibody VH and antigen as input?

      Thanks for your suggestions. Many papers have benchmarked AlphaFold2-Multimer for protein complex modeling and demonstrated that the accuracy of AlphaFold2-Multimer in predicting protein complexes is far from satisfactory (Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants, Rui Yin, et al., 2022). Additionally, there is no significant difference between AlphaFold2 and AlphaFold2-Multimer in antibody modeling (Structural Modeling of Nanobodies: A Benchmark of State-of-the-Art Artificial Intelligence Programs, Mario S. Valdés-Tresanco, et al., 2023).

      From the data perspective, we employed a non-redundant dataset for training and validation. Since these structures are valuable, considering the antigen sequence would reduce the size of our dataset, potentially leading to underfitting.

      4) For H3 loop grafting, I noticed that only identical target and template H3 sequences can trigger grafting (lines 348-349). How many such cases are in the test set?

      We appreciate your comment from this perspective. There are thirty targets in our database with identical CDR-H3 templates.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear to me whether the three structures apparently used as experimental confirmation of the predictions have been determined previously in this study or not. This is a key aspect, as a retrospective validation does not have the same conceptual value as a prospective, a posteriori validation. Please note that different parts of the text suggest different things in this regard "The model was validated by experimentally solving three structures of anti-VEGF nanobodies predicted by H3-OPT" is not exactly the same as "we then sought to validate H3-OPT using three experimentally determined structures of anti-VEGF nanobodies, including a wild-type (WT) and two mutant (Mut1 and Mut2) structures, that were recently deposited in protein data bank". The authors are kindly advised to make this point clear. By the way, "protein data bank" should be in upper case letters.

      We gratefully thank you for your feedback and fully understand your concerns. To validate the performance of H3-OPT, we initially solved the structures of both the wild-type and mutants of anti-VEGF nanobodies and submitted these structures to Protein Data Bank. We have corrected “that were recently deposited in protein data bank” into “that were recently deposited in Protein Data Bank” in our revised manuscript.

      • It would be good to clarify the goal and importance of the binding affinity prediction, as it seems a bit disconnected from the rest of the paper. Also, it would be good to include the production MD runs as Sup. Mat.

      Thanks for your valuable comment. We have added the following sentence in our manuscript to clarify the goal and importance of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”. The details of production runs have been described in Method section.

      • Has any statistical test been performed to compare the mean Cα-RMSD values across the modeling approaches included in the benchmark exercise?

      Thanks for this kind recommendation. We conducted a statistical test to assess the performance of different modeling approaches and demonstrated significant improvements with H3-OPT compared to other methods (p<0.001). Additionally, we have trained H3-OPT with five random seeds and compared mean Cα-RMSD values with all five models of AF2. Here, we showed the average Cα-RMSDs of H3-OPT and AlphaFold2.

      Author response table 1.
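
      The response does not state which statistical test was used. As one possible illustration of a paired comparison of per-target Cα-RMSDs between two methods, the sketch below runs an exact two-sided sign-flip permutation test on the paired differences. The choice of test, the function name and all numbers are assumptions made for illustration and are not the authors' analysis.

```python
from itertools import product


def paired_sign_flip_p(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so every sign assignment is enumerated.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-target Cα-RMSDs (Å) for two methods on the same 8 targets.
method_a = [2.5, 3.1, 1.9, 4.2, 2.8, 3.6, 2.2, 5.0]
method_b = [2.0, 2.8, 1.5, 3.6, 2.6, 2.9, 2.1, 4.5]

p_value = paired_sign_flip_p(method_a, method_b)
```

      Because method_b is better on every one of the eight hypothetical targets, only the two all-same-sign assignments are as extreme as the observed one, giving p = 2/256. Exact enumeration is practical only for a handful of targets; for a test set of 131 structures one would sample random sign assignments instead.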

      • In Fig. 2c-f, I think it would be adequate to make the ordering criterion of the data points explicit in the caption or the graph itself.

      We appreciate your comment and suggestion. We have revised the graph in the manuscript accordingly.

      Author response image 5.

      • Please revise Figure S2 caption and/or its content. It is not clear, in parts b and c, which is the performance of H3-OPT. Why weren't some other antibody-specific tools such as IgFold included in this comparison?

      Thanks for your comments. The performance of H3-OPT is not included in Figure S2. Prior to training H3-OPT, we conducted several preliminary studies, and the detailed results are available in the supplementary sections. We showed that AlphaFold2 outperformed other methods (including AI-based methods and TBM methods) and produced sub-angstrom predictions in framework regions. The comparison of IgFold with other methods was discussed in a previous work (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). In that study, we found that IgFold largely yielded results comparable to AlphaFold2 but with lower prediction cost. Additionally, we have also conducted a detailed comparison of CDR-H3 loops with IgFold in our main text.

      • It is stated that "The relative binding affinities of the antigen-antibody complexes were evaluated using the Python script...". Which Python script?

      Thank you for your comments, and we apologize for the confusion. This Python script is a module of the AMBER software. We have corrected “The relative binding affinities of the antigen-antibody complexes were evaluated using the python script” into “The relative binding affinities of the antigen-antibody complexes were evaluated using the MMPBSA module of AMBER software”.

      Reviewer #3 (Recommendations For The Authors):

      Does H3-OPT improve the AF2 score on the CDR-H3? It would be interesting to see whether grafted and PSPM loops improve the pLDDT score by using for example AF2Rank [https://doi.org/10.1103/PhysRevLett.129.238101]. That could also be a way to include a confidence score into H3-OPT.

      We are grateful for your question. H3-OPT could not provide a confidence score for its output in the current version, so we did not know whether H3-OPT improves the AF2 score.

      We appreciate your kind recommendations and have calculated the pLDDT scores of all models predicted by H3-OPT and AF2 using AF2Rank. We showed that the average of pLDDT scores of different predicted models did not match the results of Cα-RMSD values.

      Author response table 3.

      Therefore, we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops. We hope that this module can provide users with reliable guidance on whether to use predicted CDR-H3 loops.

      The test case of Nb PDB id. 8CWU is an interesting example where AF2 outperforms H3-OPT and PLMs. The top AF2 model according to ColabFold (using default options and no template [https://doi.org/10.1038/s41592-022-01488-1]) shows a remarkably good model of the CDR-H3, explaining the low Ca-RMSD in the Extended Data Fig. 3. However, the pLDDT score of the 4 tip residues (out of 12), forming the hairpin of the CDR-H3 loop, pushes down the average value below the CBM cut-off of 80. I wonder if there is a lesson to learn from that test case. How sensitive is H3-OPT to the CBM cut-off definition? Have the authors tried weighting the residue pLDDT score by some structural criteria before averaging? I guess AF2 may have less confidence in hydrophobic tip residues in exposed loops as the solvent context may not provide enough support for the pLDDT score.

      Thanks for your valuable feedback. We showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.

      Author response table 4.

      We greatly appreciate your comment on this perspective. Inspired by your kind suggestions, we will explore the relationship between cutoff values and structural information in related work. Your feedback is highly valuable as it will contribute to the development of our approach.
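
      The effect raised in the reviewer's comment, where a few low-scoring tip residues pull the loop-averaged pLDDT under the cutoff of 80, can be illustrated with a small sketch. The per-residue scores and the down-weighting scheme below are hypothetical; no structural weighting is implemented in H3-OPT according to this response, and the weights are shown only to make the reviewer's suggestion concrete.

```python
def mean_plddt(scores, weights=None):
    """Weighted mean of per-residue pLDDT; uniform weights by default."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


CUTOFF = 80.0  # CBM cutoff quoted in the review

# Hypothetical 12-residue loop: 8 well-scored residues, 4 low-scored tip residues.
plddt = [90.0] * 4 + [55.0] * 4 + [90.0] * 4
# Illustrative weights that halve the contribution of the tip residues.
tip_weights = [1.0] * 4 + [0.5] * 4 + [1.0] * 4

plain = mean_plddt(plddt)                 # uniform average over the loop
weighted = mean_plddt(plddt, tip_weights)  # tip residues down-weighted
```

      With these numbers the uniform average falls just under the cutoff while the weighted average lies above it, so the same loop would be rejected or accepted depending on how residues are weighted.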

      A comparison against the new folding prediction method OmegaFold [https://doi.org/10.1101/2022.07.21.500999] is missing. OmegaFold seems to outperform AF2, ESM, and IgFold among others in predicting the CDR-H3 loop conformation (See [https://doi.org/10.3390/molecules28103991] and [https://doi.org/10.1101/2022.07.21.500999]). Indeed, prediction of anti-VEGF Nb structure (PDB WT_QF_0329, chain B in supplementary data) by OmegaFold as implemented in ColabFold [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb] and setting 10 cycles, renders Ca-RMSD 1.472 Å for CDR-H3 (residues 98-115).

      We appreciate your valuable suggestion. We have added the comparison against OmegaFold in our manuscript. The results have been included in the manuscript (Fig 4a-b).

      Author response image 6.

      In our test set, OmegaFold outperformed ESMFold in predicting the CDR-H3 loop conformation. However, it failed to match the accuracy of AF2, IgFold, and H3-OPT. We discussed the difference between MSA-based methods (such as AlphaFold2) and MSA-free methods (such as IgFold) in predicting CDR-H3 loops. Similarly, OmegaFold provided results comparable to those of HelixFold-Single and other MSA-free methods but still failed to match the accuracy of AlphaFold2 and H3-OPT on Sub1.

      The time-consuming step in H3-OPT is the AF2 prediction. However, most of the time is spent in modeling the mAb and Nb scaffolds, which are already very well predicted by PLMs (See Fig. 4 in [https://doi.org/10.3390/molecules28103991]). Hence, why not use e.g. OmegaFold as the first step, whose score also correlates to the RMSD values [https://doi.org/10.3390/molecules28103991]? If that fails, then use AF2 or grafting. Alternatively, use a PLM model to generate a template, remove/mask the CDR loops (at least CDR-H3), and pass it as a template to AF2 to optimize the structure with or without MSA (e.g. using AF2Rank).

      Thanks for your professional feedback. It is true that the speed of MSA searching limits the application of high-throughput structure prediction. Previous studies have demonstrated that deep learning methods perform well on framework residues. We once tried to directly predict the conformations of CDR-H3 loops using PLM-based methods, but this initial version of H3-OPT lacking the CBM could not replicate the accuracy of AF2 in Sub1. Similarly, we showed that IgFold and OmegaFold also provide lower accuracy in Sub1 (average Cα-RMSD is 1.71 Å and 1.83 Å, respectively, whereas AF2 predicted an average of 1.07 Å). Therefore, the predictions of AlphaFold2 not only produce scaffolds but also provide the highest quality of CDR-H3 loops when high-resolution templates and MSA are available.

      Thank you once again for your kind recommendation. In the current version of H3-OPT, we have highlighted the strengths of H3-OPT in combining the AF2 and PLM models in various scenarios. AF2 can provide accurate predictions for short loops with fewer than 10 amino acids, and PLM-based models show little or no improvement in such cases. In the next version of H3-OPT, as the first step, we plan to replace the AF2 models with other methods if any accurate MSA-free method becomes available in the future.

      Line 115: The statement "IgFold provided higher accuracy in Sub3" is not supported by Fig. 2a.

      We are sorry for our carelessness. We have corrected “IgFold provided higher accuracy in Sub3” into “IgFold provided higher accuracy in Sub3 (Fig. 3a)”.

      Lines 195-203: What is the statistical significance of results in Fig 5a and 5b?

      Thank you for your kind comments. The surface residues of AF2 models are significantly higher than those of H3-OPT models (p < 0.005). In Fig. 5b, H3-OPT models predicted lower values than AF2 models in terms of various surface properties, including polarity (p <0.05) and hydrophilicity (p < 0.001).

      Lines 212-213: It is not easy to compare and quantify the differences between electrostatic maps in Fig. 5d. Showing a Dmap (e.g. mapmodel - mapexperiment) would be a better option. Additionally, there is no methodological description of how the maps were generated nor the scale of the represented potential.

      Thank you for pointing this out. We have modified the figure (Fig. 5d) according to your kind recommendation and added following sentences to clarify the methodological description on the surface electrostatic potential:

      “Analysis of surface electrostatic potential

      We generated two-dimensional projections of CDR-H3 loop’s surface electrostatic potential using SURFMAP v2.0.0 (based on GitHub from February 2023: commit: e0d51a10debc96775468912ccd8de01e239d1900) with default parameters. The 2D surface maps were calculated by subtracting the surface projection of H3-OPT or AF2 predicted H3 loops to their native structures.”

      Author response image 7.

      Lines 237-240 and Table 2: What is the meaning of comparing the average free energy of the whole set? Why free energies should be comparable among test cases? I think the correct way is to compare the mean pair-to-pair difference to the experimental structure. Similarly, reporting a precision in the order of 0.01 kcal/mol seems too precise for the used methodology, what is the statistical significance of the results? Were sampling issues accounted for by performing replicates or longer MDs?

      Thanks for your rigorous advice and pointing out these issues. We have modified the comparisons of free energies of different predicted methods and corrected the precision of these results. The average binding free energies of H3-OPT complexes is lower than AF2 predicted complexes, but there is no significant difference between these energies (p >0.05).

      Author response table 4.

      Comparison of binding affinities obtained from MD simulations using AF2 and H3-OPT.

      Thanks for your comments on this perspective. Longer MD simulations often achieve better convergence for the average behavior of the system, while replicates provide insights into the variability and robustness of the results. In our manuscript, each MD simulation had a length of 100 nanoseconds, with the initial 90 nanoseconds dedicated to achieving system equilibrium, which was verified by monitoring RMSD (Root Mean Square Deviation). The remaining 10 nanoseconds of each simulation were used for the calculation of free energy. This approach allowed us to balance the need for extensive sampling with the verification of system stability.

      Regarding MD simulations for CDR-H3 refinement, its successful application highly depends on the starting conformation, the force field, and the sampling strategy [https://doi.org/10.1021/acs.jctc.1c00341]. In particular, the applied plain MD seems a very limited strategy (there is not much information about the simulated times in the supplementary material). Similarly, local structure optimizations with QM methods are not expected to improve a starting conformation that is far from the experimental conformation.

      Thank you very much for your valuable feedback. We fully agree with your insights regarding the limitations of MD simulations. Before training H3-OPT, we showed the challenge of accurately predicting CDR-H3 structures. We then tried to optimize the CDR-H3 loops by computational tools, such as MD simulations and QM methods (detailed information of MD simulations is provided in the main text). Unfortunately, these methods were not expected to improve the accuracy of AF2 predicted CDR-H3 loops. These results showed that MD simulations and QM methods are not only time-consuming but also failed to optimize the CDR-H3 loops. Therefore, we developed H3-OPT to tackle these issues and improve the accuracy of CDR-H3 prediction for the development of antibody therapeutics.

      Text improvements

      Relevant statistical and methodological parameters are presented in a dispersed manner throughout the text. For example, the number of structures in test, training, and validation datasets is first presented in the caption of Fig. 4. Similarly, the sequence identity % to define redundancy is defined in the caption of Fig. 1a instead of lines 87-88, where authors define "we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å)". Is the sequence redundancy for the CDR-H3 or the whole mAb/Nb?

      Thank you for pointing out these issues. We have added the number of structures in each subgroup in the caption of Fig. 1a: “Clustering of the filtered, high-resolution structures yielded three datasets for training (n = 1021), validation (n = 134), and testing (n = 131).” and corrected “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” into “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset (sequence identity < 0.8) with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” in the revised manuscript. The sequence redundancy applies to the whole mAb/Nb.

      The description of ablation studies is not easy to follow. For example, what does removing TGM mean in practical terms (e.g. only AF2 is used, or PSPM is applied if AF2 score < 80)? Similarly, what does removing CBM mean in practical terms (e.g. all AF2 models are optimized by PSPM, and no grafting is done)?

      Thanks for your comments and suggestions. We have corrected “d, Differences in H3-OPT accuracy without the template module. e, Differences in H3-OPT accuracy without the CBM. f, Differences in H3-OPT accuracy without the TGM.” into “d, Differences in H3-OPT accuracy without the template module. This ablation study means only PSPM is used. e, Differences in H3-OPT accuracy without the CBM. This ablation study means input loop is optimized by TGM and PSPM. f, Differences in H3-OPT accuracy without the TGM. This ablation study means input loop is optimized by CBM and PSPM.”.

      Authors should report the values in the text using the same statistical descriptor that is used in the figures to help the analysis by the reader. For example, in lines 223-224 a precision score of 0.75 for H3-OPT is reported in the text (I assume this is the average value), while the median of ~0.85 is shown in Fig. 6a.

      Thank you for your careful checks. We have corrected “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a precision of 0.75 and accuracy of 0.94 compared to 0.66 precision and 0.92 accuracy of AF2.” into “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a median precision of 0.83 and accuracy of 0.97 compared to 0.64 precision and 0.95 accuracy of AF2.” in proper place of manuscript.

      Minor corrections

      Lines 91-94: What do length values mean? e.g. is 0-2 Å the RMSD from the experimental structure?

      We appreciate your comment and apologize for any confusion. The RMSD value is actually from experimental structure. The RMSD value evaluates the deviation of predicted CDR-H3 loop from native structure and also represents the degree of prediction difficulty in AlphaFold2 predictions. We have added following sentence in the proper place of the revised manuscript: “(RMSD, a measure of the difference between the predicted structure and an experimental or reference structure)”.

      Line 120: is the "AF2 confidence score" for the full-length or CDR-H3?

      We greatly appreciate your valuable comment and have corrected “Interestingly, we observed that AF2 confidence score shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” into “Interestingly, we observed that AF2 confidence score of CDR-H3 shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” in the revised manuscript.
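
      For readers who want to reproduce this kind of check on their own predictions, a Pearson correlation between loop confidence and Cα-RMSD can be computed as in the sketch below. The function is a plain textbook implementation, and the five data points are hypothetical; they are not the values behind Fig. 2b.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CDR-H3 confidence scores and Cα-RMSDs (Å) for five targets.
confidence = [95.0, 90.0, 85.0, 70.0, 60.0]
ca_rmsd = [0.8, 1.2, 1.0, 3.5, 4.0]

r = pearson_r(confidence, ca_rmsd)  # negative when high confidence tracks low RMSD
```

      A coefficient close to -1 would mean that confidence ranks the loops almost perfectly by accuracy; the reported -0.67 indicates a strong but imperfect relationship.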

      Line 166: Do authors mean "Taken" instead of "Token"?

      We are really sorry for our careless mistakes. Thank you for your reminder.

      Line 258: Reference to Fig. 1 seems wrong, do authors mean Fig. 4?

      We sincerely thank the reviewer for careful reading. As suggested by the reviewer, we have corrected the “Fig. 1” into “Fig. 4”.

      Author response image 7.

      Point out which plot corresponds to AF2 and which one to H3-OPT

      Thanks for pointing out this issue. We have added the legends of this figure in the proper positions in our manuscript.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript aimed at elucidating the substrate specificity of two M23 endopeptidases, Lysostaphin (LSS) and LytM, in S. aureus. Endopeptidases are known to cleave the glycine-bridges of staphylococcal cell wall peptidoglycan (PG). To address this question, various glycine-bridge peptides were synthesized as substrates, the catalytic domains of LSS and LytM were recombinantly expressed and purified, and the reactions were analyzed using solution-state NMR. The major finding is that LytM is not only a Gly-Gly endopeptidase, but also cleaves D-Ala-Gly. Technically, the advantage of using real-time NMR was emphasized in the manuscript. The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM. However, the biological significance and relevance of the conclusions remain unclear, as the results are mostly from synthetic substrates.

      Strengths:

      The study explores an interesting aspect of cell wall hydrolases in terms of substrate-level regulation. It potentially identified new enzymatic activity of LytM.

      Weaknesses:

      1) Significance: while the current study provided a detailed analysis of various substrates, the conclusions are mainly based on synthesized peptides. One experiment used purified muropeptides (Fig. 3H); however, the results were unclear from this figure.

      We thank the Reviewer for the comments and concerns regarding the potential weaknesses of this study.

      Because peptidoglycan is insoluble, it is not amenable to solution-state NMR studies. However, soluble peptidoglycan (PG) fragments for NMR analyses can be obtained by digesting bacterial sacculi or via chemical synthesis. Whereas digestion results in mixtures of products, synthesis yields pure molecules. Analysis of NMR spectra of muropeptide-mimicking synthetic peptides before and after enzyme addition provides tools to identify peaks in the much more complex spectra of mutanolysin-treated sacculus.

      We will improve data presentation in Figure 3H in the revised version of our manuscript and emphasize the similarity of product peaks in spectra acquired from experiments using either synthetic peptides or mutanolysin-digested sacculus.

      The results from synthesized peptides may not necessarily correlate with their biological functions in vivo.

      The Reviewer refers several times to the use of synthetic peptides in this study. While it is unclear to us whether the concern is about the synthetic nature of the molecules or because the peptides are devoid of PG disaccharide units, it is true that PG fragments lack the 3D architecture present in intact sacculus, and thus cannot perfectly mimic the in vivo milieu. The fragments, as well as purified sacculus, also lack all other components present in an intact bacterial cell wall. Our largest synthetic peptide (7), however, represents a crosslinked muropeptide (stem-pentaGly-stem) which according to the structural model recently presented by Razew et al. (2023) (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706) is large enough to cover the peptidic interaction interface between substrate and enzyme.

      Secondly, the study used only the catalytic domain of both proteins. It is known that the substrate specificity of these enzymes is regulated by their substrate-binding domains. There is no mention of other domains in the manuscript and no justification of why only the catalytic domain was studied. In short, the relevance of the results from the current study to the enzymes' actual physiological functions remains to be addressed, which attenuated the significance of the study.

      The lysostaphin catalytic domain was used for experimental simplicity and to allow direct comparison with the LytM catalytic domain. Because the lysostaphin cell-wall targeting (SH3b) domain interacts with the substrate with variable affinities depending on the substrate structure (Tossavainen et al., Structural and functional insights into lysostaphin-substrate interaction, Front. Mol. Biosci. 5, 60 (2018) and Gonzalez-Delgado et al., Two-site recognition of Staphylococcus aureus peptidoglycan by lysostaphin SH3b, Nat. Chem. Biol. 16, 24-30 (2020)), this interaction would have skewed our kinetic results.

      Catalytic domains were used also in the article by Razew et al. (Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). They showed that mature lysostaphin and lysostaphin catalytic domain hydrolysed the same Gly-Gly bonds.

      Moreover, full-length LytM is catalytically inactive. This is because the linker between its N-terminal and catalytic domains occludes the catalytic site (Odintsov et al. Latent LytM at 1.3 Å resolution. J. Mol. Biol. 225, 775 (2004)). LytM catalytic domain without its N-terminal segment is active (Odintsov et al (2004) and Firczuk et al. Crystal structure of active LytM. J. Mol. Biol 354, 578 (2005)).

      2) Impact and novelty:

      (1) The current study provided evidence suggesting the novel function of LytM in cleaving D-Ala-Gly. The impact of this finding is unclear. The manuscript discussed Enterococcus faecalis EnpA. But how about other M23 endopeptidases? What is the biological relevance?

      EnpA was specifically mentioned because it has been reported to also cleave the D-Ala-Gly bond. Structural similarities between the enzymes could reveal the basis for this bond specificity. Moreover, the focus of the study was not to reveal the biological function of LytM but rather to understand which amino acid substitutions lead to differences in specificities in the two structurally very similar enzymes.

      (2) A very similar study published recently showed that the activity of LSS and LytM is regulated by PG cross-linking: LSS cleaves more cross-linked PG and LytM cleaves less cross-linked PG (Razew, A., Laguri, C., Vallet, A., et al. Staphylococcus aureus sacculus mediates activities of M23 hydrolases. Nat Commun 14, 6706 (2023)). The results of that paper differ from those of the current study, in which both LSS and LytM prefer cross-linked substrates (Fig. 2JKL). Moreover, no D-Ala-Gly cleavage by LytM was observed by Razew A et al. using purified PG substrate. An explanation of the inconsistent results is needed here. In my opinion, the knowledge generated from the current study has not been fully settled. If the results can be validated, the contribution to the field is incremental, but not substantial.

      Another point raised by the Reviewer concerned the inconsistent results between our study and the recent paper by Razew et al. (2023) regarding LytM D-Ala-Gly cleavage. The explanation might lie in the type of NMR data acquired and its interpretation. We identified all hydrolysis products using 1H, 13C multiple bond correlation NMR spectra acquired from samples dissolved in deuterated buffers. Use of C-H signals is advantageous in that they are not prone to chemical exchange phenomena and enable unambiguous chemical shift assignment. Based on the NMR spectra shown, Razew and co-workers identified cleaved muropeptide bonds by observing product glycine peaks in 1H, 15N correlation spectra, specifically amide peaks of product C-terminal glycines appearing in the 114-117 ppm 15N region of spectra of samples treated with LytM/LSS. D-Ala-Gly cleavage, however, produces an N-terminal glycine, whose signal is typically not observed in regular N,H correlation spectra because of chemical exchange. Razew and co-workers validated their observations with UPLC-MS analysis. However, to our understanding, their data analysis was based on the assumption that LytM cleaves between Gly4-Gly5 (or Gly1-Gly2 using our numbering), and accordingly only masses corresponding to potential products containing 1 to 4 glycines anchored to the lysine side chain were considered.

      (3) The authors emphasized a few times in the text that it is superior to use NMR technology. In my opinion, NMR has certain advantages, such as measuring the efficacy of cleavage, but it is not that superior. It should be complementary to other methods such as mass spectrometry. In addition, more relevant solid-state NMR using intact PG or bacterial cells was not discussed in the study. I am of the opinion that the corresponding text should be revised.

      We value and agree with the Reviewer’s opinion that NMR spectroscopy is complementary to other methods, e.g., mass spectrometry. However, in this particular case, NMR simultaneously provided information on reaction kinetics and on the scissile bonds in the substrates, which allowed us to compare rates of hydrolysis in different PG fragments and to redefine the substrate specificities of LytM/LSS. We also agree that solid-state NMR is a wonderful technique. In our revised manuscript, we will edit the text accordingly.

      3) The conclusions are not fully supported by the data

      As mentioned above, the conclusions from synthesized peptide substrates may not necessarily reveal physiological functions. The conclusions need to be validated by more physiological substrates.

      As pointed out above in our response to the potential weaknesses of this study, the aim of this work was not to reveal the physiological function of LytM but to glean information on its substrate specificity, which reflects its functional role at the substrate level. Hitherto, LytM has been shown to cleave amide bonds between glycines, but without detailed information about the specific scissile bonds in the established PG components of the S. aureus cell wall. The same holds true for lysostaphin. This study simultaneously provides information on the rates of hydrolysis and the scissile bonds of these two enzymes. We deduced that the substrate specificity of LytM, and especially of lysostaphin, is defined by D-Ala-Gly cross-linking, which is a structural property, whereas Razew et al. (2023) discuss “more cross-linked” and “less cross-linked PG”, which is a supramolecular property or density.

      4) There are some issues with the presentation of the figures, text, and formatting.

      We are grateful to the Reviewer for bringing up issues in figures and text. We will address these in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates the enzymatic properties of lysostaphin (LSS) and LytM, two enzymes produced by Staphylococcus aureus and previously described as glycyl-glycyl endopeptidases. The authors use synthetic peptide substrates mimicking peptidoglycan fragments to determine the substrate specificity of both enzymes and identify the bonds they cleave.

      Strengths:

      • This work is addressing a real gap in our knowledge since very little information is available about the substrate specificity of peptidoglycan hydrolases.

      • The experimental strategy and its implementation are robust and provide a thorough analysis of LSS and LytM enzymatic activities. The results are very convincing and demonstrate that the enzymatic properties of the model enzymes studied need to be revisited.

      Weaknesses:

      • The manuscript is difficult to read in places and some figures are not always presented in a way that is easy to follow. This being said, the authors have made a good effort to present their experiments in an engaging manner. Some recommendations have been made to improve the current manuscript but these remain minor issues.

      We thank the Reviewer for providing positive feedback on our manuscript and for appreciating the systematic work behind this study, which aims to untangle the substrate specificity of two S. aureus PG hydrolyzing enzymes. We are grateful for the comments aiming to improve the presentation of the current version of the manuscript, and we will take these into account while preparing the revised version.

    1. Author Response

      We would like to thank the senior editor, reviewing editor and all the reviewers for taking the time to review our manuscript and for appreciating our study. We are excited that all of you have found strength in our work and have provided comments to strengthen it further. We sincerely appreciate the valuable comments and suggestions, which we believe will help us to further improve the quality of our work.

      Reviewer 1

      The manuscript by Dubey et al. examines the function of the acetyltransferase Tip60. The authors show that (auto)acetylation of a lysine residue in Tip60 is important for its nuclear localization and liquid-liquid-phase-separation (LLPS). The main observations are: (i) Tip60 is localized to the nucleus, where it typically forms punctate foci. (ii) An intrinsically disordered region (IDR) within Tip60 is critical for the normal distribution of Tip60. (iii) Within the IDR the authors show that a lysine residue (K187), that is auto-acetylated, is critical. Mutation of that lysine residue to a non-acetylable arginine abolishes the behavior. (iv) biochemical experiments show that the formation of the punctate foci may be consistent with LLPS.

      On balance, this is an interesting study that describes the role of acetylation of Tip60 in controlling its biochemical behavior as well as its localization and function in cells. The authors mention in their Discussion section other examples showing that acetylation can change the behavior of proteins with respect to LLPS; depending on the specific context, acetylation can promote (as here for Tip60) or impair LLPS.

      Strengths:

      The experiments are largely convincing and appear to be well executed.

      Weaknesses:

      The main concern I have is that all in vivo (i.e. in cells) experiments are done with overexpression in Cos-1 cells, in the presence of the endogenous protein. No attempt is made to use e.g. cells that would be KO for Tip60 in order to have a cleaner system or to look at the endogenous protein. It would be reassuring to know that what the authors observe with highly overexpressed proteins also takes place with endogenous proteins.

      Response: The main reason for performing these experiments with an overexpression system was to generate different point mutants and deletion mutants of TIP60 and analyse their effect on its properties and functions. To validate our observations with the overexpression system, we also examined the localization pattern of endogenous TIP60 by IFA, and the results show a similar foci pattern within the nucleus to that observed with overexpressed TIP60 protein (Figure 4A). However, we understand the reviewer’s concern and agree to repeat some of the overexpression experiments under endogenous TIP60 knockdown conditions using siRNA or shRNA against the 3’ UTR region.

      Also, it is not clear how often the experiments have been repeated and additional quantifications (e.g. of western blots) would be useful.

      Response: The experiments were performed as independent biological replicates (n=3), and this is mentioned in the figure legends. Regarding the suggestion to quantify Western blots, we would like to point out that for blots that require quantitative estimation (such as Figures 2F and 6H), graphs representing the quantified values with p-values have already been added. In addition, as suggested, quantitation for Figure 6D will be performed and added in the revised version.

      In addition, regarding the LLPS description (Figure 1), it would be important to show the wetting behaviour and the temperature-dependent reversibility of the droplet formation.

      Response: We appreciate the suggestion, and we will perform these assays and include the results in the revised version.

      In Fig 3C the mutant (K187R) Tip60 is cytoplasmic, but still appears to form foci. Is this still reflecting phase separation, or some form of aggregation?

      Response: The TIP60 (K187R) mutant remains cytosolic with a homogeneous distribution, as shown in Figure 2E. With TIP60 partners like PXR or p53, this mutant protein also remains homogeneously distributed in the cytosol. However, when co-expressed with TIP60 (Wild-type) protein, the mutant protein still remains cytosolic, but some foci-like pattern is also observed at the nuclear periphery, which we believe could be accumulated aggregates.

      Reviewer 2

      The manuscript "Autoacetylation-mediated phase separation of TIP60 is critical for its functions" by Dubey S. et al reported that the acetyltransferase TIP60 undergoes phase separation in vitro and cell nuclei. The intrinsically disordered region (IDR) of TIP60, particularly K187 within the IDR, is critical for phase separation and nuclear import. The authors showed that K187 is autoacetylated, which is important for TIP60 nuclear localization and activity on histone H4. The authors did several experiments to examine the function of K187R mutants including chromatin binding, oligomerization, phase separation, and nuclear foci formation. However, the physiological relevance of these experiments is not clear since TIP60 K187R mutants do not get into nuclei. The authors also functionally tested the cancer-derived R188P mutant, which mimics K187R in nuclear localization, disruption of wound healing, and DNA damage repair. However, similar to K187R, the R188P mutant is also deficient in nuclear import, and therefore, its defects cannot be directly attributed to the disruption of the phase separation property of TIP60. The main deficiency of the manuscript is the lack of support for the conclusion that "autoacetylation-mediated phase separation of TIP60 is critical for its functions".

      This study offers some intriguing observations. However, the evidence supporting the primary conclusion, specifically regarding the necessity of the intrinsically disordered region (IDR) and K187ac of TIP60 for its phase separation and function in cells, lacks sufficient support and warrants more scrutiny. Additionally, certain aspects of the experimental design are perplexing and lack controls to exclude alternative interpretations. The manuscript can benefit from additional editing and proofreading to improve clarity.

      Response: We understand the point raised by the reviewer; however, we would like to draw the reviewer’s attention to the data where we clearly demonstrated that acetylation of lysine 187 within the IDR of TIP60 is required for its phase separation (Figure 2J). We would also like to draw the reviewer’s attention to other TIP60 mutants within the IDR (R177H, R188H, K189R), all of which enter the nucleus and form phase-separated foci. The cancer-associated mutation at R188 behaves similarly because it also hampers TIP60 acetylation at the adjacent K187 residue. Our in vitro and in cellulo results clearly demonstrate that autoacetylation of TIP60 at K187 within its IDR is critical for multiple functions, including its translocation into the nucleus, its protein-protein interactions and its oligomerization, which are prerequisites for phase separation of TIP60.

      There are two putative NLS sequences (NLS #1 from aa145; NLS #2 from aa184) in TIP60, both of which are within the IDR. Deletion of the whole IDR is therefore expected to abolish the nuclear localization of TIP60. Since K187 is within NLS #2, the cytoplasmic localization of the IDR and K187R mutants may not be related to the ability of TIP60 to phase separate.

      Response: We are not disputing the presence of putative NLS sequences within the IDR of TIP60; however, our results with different mutations within the IDR (K76, K80, K148, K150, R177, R178, R188, K189) clearly demonstrate that only acetylation of the K187 residue is critical for shuttling TIP60 into the nucleus, while all other lysine mutants located within these putative NLS regions showed no impact on TIP60’s nuclear shuttling. We have mentioned in our discussion that autoacetylation of TIP60’s K187 may induce local structural modifications in its IDR that are critical for translocating TIP60 into the nucleus, where it undergoes the phase separation critical for its functions. In a similar previously reported example, acetylation of a lysine within the NLS region of TyrRS by PCAF promotes its nuclear localization (Cao X et al 2017, PNAS). The IDR (which also contains the K187 site) is important for phase separation once the protein enters the nucleus. This could be the cell’s mechanism to prevent unwarranted action of TIP60 until it enters the nucleus and phase separates on chromatin at appropriate locations.

      The chromatin-binding activity of TIP60 depends on HAT activity, but not phase-separation (Fig 1I), (Fig 2B). How do the authors reconcile the fact that the K187R mutant is able to bind to chromatin with lower activity than the HAT mutant (Fig 2F, 2I)?

      Response: K187 acetylation is required for TIP60’s nuclear translocation but is not critical for chromatin binding. When the soluble fraction is prepared in the fractionation experiment, the nuclear membrane is disrupted, so the TIP60 (K187R) mutant no longer faces any hindrance in accessing the chromatin and can load onto it (although not as efficiently as the Wild-type protein). Efficient chromatin binding requires auto-acetylation of other lysine residues in TIP60, which might be hampered by reduced catalytic activity or be insufficient to maintain equilibrium with HDAC activity inside the nucleus. In the case of K187R, the reduced auto-acetylation is captured while the protein is in the cytosol. During fractionation, once this mutant has access to chromatin, it might auto-acetylate other lysine residues critical for chromatin loading (note that the catalytic domain is intact in this mutant). This is evident from the hyper auto-acetylation of the Wild-type protein compared to the K187R or HAT mutant proteins. We would like to point out that phase separation occurs only after efficient chromatin loading of TIP60, which is why, under in cellulo conditions, both K187R (which cannot enter the nucleus) and the HAT mutant (which enters the nucleus but fails to bind efficiently to chromatin) fail to form phase-separated nuclear punctate foci.

      The DIC images of phase separation in Fig 2I need to be improved. The image for K187R showed the irregular shape of the condensates, which suggests particles in solution or on the slide. The authors may need to use fluorescent-tagged TIP60 in the in vitro LLPS experiments.

      Response: We believe this comment refers to Figure 2J. The irregularly shaped condensates observed for TIP60 K187R are unique to the mutant protein and are not caused by particles on the slide. We would like to draw the reviewer’s attention to supplementary Figure S2A, where DIC images for TIP60 (Wild-type) protein tested under different protein and PEG8000 conditions are completely clear when the protein did not form phase-separated droplets, ruling out the possibility of particles in solution or on the slides.

      The authors mentioned that the HAT mutant of TIP60 does not phase separate, which needs to be included.

      Response: We have already added the image of RFP-TIP60 (HAT mutant) in supplementary Fig S4A (panel 2) in the manuscript.

      Related to Point 3, the HAT mutant that doesn't form punctate foci by itself, can incorporate into WT TIP60 (Fig 5A). In vitro LLPS assay for WT, HAT, and K187R mutants with or without acetylation should be included. WT and mutant TIP can be labelled with GFP and RFP, respectively.

      Response: We would like to draw the reviewer’s attention to our co-expression experiments in Figure 5, where Wild-type protein (both tagged and untagged) is able to phase separate and form punctate foci with co-expressed HAT mutant protein (which has depleted autoacetylation capacity). We believe these in cellulo experiments already answer the questions that the reviewer suggests addressing with in vitro experiments.

      Fig 3A and 3B showed that neither K187 mutant nor HAT mutant could oligomerize. If both experiments were conducted in the absence of in vitro acetylation, how do the authors reconcile these results?

      Response: We thank the reviewer for highlighting our oversight in omitting the mention of acetyl coenzyme A here. To induce acetylation under in vitro conditions, we added 10 µM acetyl CoA to the reactions depicted in Figures 3A and 3B. The information on acetyl CoA for Figure 3B was already included in the description of the GST pull-down assay (Materials and Methods section). We will add the same to the description of the oligomerization assay in the Materials and Methods of the revised manuscript.

      In Fig 4, the colocalization images showed little overlap between TIP60 and nuclear speckle (NS) marker SC35, indicating that the majority of TIP60 localized in the nuclear structure other than NS. Have the authors tried to perturb the NS by depleting the NS scaffold protein and examining TIP60 foci formation? Do PXR and TP53 localize to NS?

      Response: Under normal conditions, the majority of TIP60 is not localized in nuclear speckles (NS), so we believe that perturbing NS will not have a significant effect on TIP60 foci formation. Interestingly, a recent study by Shelley Berger’s group (Alexander KA et al Mol Cell. 2021 15;81(8):1666-1681) showed that p53 localizes to NS to regulate a subset of its target genes. We have mentioned this in our discussion section. No information is available about localization of PXR in NS.

      Were TIP60 substrates, H4 (or NCP), PXR, TP53, present in TIP60 condensates in vitro? It's interesting to see both PXR and TP53 had homogenous nuclear signals when expressed together with K187R, R188P (Fig 6E, 6G), or HAT (Suppl Fig S4A) mutants. Are PXR or TP53 nuclear foci dependent on their acetylation by TIP60? This can and should be tested.

      Response: Both p53 and PXR are known to be acetylated by TIP60. In the case of PXR, TIP60 acetylates PXR at lysine 170, and this TIP60-mediated acetylation of PXR at K170 is important for the TIP60-PXR foci, which we now know are formed by phase separation (Bakshi K et al Sci Rep. 2017 Jun 16;7(1):3635).

      Since R188P mutant, like K187R, does not get into the nuclei, it is not suitable to use this mutant to examine the functional relevance of phase separation for TIP60. The authors need to find another mutant in IDR that retains nuclear localization and overall HAT activity but specifically disrupts phase separation. Otherwise, the conclusion needs to be restated. All cancer-derived mutants need to be tested for LLPS in vitro.

      Response: We appreciate the reviewer’s point here, but it is important to note that the objective of these experiments is to understand the impact of K187R (critical in multiple aspects of TIP60, including phase separation) and R188P (a naturally occurring cancer-associated mutation that behaves similarly to K187R) on TIP60’s activities, in order to determine their functional relevance. The reviewer’s suggestion to identify and test an IDR mutant that fails to phase separate but retains nuclear localization and catalytic activity can be examined in future studies.

      For all cellular experiments, it is not mentioned whether endogenous TIP60 was removed and absent in the cell lines used in this study. It's important to clarify this point because the localization and function of mutant TIP60 are affected by WT TIP60 (Fig 5).

      Response: Endogenous TIP60 was present in the in cellulo experiments; however, as suggested by reviewer 1, we will perform some of the in cellulo experiments under endogenous TIP60 knockdown conditions to validate our findings.

      It is troubling that H4 peptide is used for in vitro HAT assay since TIP60 has much higher activity on nucleosomes and its preferred substrates include H2A.

      Response: The purpose of using H4 peptide in the HAT assay is to determine the impact of mutations on TIP60’s catalytic activity. As H4 is one of the major histone substrates of TIP60, we believe it satisfies the objective of the experiments.

      Reviewer 3

      This study presents results arguing that the mammalian acetyltransferase Tip60/KAT5 auto-acetylates itself on one specific lysine residue before the MYST domain, which in turn favors not only nuclear localization but also condensate formation on chromatin through LLPS. The authors further argue that this modification is responsible for the bulk of Tip60 autoacetylation and acetyltransferase activity towards histone H4. Finally, they suggest that it is required for association with txn factors and in vivo function in gene regulation and DNA damage response.

      These are very wide and important claims and, while some results are interesting and intriguing, there is not really close to enough work performed/data presented to support them. In addition, some results are redundant between them, lack consistency in the mutants analyzed, and show contradiction between them. The most important shortcoming of the study is the fact that every single experiment in cells was done in over-expressed conditions, from transiently transfected cells. It is well known that these conditions can lead to non-specific mass effects, cellular localization not reflecting native conditions, and disruption of native interactome. On that topic, it is quite striking that the authors completely ignore the fact that Tip60 is exclusively found as part of a stable large multi-subunit complex in vivo, with more than 15 different proteins. Thus, arguing for a single residue acetylation regulating condensate formation and most Tip60 functions while ignoring native conditions (and the fact that Tip60 cannot function outside its native complex) does not allow me to support this study.

      Response: We appreciate the reviewer’s point here, but it is important to note that the main purpose of using an overexpression system in the study is to analyse the effect of the different point and deletion mutations generated in TIP60. We have overexpressed proteins with different tags (GFP or RFP) or without tags (Figure 3C, Figure 5) to confirm that the behaviour of the protein is not perturbed by the presence of tags. For validation, we have also examined the localization of endogenous TIP60 protein, which shows similar localization behaviour to the overexpressed protein. We would like to point out that there are several reports in the literature where similar overexpression systems are used to determine the functions of TIP60 and its mutants. The nuclear foci pattern observed for TIP60 in our studies has also been reported by several other groups.

      Sun, Y., et. al. (2005) A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci U S A, 102(37):13182-7.

      Kim, C.-H. et al. (2015) ‘The chromodomain-containing histone acetyltransferase TIP60 acts as a code reader, recognizing the epigenetic codes for initiating transcription’, Bioscience, Biotechnology, and Biochemistry, 79(4), pp. 532–538.

      Wee, C. L. et al. (2014) ‘Nuclear Arc Interacts with the Histone Acetyltransferase Tip60 to Modify H4K12 Acetylation(1,2,3).’, eNeuro, 1(1). doi: 10.1523/ENEURO.0019-14.2014.

      However, as a precaution, and as also suggested by the other reviewers, we will perform some of these overexpression experiments in the absence of endogenous TIP60 by using 3’ UTR-specific siRNA/shRNA.

      We thank the reviewer for the comment on the multi-subunit complex proteins, and we would like to expand our study by determining the interaction of some of the complex subunits with Wild-type TIP60 (which forms nuclear condensates), the TIP60 HAT mutant (which enters the nucleus but does not form condensates) and TIP60 K187R (which does not enter the nucleus and does not form condensates). We will include the results of these experiments in the revised manuscript.

      • It is known that over-expression after transient transfection can lead to non-specific acetylation of lysines on the proteins, likely in part to protect from proteasome-mediated degradation. It is not clear whether the Kac sites targeted in the experiments are based on published/public data. In that sense, it is surprising that the K327R mutant does not behave like a HAT-dead mutant (which is what exactly?) or the K187R mutant as this site needs to be auto-acetylated to free the catalytic pocket, so essential for acetyltransferase activity like in all MYST-family HATs. In addition, the effect of K187R on the total acetyl-lysine signal of Tip60 is very surprising as this site does not seem to be a dominant one in public databases.

      Response: We have chosen autoacetylation sites based on previously published studies where LC-MS/MS and in vitro acetylation assays were used to identified autoacetylation sites in TIP60 which includes K187. We have already mentioned about it in the manuscript and have quoted the references (1. Yang, C., et al (2012). Function of the active site lysine autoacetylation in Tip60 catalysis. PloS one 7, e32886. 10.1371/journal.pone.0032886. 2. Yi, J., et al (2014). Regulation of histone acetyltransferase TIP60 function by histone deacetylase 3. The Journal of biological chemistry 289, 33878–33886. 10.1074/jbc.M114.575266.). We would like to emphasize that both these studies have identified K187 as autoacetylation site in TIP60. Since TIP60 HAT mutant (with significantly reduced catalytic activity) can also enter nucleus, it is not surprising that K327 could also enter the nucleus.

      • As the physiological relevance of the results is not clear, the mutants need to be analyzed at the native level of expression to study real functional effects on transcription and localization (ChIP/IF). It is not clear the claim that Tip60 forms nuclear foci/punctate signals at physiological levels is based on what. This is certainly debated because in part of the poor choice of antibodies available for IF analysis. In that sense, it is not clear which Ab is used in the Westerns. Endogenous Tip60 is known to be expressed in multiple isoforms from splice variants, the most dominant one being isoform 2 (PLIP) which lacks a big part (aa96-147) of the so-called IDR domain presented in the study. Does this major isoform behave the same?

      Response: The TIP60 antibody used in the study is from Santa Cruz (Cat. No. sc-166323). This antibody is widely used for TIP60 detection by several methods and has been cited in numerous publications. The catalogue number will be mentioned in the manuscript. Regarding isoforms, three isoforms are known for TIP60, among which isoform 2 is the most highly expressed and is the one used in our study. Isoforms 1 and 2 have the same IDR length (150 amino acids), while isoform 3 has an IDR of 97 amino acids. Interestingly, K187 is present in all the isoforms (already mentioned in the manuscript), and the region missing in isoform 3 (amino acids 96-147) has a lower propensity for disorder (marked with a blue circle). This indicates that all the isoforms of TIP60 have the tendency to phase separate.

      Author response image 1.

      • It is extremely strange to show that the K187R mutant fails to get in the nuclei by cell imaging but remains chromatin-bound by fractionation... If K187 is auto-acetylated and required to enter the nucleus, why would a HAT-dead mutant not behave the same?

      Response: We would like to draw attention to the fact that neither the HAT mutant nor the K187R mutant is completely catalytically dead. As our data show, both mutants retain catalytic activity, although at significantly decreased levels. We believe that K187 acetylation is critical for TIP60 to enter the nucleus and that, once TIP60 has shuttled into the nucleus, autoacetylation of other sites is required for efficient chromatin binding. In the fractionation assay, the nuclear membrane is dissolved while preparing the soluble fraction, so there is no hindrance to the K187R mutant accessing the chromatin. The HAT mutant, in contrast, can acetylate the K187 site and is thus able to enter the nucleus; however, its residual catalytic activity is not sufficient either to autoacetylate the other residues required for efficient chromatin binding or to counter the activity of HDACs deacetylating TIP60.

      • If K187 acetylation is key to Tip60 function, it would be most logical (and classical) to test a K187Q acetyl-mimic substitution. In that sense, what happens with the R188Q mutant? That all goes back to the fact that this cluster of basic residues looks quite like an NLS.

      Response: As suggested, we will generate an acetylation-mimicking mutant for the K187 site and examine it. The results will be added to the revised manuscript.

      • The effect of the mutant on the TIP60 complex itself needs to be analyzed, e.g. for associated subunits like p400, ING3, TRRAP, Brd8...

      Response: As suggested, we will examine the effect of the mutations on the TIP60 complex.

    1. Author Response

      Reviewer #1 (Public Review):

      “A sample size of 3 idiopathic seems underpowered relative to the many types of genetic changes that can occur in ASD. Since the authors carried out WGS, it would be useful to know what potential causative variants were found in these 3 individuals and even if not overlapping if they might expect to be in a similar biological pathway.

      If the authors randomly selected 3 more idiopathic cell lines from individuals with autism, would these cell lines also have altered mTOR signaling? And could a line have the same cell biology defects without a change in mTOR signaling? The authors argue that the sample size could be the reason for lack of overlap of the proteomic changes (unlike the phosphor-proteomic overlaps), which makes the overlapping cell biology findings even more remarkable. Or is the phenotyping simply too crude to know if the phenotypes truly are the same?”

      We appreciate these thoughtful comments and agree that, among several possible models, our studies indicate the possibility of mTOR alteration in multiple forms of ASD. As above, we are currently pursuing this hypothesis with newly acquired DOD support. With regard to the I-ASD population, we agree that a large variety of genetic changes can occur in genetically undefined ASDs. Indeed, this is precisely why we expected to see “personalized” phenotypes in each I-ASD individual when we embarked on this study. At that time, several years ago, we had planned to expand the analyses to more I-ASD individuals to assess for additional personalized phenotypes. However, as our studies progressed, we were surprised to find convergence in our I-ASD population in terms of neurite outgrowth and migration, and later proteomic results showing convergence in mTOR. We found it particularly remarkable that this convergence was noted despite a sample size of 3. When we had the opportunity to extend our studies to the 16p11.2 deletion population, we were thrilled to conduct the first comparison between I-ASD and a genetically defined ASD and, as such, the scope of the paper turned towards this comparison. We do agree that analyses of the other I-ASD individuals would be a beneficial endeavor, both to understand how pervasive NPC migration and neurite deficits are in autism and to assess the presence of mTOR dysregulation. Furthermore, it would be important to see whether alterations in other pathways could also lead to similar cell biological deficits, though we know that other studies of neurodevelopmental disorders have found such cellular dysregulations without reporting concurrent mTOR dysregulation. Because our current grant funding is directed at extending these analyses in separate work, such experiments are not feasible within this manuscript.

      Regarding the phenotyping methods used, we decided to assess neurite outgrowth and migration as they are both cytoskeleton dependent processes that are critical for neurodevelopment and are often regulated by the same genes. Furthermore, similar analyses have been applied to Fragile-X Syndrome, 22q11.2 deletion syndrome, and schizophrenia NPCs (Shcheglovitov A. et al., 2013; Mor-Shaked H. et al., 2016; Urbach A. et al., 2010; Kelley D. J. et al., 2008; Doers M. E. et al., 2014; Brennand K. et al., 2015; Lee I. S. et al., 2015; Marchetto M. C. et al., 2011). As such, it seems that multiple underlying etiologies can lead to similar dysregulated cellular phenotypes that can contribute to a variety of neurodevelopmental disorders. On a more global level, there are only a few different cellular functions a developing neuron can undergo, and these include processes such as proliferation, survival, migration, and differentiation. Thus, to understand neurodevelopmental disorders, it is important to study the more “crude” or “global” cellular functions occurring during neurodevelopment to determine whether they are disrupted in disorders such as ASD. In our studies we find that there are indeed dysregulations in many of these basic developmental processes, indicating that the typical steps that occur for normal brain cytoarchitecture may be disrupted in ASD. To understand why, we then further utilized molecular studies to “zoom” in on potential mechanisms which implicated common dysregulation in mTOR signaling as one driver for these common cellular phenotypes. As suggested, we did complete WGS on all the I-ASD individuals and did not see any overlapping genetic variants between the three I-ASD individuals as mentioned in our manuscript. The genetic data was published in a larger manuscript incorporating the data (Zhou A. et al., 2023). 
However, there were variants unique to each I-ASD individual that were not seen in their unaffected family members, and it is possible that these variants contribute to the I-ASD phenotypes. We also used IPA to conduct pathway analysis on the WGS data, following the same approach as in our analysis of the p-proteome and proteome data. From the WGS data, we selected high read-quality variants that were found only in I-ASD individuals and had a functional impact on the protein (i.e., excluding synonymous variants). The enriched pathways obtained from these data were strikingly different from the pathways we found in the p-proteome analysis and are now included in supplemental Figure 6 in the manuscript. Briefly, the top 5 enriched pathways were: O-linked glycosylation, MHC class 1 signaling, Interleukin signaling, Antigen presentation, and regulation of transcription.

      Reviewer #2 (Public Review):

      1) I found that interpreting how differential EF sensitivity is connected to the rest of the story difficult at times. First, it is unclear why these extracellular factors were picked. These are seemingly different in nature (a neuropeptide, a growth factor and a neuromodulator) targeting largely different pathways. This limits the interpretation of the ASD subtype-specific rescue results. One way of reframing that could help is that these are pro-migratory factors instead of EFs broadly defined that fail to promote migration in I-ASD lines due to a shared malfunctioning of the intracellular migration machinery or cell-cell interactions (possibly through tight junction signaling, Fig S2A). Yet, this doesn't explain the migration/neurite phenotypes in 16p11 lines where EF sensitivity is not altered, overall implying that divergent EF sensitivity independent of underlying mTOR state. What is the proposed model that connects all three findings (divergent EF sensitivity based on ASD subtypes, 2 mTOR classes, convergent cellular phenotypes)?

      We thank you for the kind assessment of our manuscript and for the thought-provoking questions posed. For our study, we defined an extracellular factor as any growth factor, amino acid, neurotransmitter, or neuropeptide found in the extracellular environment of the developing cells. The EFs used were selected for their well-established role in regulating early neurodevelopmental phenotypes, their expression during the “critical window” of mid-fetal development (as determined by the Allen Brain Atlas), and, in the case of 5-HT, its association with ASD (Abdulamir H. A. et al., 2018; Adamsen D. et al., 2014; Bonnin A. et al., 2011; Bonnin A. et al., 2007; Chen X. et al., 2015; El Marroun H. et al., 2014; Hammock E. et al., 2012; Yang C. J. et al., 2014; Dicicco-Bloom E. et al., 1998; Lu N. et al., 1998; Suh J. et al., 2001; Watanabe J. et al., 2016; Gilmore J. H. et al., 2003; Maisonpierre P. C. et al., 1990; Dincel N. et al., 2013; Levi-Montalcini R., 1987). Lastly, prior experiments in our lab with a mouse model of neurodevelopmental disorders had shown atypical responses to EFs (IGF-1, FGF, PACAP). As such, when we first chose to use EFs in human NPCs we wanted to know 1) whether human NPCs even responded to these EFs, 2) whether EFs regulated neurite outgrowth and migration, and 3) whether there would be a differential response in NPCs derived from those with ASD. Our studies were initiated on the I-ASD cohort and, given the heterogeneity of ASD, we had hypothesized that we would get “personalized” neurite and migration phenotypes. For this reason, we also wanted to select multiple types of EFs that act on different signaling pathways. Ultimately, instead of personalized phenotypes, we found that none of the I-ASD NPCs responded to any of the EFs tested, whereas the 16p11.2 deletion NPCs did; this was therefore the only difference we found between these two “forms” of ASD. 
As noted, in I-ASD the lack of response to EFs can be ameliorated by modulating mTOR. However, in the 16p11.2 deletion, despite mTOR dysregulation similar to that seen in I-ASD, there is no EF impairment. We do not have a cohesive model to explain why the 16pDel individuals differ from the I-ASD individuals, other than to point to the p-proteomes, which do show that the 16pDel NPCs are distinct from the I-ASD NPCs. It seems that mTOR alteration can contribute to impaired EF responsiveness in some NPCs, but perhaps an additional defect needs to be present for this impairment to manifest, or 16p11.2 deletion NPCs have specific compensatory features. For example, as noted in the thoughtful comment, the p-proteome canonical pathway analysis shows tight junction malfunction in I-ASD that is not present in the 16pDel NPCs, and it could be the combination of mTOR dysregulation and dysregulated tight junction signaling that has led to the lack of response to EFs in I-ASD. Regardless, we do not think the differences between two genetically distinct ASDs diminish the convergent mTOR results we have uncovered. That is, regardless of which defects are present in the ASD NPCs, we are able to rescue them with mTOR modulation, which has fascinating implications for the treatment and conceptualization of ASD. Lastly, we see our EF studies as an important inclusion, as they show that in some subtypes of ASD, lack of response to appropriate EFs could be contributing to neurodevelopmental abnormalities. Moreover, lack of response to these EFs could have implications for the treatment of individuals with ASD (for example, SSRIs are commonly used to treat co-morbid conditions in ASD, but if an individual is unresponsive to 5-HT, perhaps this treatment is less effective). We have edited the manuscript to include an additional discussion section that addresses the EFs more thoroughly and have included a few extra sentences in the introduction as well!

      2) A similar bidirectional migration phenotype has been described in hiPSC-derived human cortical interneurons generated from individuals with Timothy Syndrome (Birey et al 2022, Cell Stem Cell). Here, authors show that the intracellular calcium influx that is excessive in Timothy Syndrome or pharmacologically dampened in controls results in similar migration phenotypes. Authors can consider referring to this report in support of the idea that bimodal perturbations of cardinal signaling pathways can converge upon common cellular migration deficits.

      We thank you for pointing out the similar migration phenotype in the Timothy Syndrome paper and have now cited it in our manuscript. We have also expanded on the concept of “too much or too little” of a particular signaling mechanism leading to common outcomes.

      3) Given that authors have access to 8 I-ASD hiPSC lines, it'd be very informative to assay the mTOR state (e.g. pS6 westerns) in NPCs derived from all 8 lines instead of the 3 presented, even without assessing any additional cellular phenotypes, which authors have shown to be robust and consistent. This can help the readers better get a sense of the proportion of high-mTOR vs low-mTOR classes in a larger cohort.

      We have already addressed this in response to reviewer 1 and the essential revisions section, providing our reasoning for not expanding the study to all 8 I-ASD individuals.

      4) Does the mTOR modulation rescue EF-specific responses to migration as well (Figure 7)?

      We did not conduct sufficient replicates of the rescue of EF-specific migration responses, owing to the time-consuming and resource-intensive nature of the neurosphere experiments. Unlike the neurite experiments, the neurosphere experiments require significantly more cells, more time, selection of neurospheres based on a size criterion, and then manual trace measurements. We did one experiment in Family-1, in which we used MK-2206 to abolish the response of Sib NPCs to PACAP. Likewise, adding SC-79 to I-ASD-1 neurospheres allowed a response to PACAP.

      Author response image 1.

      Author response image 2.

      Reviewer #3: Public Review

      We appreciate the kind, detailed and very thorough review you provided for us!

      The results on the mTOR signaling pathway as a point of convergence in these particular ASD subtypes is interesting, but the discussion should address that this has been demonstrated for other autism syndromes, and in the present manuscript, there should be some recognition that other signaling pathways are also implicated as common factors between the ASD subtypes.

      With regard to the mTOR pathway, we had included the other ASD syndromes in which mTOR dysregulation has been seen, including tuberous sclerosis, Cowden Syndrome, NF-1, as well as Fragile-X, Angelman, Rett and Phelan McDermid, in the final paragraph of the discussion section “mTOR Signaling as a Point of Convergence in ASD”. We have now expanded our discussion to note that other signaling pathways, such as MAPK, cyclins, WNT, and reelin, have also been implicated as common factors between the ASD subtypes.

      The conclusions of this paper are mostly well supported by data, but for the cell migration assay, it is not clear if the authors control for initial differences in the inner cell mass area of the neurospheres in control vs ASD samples, which would affect the measurement of migration.

      Thank you for this thoughtful comment! When we first started collecting our migration data, inner cell mass size was indeed a major concern, for which we controlled in our methods. First, when plating the neurospheres, we would only collect spheres when a majority of spheres were approximately 100 µm in diameter. Very large spheres often could not be imaged because they were out of focus, and very small spheres would often disperse when plated. Thus, there were some constraints on the variability of inner cell mass size.

      Furthermore, when we initially collected data, we conducted a proof-of-principle test to see whether initial inner cell mass area (henceforth referred to as initial sphere size or ISS) influenced migration data. To do so, we obtained migration and ISS data from each diagnosis (Sib, NIH, I-ASD, 16pASD). We then used RStudio to test for a relationship between Migration and ISS in each diagnosis category using the call lm(Migration ~ ISS, data = bydiagnosis). In this call, lm indicates linear modeling, the tilde (~) specifies that Migration is modeled as a function of ISS, and the argument data = bydiagnosis supplies the data organized by diagnosis.
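
      The per-diagnosis regression described above can be sketched as follows. This is a minimal Python analogue of the R call, not the authors' script: the function names, the dictionary layout, and every number in it are hypothetical placeholders chosen for illustration, and the p-value that lm also reports is omitted here.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared) for one group of observations.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def fit_by_diagnosis(data):
    """Fit Migration ~ ISS separately within each diagnosis group."""
    return {dx: fit_line(v["ISS"], v["Migration"]) for dx, v in data.items()}


# Hypothetical numbers for illustration only; these are NOT the study's data.
data_by_diagnosis = {
    "Sib": {"ISS": [90, 100, 110, 120], "Migration": [300, 320, 320, 300]},
    "I-ASD": {"ISS": [90, 100, 110, 120], "Migration": [200, 215, 215, 200]},
}

for dx, (slope, intercept, r2) in fit_by_diagnosis(data_by_diagnosis).items():
    print(f"{dx}: slope={slope:.3f}, intercept={intercept:.1f}, R^2={r2:.3f}")
```

      An R-squared near zero within a group, as in these placeholder values, is the pattern that would indicate migration does not depend on initial sphere size.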

      The results were expressed as R-squared values, indicating the correlation between ISS and Migration for each diagnosis, and p-values, indicating the statistical significance of each comparison. As shown in Author response table 1, there is minimal correlation between Migration and ISS in each data set. Moreover, there are no statistically significant relationships between Migration and ISS, indicating that initial sphere size DOES NOT influence migration data in any of our data sets.

      Author response table 1.

      Lastly, utilizing R, we modeled what predicted migration would be like for Sib, NIH, I-ASD, and 16pASD if we accounted for ISS in each group. Raw migration data was then plotted against the predicted data as in Author response image 3.

      Author response image 3.

      As shown in the graph, there are no statistical differences between the raw migration data (the data that we actually measured in the dish) and the modeled data in which ISS is accounted for as a variable. As such, we chose not to normalize to or account for ISS in our other experiments. We have now included the above RStudio analyses in our supplemental figures (Figure S1) as well.

      Also, in Fig 5 and 6, panels I and J omit the effects of drug on mTOR phosphorylation as shown for other conditions.

      Both SC-79 and MK-2206 were selected for our experiments after thorough analysis of their effects on human epithelial cells and other cultured cells (citations in manuscript). However, we initially did not know whether either of these drugs would modulate the mTOR pathway in human NPCs; thus, in Figures 5A, 5D, 6A and 6D we chose to focus on two of our data sets to establish the effect of these drugs in human NPCs. Our experiments in Family-1 and Family-2 showed us that SC-79 increases pS6 in human NPCs while MK-2206 downregulates it. Once this was established, we expected the drugs to have similar effects in the NPCs from the other families. Thus, we only conducted a proof-of-principle test to confirm that the drugs do indeed have the intended effect in I-ASD-3 and 16pDel. We have included these proof-of-principle westerns in Figure 5I, 5K, 6I and 6K to show that the effects of these drugs are reproducible across all our NPC lines. We did not include quantification since the data are from only a single proof-of-principle western.

    1. Author response

      eLife assessment

      Using a genetically controlled experimental setting, the authors find that the lack of Polycomb-dependent epigenetic programming in the oocyte and early embryo influences the developmental trajectory through gestation in the mouse. By showing a two-phase outcome of early growth restriction followed by enhancement, the authors address previous inconsistencies in the field. However, the link with placenta function and gene misregulation is not yet fully supported.

      We thank the Reviewers for their constructive comments. In response we have added significantly more data to the study and substantially rewritten the manuscript. New data include analyses of glucose, amino acid and metabolite levels in fetal and maternal blood samples, more highly resolved fetal growth analyses, a more detailed study of the hyperplastic placenta including IF analyses of labyrinth area, labyrinth to placenta and capillary to labyrinth ratios. We have also added analyses of placental DNA methylation state in offspring from oocytes lacking EED, which reveals a range of DNA methylation changes at imprinted and non-imprinted genes in HET-hom offspring compared to HET-het or WT-wt controls.

      Reviewer #1 (Public Review):

      Oberin, Petautschnig et al. investigated the developmental phenotypes that resulted from oocyte-specific loss of the EED (Embryonic Ectoderm Development) gene - a core component of the Polycomb repressive complex 2 (PRC2), which possesses histone methyltransferase activity and catalyses trimethylation of histone H3 at lysine 27 (H3K27). The PRC2 complex plays essential roles in regulating chromatin structure, being an important regulator of cellular differentiation and development during embryogenesis. As novel findings, the authors find that loss of PRC2-dependent programming in the oocyte, via loss of the core component EED, causes placental hyperplasia and propose that an increase in transplacental flux of nutrients leads to fetal and postnatal overgrowth. At the mechanistic level, they show altered expression of genes previously implicated in placental hyperplasia phenotypes. They also establish interesting parallelism with the placental hyperplasia phenotype that is frequently observed in cloned mice.

      Strengths:

      The mouse breeding experiments are very well designed and are powerful to exclude potential confounding genetic effects on the developmental phenotypes that resulted from the loss of EED in oocytes. Another major strength is the developmental profiling across gestation, from pre-implantation to late gestation.

      Weaknesses:

      The evidence for 'oocyte' programming is restricted to phenotypic and gene expression analysis, without measurements of epigenetic dysregulation. It would be an added value if the authors could show evidence for altered H3K27me3 or DNA methylation in the placenta, for example.

      In an earlier study we identified a large number of developmentally important genes that accumulated H3K27me3 in primary-secondary stage growing oocytes and were repressed by EED (Jarred et al., 2022 Clinical Epigenetics). However, H3K27me3 was removed from all of these genes during preimplantation development, indicating that maternal inheritance of H3K27me3 at a wide range of genes is unlikely (Jarred et al., 2022 Clinical Epigenetics). Consistent with this, only a small number of genes, including Slc38a4 and C2MC, have been shown to be functionally important in H3K27me3-dependent imprinting (Matoba et al., 2022 Genes and Development). Moreover, a related study showed that deletion of Setd2 and consequent loss of H3K36me3 in oocytes led to spreading of H3K27me3 into regions that were otherwise marked by H3K36me3 and DNA methylation (Xu et al. 2019 Nature Genetics 51:844–56). Based on these studies, we proposed that loss of EED and H3K27me3 may result in the ectopic spreading of H3K36me3 and DNA methylation in oocytes and that altered DNA methylation may then be transmitted to offspring and affect developmental outcomes (Jarred et al., 2022 Clinical Epigenetics).

      Given this hypothesis, we analysed DNA methylation rather than H3K27me3 in the placenta of WT-wt, HET-het and HET-hom offspring. This revealed differentially methylated regions (DMRs) in HET-hom placentas at two H3K27me3 imprinted genes, Sfmbt2 (C2MC) and Mbnl2, at five classically imprinted genes, and at 74 DMRs not associated with imprinted loci. Together, our data support the hypothesis from Jarred et al., 2022 Clinical Epigenetics that loss of EED in oocytes results in altered DNA methylation patterning at both imprinted and non-imprinted genes in offspring and that this is likely to affect offspring growth and development. However, whether these changes result from direct alteration of DNA methylation in oocytes remains unclear.

      These new data are now included in results (Lines 387-409), Figure 6I, Supplementary File H-J and Discussion Lines 569-581.

      Reviewer Comment 1. The claim that placental hyperplasia drives offspring catch-up growth is not supported by current experimental data. The authors do not address if transplacental flux is increased in the hyperplastic placentae, measure amino acids and glucose in fetal/maternal plasma, or perform tetraploid rescue experiments to ascertain the contribution of the placenta to growth phenotypes. Furthermore, it is unclear, from the current data, if the surface area for nutrient transport is actually increased in the hyperplastic placenta and the extent to which other cell populations (i.e. spongiotrophoblasts) are affected in addition to glycogen cells. In addition, one of the supporting conclusions that the placenta is a key contributor to fetal overgrowth is based on a very crude measurement - placenta efficiency - which the authors claim is increased in the homozygous mutants compared to controls. After analysing the data carefully, I find evidence for decreased placental efficiency instead. I believe that the authors mistakenly present the data as placenta to fetal weight ratios, which led to the misinterpretation of the 'efficiency' concept.

      We thank the reviewer for pointing out our error in the placental efficiency data and we have now corrected the placental efficiency graphs (fetal/placental weight ratios) and updated the text throughout the manuscript as required (Figure 3I-K). As requested and described below, we have also added significantly more data, which support the conclusion that placental function is not enhanced in HET-hom mice and is unlikely to support fetal growth recovery.
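
      As a brief illustration of the ratio at issue, the sketch below computes placental efficiency as fetal weight divided by placental weight. The function name and the weights are hypothetical and chosen only to show the arithmetic; they are not measurements from the study.

```python
def placental_efficiency(fetal_weight_g, placental_weight_g):
    """Fetal-to-placental weight ratio: grams of fetus per gram of placenta."""
    if fetal_weight_g <= 0 or placental_weight_g <= 0:
        raise ValueError("weights must be positive")
    return fetal_weight_g / placental_weight_g


# Hypothetical weights in grams (illustration only, not study data).
control = placental_efficiency(1.0, 0.10)     # normal-sized placenta
enlarged = placental_efficiency(1.0, 0.125)   # same fetus, heavier placenta

print(f"control efficiency:  {control:.1f}")
print(f"enlarged efficiency: {enlarged:.1f}")

# The inverse (placenta/fetus) ratio moves in the opposite direction, so
# reading it as "efficiency" reverses the conclusion.
print(f"inverse ratios: {1 / control:.3f} vs {1 / enlarged:.3f}")
```

      With fetal weight held constant, a heavier placenta gives a lower fetal/placental ratio, which is why plotting placenta-to-fetal weight instead can make a less efficient placenta appear more efficient.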

      The new data and analyses we have added include:

      1. Further analyses of glycogen-enriched and non-glycogen-enriched cell counts in the decidua and junctional zones (Figure 4F-J)

      2. Total glycogen cell counts for male and female placentas (Figure 4 – figure supplement 1F)

      3. New analyses of fetal blood glucose levels at E17.5 and E18.5 and matching data from the mothers of each litter (Figure 4M)

      4. New analyses of the circulating amino acid levels and metabolites in fetal blood of E17.5 offspring and matching data from the mothers of each litter (Figure 8)

      5. New IF analyses of CD31 (PECAM-1), combined with machine-learning-assisted quantitative analyses of labyrinth and capillary areas using HALO (Figure 5)

      6. Separated male and female offspring and placental weights at E14.5 and E17.5 and total areas of the placenta, decidua, junctional zone and labyrinth (Figure 3 – figure supplement 1) which provide more insight into potential sex-specific differences in HET-hom offspring and placenta

      We have significantly re-written the results and discussion to reflect our new data and interpretation.

      While we did not assess transplacental flux, our new data revealed that: 1. HET-hom fetuses had lower blood glucose levels at E18.5; 2. Circulating levels of amino acids and a wide range of metabolites did not differ between HET-hom and control offspring, or between the mothers of these offspring; 3. HET-hom placentas had lower total labyrinth area, labyrinth/placenta ratios and capillary/labyrinth ratios, based on analysis of total capillary and labyrinth areas, indicating that the surface area for nutrient transfer is not increased.

      Together these data strongly indicate that hyperplastic HET-hom placentas do not provide greater support to HET-hom fetuses than controls, and that increased placental function in HET-hom offspring is unlikely to explain the late gestation fetal growth recovery we observed in HET-hom offspring or how HET-hom offspring were able to attain normal weights by birth.

      While we have not directly counted the spongiotrophoblast populations, we have now included analyses of both the glycogen-enriched and non-glycogen cell populations in the junctional zone and the decidua (Figure 4H-K). This revealed an increased area of both glycogen-enriched and non-glycogen cells in the junctional zone and in the decidua of HET-hom placentas, consistent with the greater junctional zone/placenta ratio observed in HET-hom placentas (Figure 4D). Together with data in Figure 4C-F and Supp. Fig. 3, our observations demonstrate that the overall decidua and junctional zone areas were increased in HET-hom offspring, but there was a disproportionate expansion of the junctional zone that was caused by increased areas of both glycogen and non-glycogen-enriched cells.

      Tetraploid rescue experiments would require a very significant amount of time and investment and are technically very demanding. While creation of complementary tetraploid offspring would be informative, unfortunately these experiments are beyond the scope of this current study.

      Reviewer Comment 1 cont. The authors do not mention alternative explanations for the observed fetal catch-up and postnatal overgrowth. Why would oocyte epigenetic programming effects be restricted to the placenta, and not include fetal organs?

      Our intention was certainly not to convey a message that effects may be placenta specific. Indeed, our ongoing work beyond the scope of this study provides evidence for effects in other tissues (brain and bones) that will be published elsewhere. Our new data clearly show low placental efficiency, low fetal blood glucose, a low capillary/labyrinth ratio and no impact on circulating fetal amino acid or metabolite levels in HET-hom offspring. In light of these new data, we have reinterpreted the findings of this study and substantially updated the discussion.

      Given our observations that fetal growth rate markedly increased during late gestation, but placental efficiency was reduced, our data strongly indicate that the effects of altered epigenetic oocyte programming due to loss of Eed affect both the placenta and the fetus. While our findings are significant, the precise mechanism underlying this growth response in HET-hom fetuses remains unknown. Understanding this mechanism will require substantially more work that will be the subject of future studies.

      Reviewer #2 (Public Review):

      Consistent fetal growth trajectories are vital for survival and later life health. The authors utilise an elegant and novel animal model to tease apart the role of Eed protein in the female germline from the role of somatic Eed. The authors were able to experimentally attribute placental overgrowth - particularly of the endocrine region of the placenta - to the function of Eed protein in the oocyte. Loss of Eed protein in the oocyte was also associated with dynamic changes in fetal growth and prolonged gestation. It was not determined whether the reported catch-up growth apparent on the day of birth was due to enhanced fetal growth very late in gestation, a longer gestational time (i.e. the P0 pups are effectively one day "older" compared to the controls), or the pups catching up after birth when consuming maternal milk.

      To understand if increased growth occurred in HET-hom fetuses prior to birth, we have now included analyses of offspring weight at E18.5 (Figure 2F), all pups collected with a verified E19.5 birth date (Figure 2J) and for pups from similar litter sizes (5-7 pups) at E19.5 (Figure 2K). Together with our existing data, these additional analyses provide average weights for fetuses at E14.5, E17.5, E18.5 and pups born on E19.5. This confirmed that HET-hom offspring undergo enhanced growth in the last few days of pregnancy: HET-hom fetuses that were substantially growth restricted and developmentally delayed at E14.5 progressed to pups with normal weight at birth within the 40% of pregnancies that were born on E19.5 after a normal gestational time.

      In addition, gestational length was increased by one to two days in 60% of pregnancies from hom oocytes, but not in control pregnancies from het or wt oocytes. As average weights were significantly greater in all surviving HET-hom offspring at P0 (i.e. surviving pups born on E19.5-E21.5; Figure 2G), it appears that this additional gestational time contributed to the offspring overgrowth. This is logical; however, it does not explain how growth and developmentally delayed fetuses at E14.5 attained normal weight and developmental stage by E19.5 (Figure 2J-K).

      Together, our data clearly show that HET-hom offspring undergo enhanced growth during the late stages of pregnancy, allowing them to resolve the developmental delay and growth insufficiency observed at E14.5 so that they were born at normal weight and stage at E19.5. In addition, increased gestational time contributes to the weight of pups delivered on E20.5 or E21.5, partly explaining the overgrowth phenotype observed in this model.

      The idea that increased milk consumption may explain the overgrowth of HET-hom offspring is interesting. It is possible that the increased growth rate of HET-hom offspring continues after birth and contributes to overgrowth. However, examining this outcome in a tightly controlled manner is complicated given that we cannot predict the day of birth of HET-hom litters, and that these litters are generally small and would need to be fostered on the day of birth alongside control litters. Given these challenges and that our primary observation is that HET-hom offspring underwent fetal growth recovery during pregnancies of normal length and via extension of gestational length, we have not examined the possibility of increased milk consumption after birth.

      We have updated the results to reflect the new analyses and have provided relevant discussion to address these data. Our description of these data can be found in Results (lines 165-197) and in Figure 2.

      Reviewer #3 (Public Review):

      My understanding of the main claims of the paper, and how they are justified by the data are discussed below:

      Overall, loss of PRC2 function in the developing oocyte and early embryo causes:

      1) Growth restriction from at least the blastocyst stage with low cell counts and midgestational developmental delay.

      Strengths:

      • Live embryo imaging added an important dimension to this study. The authors were able to confirm an unquantified finding from a previous lab (reduced time to 2-cell stage in oocyte-deletion Eed offspring, Inoue 2018, PMID: 30463900) as well as identify developmental delay and mortality at the blastocyst-hatching transition.

      • For the weight and morphological analysis the authors are careful to provide isogenic controls for most of the experiments presented. This means that any phenotypes can be attributed to the oocyte genotype rather than any confounding effects of maternal or paternal genotype.

      • Overall, there is good evidence that oocyte deletion of Eed results in early embryonic growth restriction, consistent with previous observations (Inoue 2018, PMID: 30463900).

      Reviewer 3, Comment 1: Weaknesses: Gaps in the reporting of specific features of the methodology make it difficult to interpret/understand some of the results.

      While we are unsure exactly which methods Reviewer 3 would like expanded, we have updated the parts that we thought required further detail to allow more informed interpretation of the results. These include methods for placental histology (Lines 650-669) and immunohistochemistry (Lines 671-690), and new methods for CD31 immunofluorescence (Lines 692-714), glucose and metabolomics (Lines 752-769) and DNA methylation (RRBS; Lines 734-750) analyses.

      To clarify the approach taken for histology, immunohistochemical and immunofluorescent staining, sections were cut in compound series from the centre of each placenta, ensuring that we collected representative data for each sample. QuPath was used to quantify the decidual and junctional zone areas in one complete, fully intact section for each placenta, taken as close to the midline as possible. This provided data from 10 placentas for each genotype. In addition, glycogen-enriched and non-glycogen-enriched cells were identified and quantified using machine learning assisted QuPath analyses of the whole placenta, decidua and junctional zone regions. We have also added quantitative analyses of the labyrinth and labyrinth capillary network using immunofluorescent CD31 staining and machine learning assisted HALO software. This new analysis of placental morphology is included in the methods section.

      Moreover, as there were no sex-specific differences in placental morphology or weight, we combined the samples from both sexes to provide greater numbers for analysis in each genotype. For example, as described for the analyses of labyrinth and capillaries using CD31 IF, 4 placentas of each sex were used for data collection. This provided data from a total of 8 placentas (4 male and 4 female) for each genotype from a total of 17 WT-wt (9 male and 8 female), 21 HET-het (9 male and 12 female) and 24 HET-hom (16 male and 8 female) sections (2-3 sections/placenta).

      Reviewer 3, Comment 2: Placental hyperplasia with disproportionate overgrowth of the junctional trophoblast especially the glycogen trophoblast (GlyT) cells.

      Strengths: • The authors provide a comprehensive description of how placental and embryo weight is affected by the oocyte-Eed deletion through mid-to-late gestation development. The case for placentomegaly is clear.

      Weaknesses:

      • The placental efficiency data presented in Figure 3G-I is incorrect. Placental efficiency is calculated as embryo mass/placental mass, and it increases over the late gestation period. For e14.5 for example (Fig3G), WT-wt embryo mass = ~0.3g, placenta mass = 0.11g (from Fig 3D) = placental efficiency 2.7; HET-hom = 0.25/0.12 = 2.1. The paper gives values: WT-wt 0.5, HET-hom 0.7. Have the authors perhaps divided placenta weight by embryo mass? This would explain why the E17.5 efficiencies are so low (WT-wt 0.11 rather than a more usual figure of 8.88). If this is the case then the authors' conclusion that placental efficiency is improved by oocyte deletion of Eed is wrong - in fact, placental efficiency is severely compromised.

      The authors have performed cell type counting on histological sections obtained from placentas to discover which cells are contributing to the placentomegaly. This data is presented as %cell type area in the main figure, though the untransformed cross-sectional area for each cell type is shown in the supplementary data. This presentation of the data, as well as the description of it, is misleading because, while it emphasises the proportional increase in the endocrine compartment of the placenta it downplays the fact that the exchange area of the mutant placentas is vastly expanded. This is important for two reasons.

      Firstly, the whole placenta is increased in size suggesting that the mechanism is not placental lineage-specific and instead acting on the whole organ. Secondly in relation to embryonic growth, generally speaking, genetic manipulations that modify labyrinthine volume tend to have a positive correlation with fetal mass whereas the relationship between junctional zone volume and embryonic mass is more complex (discussed in Watson PMID: 15888575, for example). The authors should reconsider how they present this data in light of the previous point.

      We thank the reviewer for pointing out our error in the placental efficiency analysis and apologise for this error. We have corrected the presentation and interpretation of these data and have described this in detail in our response to Reviewer 1, Comment 1.

      As discussed in our response to Reviewer 1, Comment 1, we have added a range of analyses to determine whether placental efficiency was enhanced in HET-hom offspring. These include measuring fetal and maternal circulating glucose levels (Figure 4K), individual amino acids and an extensive range of metabolites (Figure 8) and providing CD31 immunofluorescent analyses of labyrinth area, labyrinth/placental ratio and capillary/labyrinth ratio in HET-hom and control placentas (Figure 5).

      We also added analyses of glycogen-enriched and non-glycogen-enriched cell counts in the decidua and junctional zones. As suggested by Reviewer 3, both glycogen-enriched and non-enriched cell populations are significantly increased in HET-hom placentas.

      Combined, these new analyses significantly expand the study and support the conclusion that placental efficiency in HET-hom offspring was either compromised or not different from controls, depending on the analysis. We find no evidence that placental efficiency was increased in HET-hom offspring and have reworked our results and discussion sections to reflect these new data and interpretation.
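      The arithmetic behind the reviewer's point can be checked with a minimal sketch. It assumes only the approximate masses quoted in the reviewer's comment (read from the figures, not the underlying dataset); the function name `placental_efficiency` is hypothetical and used for illustration only.

```python
def placental_efficiency(fetal_mass_g, placental_mass_g):
    """Placental efficiency as defined by the reviewer:
    fetal (embryo) mass divided by placental mass."""
    if placental_mass_g <= 0:
        raise ValueError("placental mass must be positive")
    return fetal_mass_g / placental_mass_g


# Approximate E14.5 masses in grams, as quoted in the reviewer's comment.
wt_wt = placental_efficiency(0.30, 0.11)
het_hom = placental_efficiency(0.25, 0.12)

# If the ratio were inverted (placenta / embryo), a typical E17.5
# efficiency of 8.88 would be reported as its reciprocal.
inverted_e17 = 1 / 8.88

print(round(wt_wt, 1), round(het_hom, 1), round(inverted_e17, 2))
```

      With these inputs the correctly oriented ratio is lower for HET-hom than for WT-wt, and the reciprocal of 8.88 rounds to the 0.11 the reviewer flagged, which is consistent with an inverted ratio.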

      Reviewer 3, Comment 2 cont: Again, some of the methods are not clearly reported making interpretation difficult - especially how they have estimated their GlyT number.

      As outlined in our response to Reviewer 3 Comment 1, in the methods section we have added further detail of how we counted glycogen-enriched and non-enriched cells in the decidua and junctional zone regions of sections from the middle of WT-wt, WT-het, HET-het and HET-hom placentas (Lines 650-669).

      Reviewer 3, Comment 3: Perinatal embryonic/pup overgrowth.

      Strengths:

      • The overgrowth exhibited by the oocyte-Eed-deleted pups is striking and confirms the previous work by this group (Prokopuk, 2018). This is an important finding, especially in the context of understanding how PRC2-group gene mutations in humans cause overgrowth syndromes. It is also intriguing because it indicates that genetic/environmental insults in the mother that affect her gamete development can have long-term consequences on offspring physiology.

      Weaknesses:

      • Is the overgrowth intrauterine or is it caused by the increase in gestation length? The way the data is reported makes it impossible to work this out. The authors show that gestation time is consistently lengthened for mothers incubating oocyte-Eed-deleted pups by 1-2 days. In the supplementary material, the mutant embryos are not larger than WT at e19.5, the usual day of birth. Postnatal data is presented as day post-parturition. It would probably be clearer to present the embryonic and postnatal data as days post coitum. In this way, it will be obvious in which period the growth enhancement is taking place. This information is really important to determine whether the increased growth of the mutants is due to a direct effect of the intrauterine environment, or perhaps a more persistent hormonal change in the mother that can continue to promote growth beyond the gestation period.

      We have used embryonic day (E) to denote embryo and fetal age throughout the study – this is the same as using DPC (i.e. E19.5 is equivalent to 19.5 DPC). As described in the Methods “Collection of post-implantation embryos, placenta and postnatal offspring”, mice were time mated for two to four nights, with females plug checked daily. Positive plugs were noted as day E0.5.

      To make the data presentation clearer, we have shown the data for surviving HET-hom pups born on E19.5 (Figure 2J) separately from all HET-hom surviving pups born on E19.5-E21.5 (Figure 2G). As discussed in our response to Reviewer 2, we have also included growth data for pregnancies at E14.5, E17.5, E18.5 (Fig. 2C-F) and E19.5 (Figure 2J,K), as well as P0 (combined data for surviving pups born E19.5-E21.5), and P3 (combined data for surviving pups born E19.5-E21.5, Figure 2G,H).

      These data clearly show that HET-hom fetuses are substantially growth and developmentally delayed at E14.5 (Figure 2D), but HET-hom pups born on E19.5 are the same weight as WT-wt, WT-het and HET-het control pups (Figure 2J). This demonstrates that weight of HET-hom fetuses is normalised in utero between E14.5 and day of birth on E19.5.

      Importantly, as requested by Reviewer 3, we have separated average weight for all surviving pups with a day of birth of E19.5-21.5 (Figure 2G) from average weight of pups born on E19.5 only (Figure 2J). These analyses revealed that the average weight of surviving pups born between E19.5-21.5 was significantly higher than for controls (Figure 2G), but the average weight of pups born on E19.5 only was not. It is therefore clear that extended gestation also contributed to increased HET-hom pup birth weight. We have updated these additional analyses in Results (Lines 165-197) and Figure 2.

      As revealed in Figure 2H, it is also possible/likely that growth of HET-hom pups during the three days post-partum may have contributed to the offspring overgrowth we observed in this and our previous study (Prokopuk et al., 2018 Clinical Epigenetics). However, we cannot determine whether there is a contribution from a persistent maternal hormonal change that promotes post-natal offspring growth or whether there is an innate growth benefit in HET-hom pups. As this is very difficult to dissect, separating these possibilities is beyond the scope of our study.

      Reviewer 3, Comment 4: "fetal growth restriction followed by placental hyperplasia, .. drives catch-up growth that ultimately results in perinatal offspring overgrowth".

      Here the authors try to link their observations, suggesting that i) the increased perinatal growth rate is a consequence of placentomegaly, and ii) the placentomegaly/increased fetal growth is an adaptive consequence of the early growth restriction. This is an interesting idea and suggests that there is a degree of developmental plasticity that is operating to repair the early consequences of transient loss of Eed function.

      Strengths:

      • Discrepancies between earlier studies are reconciled. Here the authors show that in oocyte-Eed-deleted embryos growth is initially restricted and then the growth rate increases in late gestation with increased perinatal mass.

      Weaknesses:

      • Regarding the dependence of fetal growth increase on placental size increase, this link is far from clear since placental efficiency is in fact decreased in the mutants (see above).

      • "Catch-up growth" suggests that a higher growth rate is driven by an earlier growth restriction in order to restore homeostasis. There is no direct evidence for such a mechanism here. The loss of Eed expression in the oocyte and early embryo could have an independent impact on more than one phase of development.

      Firstly, there is growth restriction in the early phase of cell divisions. Potentially this could be due to depression of genes that restrain cell division on autosomes, or suppression of X-linked gene expression (as has been previously reported, Inoue, 2018 PMID: 30463900). The placentomegaly is explained by the misregulation of non-canonically imprinted genes, as the authors report (and in agreement with other studies, e.g. Inoue, 2020. PMID: 32358519).

      • Explaining the perinatal phase of growth enhancement is more difficult. I think it is unlikely to be due to placentomegaly. Multiple studies have shown that placentomegaly following somatic cell nuclear transfer (SCNT) is caused by non-canonically imprinted genes, and can be rescued by reducing their expression dosage. However, SCNT causes placentomegaly with normal or reduced embryonic mass (for example, Xie 2022, PMID: 35196486), not growth enhancement. Moreover, since (to my knowledge) single loss of imprinting models of non-canonically imprinted genes do not exist, it is not possible to understand if their increased expression dosage can drive perinatal overgrowth, and if this is preceded by growth restriction and thus constitutes 'catch up growth'.

      Reviewer 3 is correct in their assessment that placental efficiency was decreased in HET-hom offspring and we have corrected the placental efficiency analysis based on fetal/placental weight ratios (discussed in detail in our response to Reviewer 1 Comment 1). We have added substantially more data (glucose, amino acids, metabolites, labyrinth capillary area and density). These data support the conclusion that a placentally driven advantage for HET-hom fetal growth is unlikely, despite our observation that HET-hom fetuses are developmentally delayed and underweight at E14.5, but are born at normal weight after a normal gestational length (19.5 days) (discussed in our responses to Reviewer 3, Comment 3 and Reviewer 2).

      This demonstrates that HET-hom fetuses are able to attain normal birth weight despite being initially growth restricted at E14.5, and that this occurs despite low placental function. Moreover, as we compared isogenic offspring with heterozygous loss of Eed (HET-het compared to HET-hom offspring), the outcomes we observed in HET-hom offspring originate from loss of EED in the growing oocyte or loss of maternal EED in the zygote, strongly suggesting that a non-genetic mechanism is involved.

      As pointed out by Reviewer 3, the initial developmental delay in HET-hom offspring may be due to increased expression of genes that regulate cell proliferation – this could clearly explain the lower number of cells we observed in the ICM and the growth delay at later stages of embryonic and fetal development. Another possibility is that maternal PRC2 provided by the oocyte promotes cell divisions in preimplantation embryos. We have discussed these possibilities on Lines 467-476.

      In addition, Matoba et al 2022 demonstrated that deletion of maternal Xist together with Eed was able to rescue male-biased lethality in offspring from oocytes lacking Eed, revealing a clear role for X-linked genes in this phenotype (Matoba et al 2022, Genes and Development). However, deletion of maternal Xist did not properly normalise survival of offspring from Eed null oocytes (i.e. Eed/Xist double maternal null litters were smaller than litters derived from wild type oocytes), strongly suggesting other mechanisms provide the capacity for HET-hom offspring to attain normal weight at birth. We have added further discussion of the Matoba study in the context of our study in the Discussion (Lines 544-555).

      Finally, with respect to the outcomes for SCNT derived offspring, we extracted SCNT fetal growth and placental weight data from the supplementary data included in Matoba et al., 2018 Cell Stem Cell. 2018;23(3):343-54.e5 and compared it with data collected in our study (Figure 7). This analysis revealed that the weights of placentas and fetuses of offspring derived via SCNT were very similar to the HET-hom offspring in our study and we have discussed the similarities and potential differences between HET-hom and SCNT offspring in the Discussion (Lines 478-500).

      As pointed out by Reviewer 3, deletion of maternal non-canonically imprinted genes partially or fully rescued the placental hyperplasia phenotype in both SCNT derived offspring and offspring from oocytes lacking EED. However, as we have discussed, the mechanisms underlying other aspects of the offspring phenotype, such as fetal growth recovery of HET-hom offspring observed in our study, remain unknown. Moreover, the comparison we provide in Figure 7 strongly indicates that HET-hom and SCNT fetuses are similarly delayed at E14.5 and undergo similar fetal growth recovery before birth, but the mechanism also remains unknown. Together, it appears that offspring derived from either Eed-null oocytes or by SCNT have an innate ability to remediate fetal growth restriction during the late stages of pregnancy without a requirement to correct maternally inherited impacts mediated by Xist or H3K27me3-dependent imprinting.

    1. Author response

      Reviewer #1 (Public Review):

      The main contribution appears to be related to functional specialization. I suggest clarifying the major novelty of the present report and to focus the introduction on it.

      We thank this reviewer for this suggestion. We have revised the introduction to emphasize the functional specialization question. The changes are extensive; we have included a tracked-changes version of the manuscript to make these edits easy to see.

      There is a growing literature on fluctuating neural firing patterns that is not considered in this report. The scholarship appears a bit impoverished with only 19 references, many of which point to work from this group of collaborators. I suggest that the authors consider the present work in the context of the wider literature more scholarly, even if not all the relations of these different lines of work can be conclusively connected at this point. For a few examples, there is work by Kienitz and colleagues on fluctuating neural patterns in V4 evoked by competing grating stimuli. Also, the work by Engel, Moore, and colleagues on 'on' and 'off' states in the context of selective attention seems relevant, or the work by Fiebelkorn and Kastner on rhythmic perception and attention.

      We agree completely with this suggestion! We have reworded the introduction to be more inclusive of other research in this area (especially Kienitz and colleagues – exciting work that we are pleased to have had brought to our attention) and we have added about 500 words in the Discussion to cover the work on on/off states (Engel et al.), rhythmic perception (Fiebelkorn & Kastner and others), and attention more generally (e.g., Treisman & Gelade’s work on serial sampling). We are particularly pleased to add these sections because these topics are very much on our minds – we have a commentary piece under review elsewhere in which we evaluate these synergistic lines of approach in a more complete fashion. In total, we’ve added about 15 additional references.

      Reviewer #2 (Public Review):

      The description of the results would benefit from a better explanation of how low spike counts may influence the outcome of the analysis. Due to a smoothing procedure used for visualization, the spike counts for the paired stimuli (AB, black lines) shown in Figure 3a-b and Figure 4a-d go below 0. However, the actual spike count on a trial can not go below 0. The symmetric smoothing procedure may hide an underlying skewed distribution of spike counts that can only be positive. The statistical analysis is not performed on the smoothed distribution but on the actual spike counts, and the validity of the result is therefore not in question. However, the paper would benefit from 1) visualization of the unsmoothed trial counts, and 2) an explanation of how assumptions of symmetric/skewed distributions may affect the outcome.

      We thank the reviewers for noting this and making these suggestions. We now include unsmoothed raw spike counts in all the example figures (Figure 3a-b and Figure 4a-d). With regard to the symmetric/skewed distributions and the analysis methods, a Poisson distribution will be skewed at low rates and become more symmetric at higher rates, so this is already incorporated into the analysis. Indeed, the utility of Poisson distributions for fitting non-negative data is one of the reasons these distributions are so commonly used in neuroscience. We now make this point explicitly at the beginning of Methods/Data analysis: “Our method centers on modeling spike counts based on Poisson distributions, a common technique for handling non-negative count data in neuroscience and other fields.” With this edit as well as the revised example figures now making clear that no spike counts are below zero, we are optimistic that readers will better understand the analysis method and how the shape of response distributions is incorporated into it.
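      The statement that a Poisson distribution is skewed at low rates and more symmetric at high rates can be illustrated with a short sketch. This is not the analysis code from the paper. It sums the Poisson probability mass function numerically, and the function name `poisson_skewness` and its `tail_sds` truncation parameter are assumptions made for illustration. For a Poisson distribution with rate λ, the skewness is 1/√λ, which the numerical sum should reproduce.

```python
import math


def poisson_skewness(rate, tail_sds=20):
    """Skewness of a Poisson(rate) distribution, computed by summing
    the third central moment over the probability mass function.

    The sum is truncated at rate + tail_sds standard deviations (plus a
    small constant), which leaves a negligible tail. The variance of a
    Poisson distribution equals its rate, so the normalisation is
    rate ** 1.5.
    """
    upper = int(rate + tail_sds * math.sqrt(rate) + 20)
    third_central = 0.0
    for k in range(upper + 1):
        # Log-space pmf avoids overflow in k! for large counts.
        log_pmf = k * math.log(rate) - rate - math.lgamma(k + 1)
        third_central += (k - rate) ** 3 * math.exp(log_pmf)
    return third_central / rate ** 1.5


for rate in (0.5, 1.0, 10.0, 100.0):
    print(rate, poisson_skewness(rate), 1 / math.sqrt(rate))
```

      Spike counts are non-negative integers, so a low mean count forces a long right tail, whereas at high mean counts the distribution approaches a symmetric shape.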

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their helpful comments, which have allowed us to improve the manuscript.

      Response to reviewer 2

      We thank the reviewer for this positive feedback, which requires no further revision.

      Response to reviewer 3

      We thank the reviewer for highlighting these additional points and provide further explanations on these below.

      Firstly, we started the analysis from a baseline of year 2000 because the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their current allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019). In the paper we compare our optimal strategy to a simplified version of this method, represented by our “proportional allocation” strategy.

      Even if our simulations started in the year 2015, a direct comparison with the Global Technical Strategy for Malaria 2016-2030 would not be possible due to the different approaches taken. The GTS was developed to progress towards malaria elimination globally and set ambitious targets of at least 90% reduction in malaria case incidence and mortality rates and malaria elimination in at least 35 countries by 2030 compared to 2015. Mathematical modelling at the time suggested that 90% coverage of WHO-recommended interventions (vector control, treatment and seasonal malaria chemoprevention) would be needed to approach this target (Griffin et al. 2016, Lancet Infectious Diseases). The global annual investment requirements to meet GTS targets were estimated at US$6.4 billion by 2020 and US$8.7 billion by 2030 (Patouillard et al. 2017, BMJ Global Health). This strategy therefore considers what resources would be required to achieve a specific global target, but not the optimized allocation of resources.

      Investments into malaria control have consistently been below the estimated requirements for the GTS milestones (World Health Organization 2022, World Malaria Report 2022). In our study, we therefore take a different perspective on how limited budgets can be optimally allocated to a single intervention (insecticide-treated nets) across countries/settings to achieve the best possible outcome for two objectives that are different to the GTS milestones (either minimizing the global case burden, or minimizing both the global case burden and the number of settings not having yet reached a pre-elimination phase). As stated in the discussion, our estimate of allocating 76% of very low budgets to high-transmission settings was similar to the global investment targets estimated for the GTS, where the 20 countries with the highest burden in 2015 were estimated to require 88% of total investments (Patouillard et al. 2017, BMJ Global Health). Nevertheless, we also show that if higher budgets were available, allocating the majority to low-transmission settings co-endemic for P. falciparum and P. vivax would achieve the largest reduction in global case burden. We acknowledge the modelling of a single intervention as one of the key limitations of this analysis, but this simplification was necessary in order to solve the complex optimisation problem. Computationally it would not have been feasible to optimize across a multitude of intervention and coverage combinations.

      A further limitation raised by the reviewer is the lack of cross-species immunity between P. falciparum and P. vivax in our model. While cross-reactivity between antibodies against these two species has been observed in previous studies and the potential implications of this would be important to explore in future work, we did not include it here as little is known to date about the epidemiological interactions between different malaria parasite species (Muh et al. 2020, PLoS Neglected Tropical Diseases).

      Lastly, we did not assume that transmission was homogenous within the four transmission settings in our study (very low, low, moderate, high); transmission dynamics were simulated separately in each country, accounting for heterogeneous mosquito bite exposure. However, results were summarised for the broader transmission settings since many other country-specific factors were not accounted for (see discussion) and the findings should not be used to inform individual country allocation decisions.


      The following is the authors’ response to the original reviews.

      Author response to peer review

      We thank the reviewers for their insightful comments, which raise several important points regarding our study. As the reviewers have recognised, we introduced a number of simplifications in order to solve this complex optimisation problem, such as restricting the analysis to a single intervention (insecticide-treated nets) and modelling countries at a national level. Despite their clear relevance to the study, computationally it would not have been feasible to run the multitude of scenarios suggested by reviewer 1, which we recognise as a limitation. As such we agree with the assessment that this study primarily represents a thought experiment, based on substantive modelling and aggregate scenario-based analysis, to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, and also point to avenues for further research. This perspective also underlies our decision to start the analysis from a baseline of year 2000 as opposed to modelling the current 2023 malaria situation: the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019) (1). A simplified version of this method is represented by our “proportional allocation” strategy. We have made several revisions to the manuscript to address the points raised by the reviewers, as detailed below.

      Reviewer #1 (Public Review):

      1. The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories. This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale.

      Thank you for this summary of the paper. We agree that our analysis is of relevance to global funders, but is not meant to inform individual country allocation decisions. In the discussion, we now state:

      p. 12 L19: “Therefore, policy decisions should additionally be based on analysis of country-specific contexts, and our findings are not informative for individual country allocation decisions.”

      1. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases.

      Thank you for raising this point. Indeed, several simplifying assumptions were necessary to ensure the computational feasibility of this complex optimization problem. As a result, our study primarily represents a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. As now further outlined in the introduction, approaches to this have differed over time and it remains a relevant debate for malaria policy.

      p. 2 L22: “However, there remains a lack of consensus on how best to achieve this longer-term aspiration. Historically, large progress was made in eliminating malaria mainly in lower-transmission countries in temperate regions during the Global Malaria Eradication Program in the 1950s, with the global population at risk of malaria reducing from around 70% of the world population in 1950 to 50% in 2000 (2). Renewed commitment to malaria control in the early 2000s with the Roll Back Malaria initiative subsequently extended the focus to the highly endemic areas in sub-Saharan Africa (3).”

      We believe our findings not only confirm an “expected” outcome – that prioritizing high-burden settings would avert more cases – but also clearly illustrate various consequences of different allocation strategies that are implemented or considered in reality, which may not be so obvious. For example, we found that initially allocating a larger share of the budget to high-transmission countries could be both almost optimal in terms of reducing clinical cases and maximising the number of countries reaching pre-elimination. We also observed a trade-off between reducing burden and reducing the global population at risk (“shrinking the map”) through a focus on near-elimination settings, and estimate the loss in burden reduction when following an elimination target.

      1. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      Thank you for bringing this to our attention, and we regret to hear the manuscript was confusing to read. We believe that the revisions made as a result of the reviewer comments have now made the manuscript much easier to follow. We additionally passed the manuscript to a colleague to identify confusing passages, and have added a number of sentences to clarify key concepts and improve the structure.

      1. I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      Thank you for pointing this out. We have now made the rationale for this choice clearer in the manuscript. Our main reason for this was to allow comparison with the Global Fund funding allocation, which is largely based on malaria disease burden in 2000-2004. As stated in the paper, malaria prevalence estimates in the year 2000 are commonly considered to represent a “baseline” endemicity level, before large-scale implementation of interventions in the following decades. In the manuscript, the transmission-related element of the Global Fund allocation algorithm is represented in our “proportional allocation” strategy. Previously this was only mentioned in the methods, but we have now added the following in the results to address this comment of the reviewer:

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”
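
      The difference between the sequential and proportional strategies described in this passage can be illustrated with a minimal sketch. It is not the authors' optimisation code: the `Country` fields, the `need` parameter (a hypothetical funding level at which ITN impact saturates), and all numbers are assumptions made for illustration, and funding is assigned country by country rather than to groups of countries.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    eir: float     # entomological inoculation rate (transmission intensity)
    burden: float  # baseline clinical burden, e.g. for 2000-2004
    need: float    # hypothetical funding at which ITN impact saturates


def sequential_allocation(countries, budget, high_first=True):
    """Fund countries one at a time, ordered by EIR, until the budget runs out."""
    shares = {}
    remaining = budget
    for c in sorted(countries, key=lambda c: c.eir, reverse=high_first):
        shares[c.name] = min(c.need, remaining)
        remaining -= shares[c.name]
    return shares


def proportional_allocation(countries, budget):
    """Give each country a fixed budget share proportional to baseline burden."""
    total = sum(c.burden for c in countries)
    return {c.name: budget * c.burden / total for c in countries}


# Illustrative values only; none of these come from the paper.
countries = [
    Country("A", eir=80.0, burden=600.0, need=50.0),
    Country("B", eir=5.0, burden=300.0, need=40.0),
    Country("C", eir=0.5, burden=100.0, need=30.0),
]

high = sequential_allocation(countries, budget=70.0, high_first=True)
low = sequential_allocation(countries, budget=70.0, high_first=False)
prop = proportional_allocation(countries, budget=70.0)
```

      In this toy setup, prioritising high transmission fully funds country A before B receives anything, prioritising low transmission does the reverse, and the proportional rule gives every country a share regardless of how much it can usefully absorb.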

      The Global Fund framework additionally considers economic capacity and other specific factors, and we have now also included a direct comparison with the 2020-2022 Global Fund allocation in Supplementary Figure S12 (see Author response image 1).

      We agree that looking at allocation decisions from 2020 onward would also constitute a very interesting question. However, the high dimensionality in scenarios to consider for this would currently make it computationally infeasible to run on the global level. Not only would it have to include all interventions currently implemented and available for malaria at different levels of coverage, but also the option of scaling down existing interventions. Instead, our priority in this paper was to conduct a thought experiment including both P. falciparum and P. vivax on a large geographical scale.

      Author response image 1.

      Impact of the proportional allocation strategy and the 2020-2022 Global Fund allocation on global malaria cases (panel A) and the total population at risk of malaria (panel B) at varying budgets. Both strategies use the same algorithm for budget share allocation based on malaria disease burden in 2000-2004, but the Global Fund allocation additionally involves an economic capacity component and specific strategic priorities.

      1. I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs.

      Thank you for this comment. We agree the title could be more specific and have changed this to “Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”.

      Regarding the scale of ITN allocation, it is true that allocation at a sub-national scale could affect the results. However, considering this at a national scale is most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. A sentence explaining this has been added in the methods.

      p. 15 L8: “The analysis was conducted on the national level, since this scale also applies to funding decisions made by international donors (1).”

      Further considering different geographical scales would also require introducing other assumptions, for example about how different countries would distribute funding sub-nationally, whether specific countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas, and about delays in the allocation of bednets in specific regions. These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling.

      Thank you for pointing this out. We did consider this and mentioned it as a limitation. Nevertheless, the complexity of accounting for this should also be recognised; for example, there is substantial uncertainty about the precise relationship between insecticide resistance and the population-level effect of ITNs (Sherrard-Smith et al., 2022, Lancet Planetary Health) (4). Additionally, our simulations extend beyond the 2000-2023 period so further assumptions about future changes to these factors would also be required. Simplifying assumptions are inherent to all mathematical modelling studies and we consider these particular simplifications acceptable given the high-level nature of the analysis.

      1. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      Thank you for this comment. In the discussion, we have now expanded on this:

      p. 13 L3: “While our analysis presents allocation strategies to progress towards eradication, the results do not provide insight into allocation of funding to maintain elimination. In practice, the threat of malaria resurgence has important implications for when to scale back interventions.”

      We believe that from a global perspective, the questions of funding allocation to achieve elimination vs to maintain it can currently still be considered separately given the large time-scales involved. The cost of preventing resurgence is not known, and one major problem in accounting for this would also be to identify relevant timescales to quantify this over.

      1. The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      Thank you for pointing this out. The WHO categories used are better interpreted in terms of the corresponding prevalence, which places countries with a prevalence of over 35% in the high transmission categories (WHO Guidelines for malaria, 31 March 2022) (5). We felt this is appropriate given that we are looking at theoretical global allocation patterns and do not aim to make recommendations for specific groups of countries or individual countries within sub-Saharan Africa that would be distinguished through the use of higher cut-offs. In our analysis, all 25 countries in the high transmission category were located in sub-Saharan Africa.

      1. The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

      Thank you for this comment. We indeed state in the paper that ITNs alone are not sufficient to eliminate malaria. However, we still think that our analysis is relevant for elimination by taking a more theoretical perspective on reducing transmission using interventions. Starting from the 2000 baseline (or current levels) globally, large-scale transmission reductions such as those achieved by mass ITN distribution still represent the first key step on the path to malaria eradication, as shown in previous modelling work (Griffin et al., 2016, Lancet Infectious Diseases) (6). In the final phase of elimination, the WHO also recommends the addition of more targeted and reactive interventions (WHO Guidelines for malaria, 31 March 2022) (5). Our changes to the title of the article (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”) should now better reflect that we consider ITNs as just one necessary component to achieve malaria eradication.

      Reviewer #2 (Public Review):

      1. Schmit et al. analyze and compare different strategies for the allocation of funding for insecticide-treated nets (ITNs) to reduce the global burden of malaria. They use previously published models of Plasmodium falciparum and Plasmodium vivax malaria transmission to quantify the effect of ITN distribution on clinical malaria numbers and the population at risk. The impact of different resource allocation strategies on the reduction of malaria cases or a combination of malaria cases and achieving pre-elimination is considered to determine the optimal strategy to allocate global resources to achieve malaria eradication.

      Strengths:

      Schmit et al. use previously published models and optimization for rigorous analysis and comparison of the global impact of different funding allocation strategies for ITN distribution. This provides evidence of the effect of three different approaches: the prioritization of high-transmission settings to reduce the disease burden, the prioritization of low-transmission settings to "shrink the malaria map", and a resource allocation proportional to the disease burden.

      Thank you for providing this summary and outline of the strengths of the paper.

      1. Weaknesses:

      The analysis and optimization which provide the evidence for the conclusions and are thus the central part of this manuscript necessitate some simplifying assumptions which may have important practical implications for the allocation of resources to reduce the malaria burden. For example, seasonality, mosquito species-specific properties, stochasticity in low transmission settings, and changing population sizes were not included. Other challenges to the reduction or elimination of malaria such as resistance of parasites and mosquitoes or the spread of different mosquito species as well as other beneficial interventions such as indoor residual spraying, seasonal malaria chemoprevention, vaccinations, combinations of different interventions, or setting-specific interventions were also not included. Schmit et al. clearly state these limitations throughout their manuscript.

      This work focuses on ITN distribution strategies; other interventions are not considered. It also takes a global perspective, so analysis of the specific local setting (as also noted by Schmit et al.), different interventions, and combinations of interventions should also be taken into account for any decisions.

      Thank you for raising these points. As outlined at the beginning of our response, for computational reasons we indeed had to introduce several simplifying assumptions to solve this complex optimisation problem. Given the factors you highlighted, our study should primarily be interpreted as a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, which we have further clarified in the manuscript.

      1. Nonetheless, the rigorous analysis supports the authors' conclusions and provides evidence that supports the prioritization of funding of ITNs for settings with high Plasmodium falciparum transmission. Overall, this work may contribute to making evidence-based decisions regarding the optimal prioritization of funding and resources to achieve a reduction in the malaria burden.

      Thank you for this positive assessment of our work.

      Reviewer #1 (Recommendations For The Authors):

      1. L144: last paragraph, the focus on endemic equilibrium: I did not really understand this, when 39 years is mentioned later is that a different analysis? How are cases averted calculated in a time-agnostic endemic equilibrium analysis? Perhaps a little more detail here would be helpful.

      A further explanation of this has been added in the results and methods.

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10). While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      p. 18 L 9: “Global malaria case burden and the population at risk were compared between baseline levels in 2000 and after reaching an endemic equilibrium under each scenario for a given budget.”

      1. L148: what is proportional allocation by disease burden and how is that different from prioritizing high-transmission settings?

      Further details have been added in the text.

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”

      1. L198-9: did low transmission settings get the majority of funding at intermediate and maximum budgets because they have the most population (I think so, based on Fig 1)?

      Yes, this is correct. We state in the results: “the optimized distribution of funding to minimize clinical burden depended on the available global budget and was driven by the setting-specific transmission intensity and the population at risk”.

      1. L206: what is ITN distribution efficiency? This is not explained. What is the 39-year period? Why this duration?

      Further explanations have been added in the results section, which were previously only detailed in the methods:

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10).”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      1. L218: what is "no intervention with a high budget"? is this a phrasing confusion?

      Yes, this has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. L235-7: on comparing these results to previous work on the 20 highest-burden countries: is the definition of "high" similar enough across these studies that this is a relevant comparison?

      We believe this is reasonably comparable, as looking at the 20 highest-burden countries encompasses almost the entire high-transmission group in our work (25 countries in total), on which the comparison is made.

      1. L267-70: I didn't understand this sentence at all.

      Thanks for flagging this. The sentence referred to is: “Allocation proportional to disease burden did not achieve as great an impact as other strategies because the funding share assigned to settings was constant irrespective of the invested budget and its impact, and we did not reassign excess funding in high-transmission settings to other malaria interventions.”

      The previously mentioned added details on the proportional allocation strategy in the manuscript should now make this clearer, together with this clarification:

      p. 11 L17: “In modelling this strategy, we did not reassign excess funding in high-transmission settings to other malaria interventions, as would likely occur in practice.”

      For proportional allocation, a fixed proportion of the budget is calculated for each country based on disease burden, as described in the Global Fund allocation documentation (see Methods). However, since ITNs are the only intervention considered, this leads to a higher budget being allocated than is needed in some countries (i.e. where more funding doesn’t translate into further health gains).
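
      A small numerical sketch shows why a fixed burden-based share can exceed what ITNs alone can usefully absorb. The setting names, burden values, and saturation levels below are hypothetical and chosen only to make the arithmetic visible; they are not estimates from the study.

```python
def excess_funding(shares, saturation):
    """Funding above the level at which additional ITN spending stops adding impact."""
    return {k: max(0.0, shares[k] - saturation[k]) for k in shares}


# Hypothetical inputs for illustration.
burden = {"high": 700.0, "medium": 200.0, "low": 100.0}
saturation = {"high": 40.0, "medium": 30.0, "low": 30.0}
budget = 80.0

# Proportional rule: each setting's share is fixed by its burden alone.
total_burden = sum(burden.values())
shares = {k: budget * b / total_burden for k, b in burden.items()}

excess = excess_funding(shares, saturation)
unused_fraction = sum(excess.values()) / budget
```

      Here the high-burden setting receives more than its assumed saturation level while the other settings stay below theirs, so part of the budget produces no further health gain unless it is reassigned.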

      1. L339 EIR range: 80 is high at the country level but areas within countries probably went as high as 500 back in 2000. How does this affect the modeled estimates of ITN impact?

      The question of sub-national differences in transmission has been addressed in the public review comments. Briefly, we consider the national scale to be most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. Although, as you correctly point out, the EIR affects ITN impact, it is not possible to conclude what the average effect of this would be on the country level without considering the following factors and introducing further assumptions on these: how would different countries distribute funding sub-nationally? Which countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas? Would there be delays in the allocation of bednets in specific regions? These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. L347 population size constant: births and deaths are still present, is that right? This is unclear from the sentence.

      Yes, this is correct. Full details on the model can be found in the Supplementary Materials.

      1. L370 estimating ITN distribution required to achieve simulated population usage: is this a single relationship for all of Africa? Is it based on ITNs distributed 2:1 -> % access -> % usage? So it accounts for allocation inefficiency?

      Yes, this is represented by a single relationship for all of Africa to account for allocation inefficiency and is based on observed patterns across the continent and methodology developed in a previous publication (Bertozzi-Villa et al., 2021, Nature Communications) (7). Full details can be found in the Supplementary Materials (“Relationship between distribution and usage of insecticide-treated nets (ITNs)”, p. 21).

      1. L375: the ITN unit cost is assumed constant across countries and time (I think, it doesn't say explicitly), is this a good assumption?

      Yes, this is correct. We consider this a reasonable assumption within the scope of the paper. While delivery costs likely vary across countries, international funders usually have pooled procurement mechanisms for ITNs (The Global Fund, 2023, Pooled Procurement Mechanism Reference Pricing: Insecticide-Treated Nets).

      1. L399: "single allocation of a constant ITN usage" it is not explained what exactly this means

      Further explanations have been added in the manuscript.

      p. 8 L24: “While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      Reviewer #2 (Recommendations For The Authors):

      1. In addition to the public comments, the only major comment is that in this reviewer's opinion, the focus on ITNs as the only intervention should be made clearer at different places in the manuscript (e.g. in the discussion lines 303-304). Otherwise, there are only some minor comments (see below).

      We have now modified the following sentence and also included this suggestion in the title (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”).

      p. 13 L8: “Our analysis demonstrates the most impactful allocation of a global funding portfolio for ITNs to reduce global malaria cases.”

      1. Minor comments:
      2. It may be of interest to compare the maximum budget obtained from the optimization with other estimates of required funding and actual available funding.

      Thank you for this interesting suggestion. Our maximum budget estimates are similar to the required investments projected for the WHO Global Technical Strategy: US$3.7 billion for ITNs in our analysis compared to between US$6.8 and US$10.3 billion total annual resources between 2020 and 2030, of which an estimated 55% would be required for (all) vector control (US$3.7 - US$5.7 billion) (Patouillard et al., 2017, BMJ Global Health) (8). However, it is well known that current spending is far below these requirements: total investments in malaria were estimated to be about US$3.1 billion per year in the last 5 years (World Health Organization, 2022, World Malaria Report 2022) (9).
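
      The vector-control range quoted above follows directly from the stated totals and the 55% share. A quick check, using only the figures given in this response (variable names are ours):

```python
# Figures quoted in the response, in billions of US dollars per year.
total_low, total_high = 6.8, 10.3   # projected total annual resource needs
vector_control_share = 0.55         # estimated share for all vector control
itn_budget = 3.7                    # maximum ITN budget in the analysis
current_spending = 3.1              # estimated recent annual malaria investment

vc_low = total_low * vector_control_share
vc_high = total_high * vector_control_share

# Compare at the one-decimal precision used in the text.
itn_within_range = round(vc_low, 1) <= itn_budget <= round(vc_high, 1)
spending_below_need = current_spending < total_low

print(round(vc_low, 1), round(vc_high, 1), itn_within_range, spending_below_need)
```

      At one-decimal precision the ITN budget sits at the lower end of the vector-control range, and current spending falls short of even the lower bound of total projected needs.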

      1. Line 177: should "Figure S7" be bold?

      Yes, this has been corrected.

      1. Line 218: what does "no intervention with high budget" mean? Should this simply be "no intervention"?

      This has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. In this reviewer's opinion it would be easier for the reader if the weighting term in the objective function were added in the Materials and Methods section. The weighting could be added without extending the section substantially and the explanation in lines 390-393 may be easier to understand.

      Thank you for this suggestion. We agree and have added this in the main manuscript.

      References

      1. The Global Fund. Description of the 2020-2022 Allocation Methodology. 2019. Available from: https://www.theglobalfund.org/media/9224/fundingmodel_2020-2022allocations_methodology_en.pdf

      2. Hay SI, Guerra CA, Tatem AJ, Noor AM, Snow RW. The global distribution and population at risk of malaria: past, present, and future. Lancet Infect Dis. 2004;4(6):327-36.

      3. Feachem RGA, Phillips AA, Hwang J, Cotter C, Wielgosz B, Greenwood BM, et al. Shrinking the malaria map: progress and prospects. The Lancet. 2010;376(9752):1566-78.

      4. Sherrard-Smith E, Winskill P, Hamlet A, Ngufor C, N'Guessan R, Guelbeogo MW, et al. Optimising the deployment of vector control tools against malaria: a data-informed modelling study. The Lancet Planetary Health. 2022;6(2):e100-e9.

      5. World Health Organization. WHO Guidelines for malaria, 31 March 2022. Geneva: World Health Organization; 2022. Contract No.: WHO/UCN/GMP/2022.01 Rev.1.

      6. Griffin JT, Bhatt S, Sinka ME, Gething PW, Lynch M, Patouillard E, et al. Potential for reduction of burden and local elimination of malaria by reducing Plasmodium falciparum malaria transmission: a mathematical modelling study. The Lancet Infectious Diseases. 2016;16(4):465-72.

      7. Bertozzi-Villa A, Bever CA, Koenker H, Weiss DJ, Vargas-Ruiz C, Nandi AK, et al. Maps and metrics of insecticide-treated net access, use, and nets-per-capita in Africa from 2000-2020. Nature Communications. 2021;12(1):3589.

      8. Patouillard E, Griffin J, Bhatt S, Ghani A, Cibulskis R. Global investment targets for malaria control and elimination between 2016 and 2030. BMJ global health. 2017;2(2):e000176.

      9. World Health Organization. World malaria report 2022. Geneva: World Health Organization; 2022. Report No.: 9240064893.

    1. Author Response:

      We thank all of you for your constructive and inspiring comments, which will help us substantially improve the final version of the paper. Ahead of our detailed final revision, we are writing this provisional letter to give a quick response to the reviewers’ comments.

      We first briefly summarize the public reviews, then respond point by point.

      Editors:

      1. More discussion is needed.

      2. More discussion about eye fixation during adaptation. Discuss why increasing visual uncertainty by blurring the cursor in the present study produces the opposite findings of previous studies (Tsay et al., 2021; Makino et al., 2023).

      3. Discuss the broad impact of the current model.

      4. Share the code and the metadata (instead of the current data format).

      Response: This is a concise summary of the major concerns listed in the public review. Given that these concerns are easy to address, we are giving a quick but point-by-point response for now. The elaborated version will be included in our formal revision.

      Reviewer 1:

      1) More credit should be given to the PReMo model: a) The PReMo model also proposes that perceptual error drives implicit adaptation, as in a new publication (Tsay et al., 2023), which was not public at the time of writing; and b) The PReMo model can account for some datasets, e.g. Fig 4A.

      Response: We will add this new citation and point out that the new paper also uses the term perceptual error. We will also point out that the PReMo model has the potential to explain Fig 4A, though for now, it assumes an additional visual shift to explain the positive proprioceptive changes relative to the target. We will also expand the discussion comparing the two models.

      2) The present study's finding is the opposite of previous findings: here, upregulating visual uncertainty (by cursor blurring) decreases adaptation for large perturbations but less so for small perturbations, while previous studies using a cursor cloud have shown the opposite (Tsay et al., 2021; Makino et al., 2023). This needs explanation.

      Response: Using the cursor cloud (Tsay et al., 2021; Makino et al., 2023) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for testing the effect of sensory uncertainty in visuomotor rotation. For the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) AND the sensory estimate of angles (the error itself, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it additionally affects the mean of the distribution (mu). This unnecessary confound is avoided by using cursor blurring, since a blurred cursor is still a single cursor whose center (mu) is unchanged from that of an un-blurred cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target. This “target hit” would affect adaptation, possibly via a reward learning mechanism (see Kim et al., 2019, eLife). This is a second confound that accompanies the cursor cloud. We will expand our discussion to explain the discrepancy between our findings and previous findings.
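
      One possible reading of the sigma-versus-mu confound can be sketched with a small simulation. This is our illustration, not the authors' analysis: it assumes a cloud of dots drawn from a Gaussian around the clamp angle, so that the cloud's centroid varies from trial to trial, while a blurred cursor keeps its center fixed. The clamp angle, spread, dot count, and trial count are all hypothetical.

```python
import random
import statistics


def cloud_centroid(clamp_deg, spread_deg, n_dots, rng):
    """Centroid angle of a cursor cloud whose dots scatter around the clamp angle."""
    dots = [rng.gauss(clamp_deg, spread_deg) for _ in range(n_dots)]
    return statistics.fmean(dots)


def blurred_center(clamp_deg, blur_deg):
    """A blurred cursor is one cursor: blur widens it but leaves its center in place."""
    return clamp_deg


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clamp = 16.0            # hypothetical clamp angle in degrees
n_trials = 1000

cloud_centers = [cloud_centroid(clamp, 10.0, 25, rng) for _ in range(n_trials)]
blur_centers = [blurred_center(clamp, 10.0) for _ in range(n_trials)]

# Trial-to-trial variability of the stimulus center (the "mu" of the visual cue).
cloud_center_sd = statistics.pstdev(cloud_centers)
blur_center_sd = statistics.pstdev(blur_centers)
```

      Under these assumptions the cloud's center wanders across trials with a standard deviation near the spread divided by the square root of the dot count, whereas the blurred cursor's center does not move at all.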

      3) The estimation of visual uncertainty (our exp1) required people to fixate on the target, while this might not reflect the actual scenario during adaptation where people are free to look wherever they want.

      Response: Our data show otherwise: in a typical error-clamp setting, people fixate on the target for the majority of the time. For our Exp1, the fixation on the straight line between the starting position and the target is 86%-95% (as shown in Figure S1). We also collected eye-tracking data in our Exp4, which is a typical error-clamp experiment. More than 95% of gaze falls within +/- 50 pixels of the center of the screen, even slightly higher than in Exp1. We will provide this part of the data in the revision. In fact, we designed our Exp1 to mimic the eye-tracking pattern seen in typical error-clamp learning, using carefully executed pilot experiments.

      This high percentage of fixating on the target is not surprising: the error-clamp task requires participants to use their hands to move towards the target and to ignore the cursor. In fact, we would also like to point out that the high percentage of fixation on the aiming target is also true for conventional visuomotor rotation, which involves strategic re-aiming (shown in de Brouwer et al. 2018; Bromberg et al. 2019; we have an upcoming paper to show this). This is one reason that our new theory would also apply to other types of motor adaptation.

      4) More methodology details are needed. E.g., a figure showing the visual blurring, a figure showing individual data, a table showing data from individual sessions, code sharing, and a possible new correlational analysis.

      Response: All of this additional methodological and analytical information will be provided. We limited ourselves to a short paper in the original submission, but the revision will be extended to include these details.

      Reviewer 2:

      1) More discussions are needed since the focus of this study is narrowly confined to visuomotor rotation. “A general computational principle, and its contributions to other motor learning paradigms remain to be explored”.

      Response: This is a great suggestion, since we also think our original Discussion did not elaborate on the possible broad impact of our theory. Our model is not limited to error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating multimodal feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.

      We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also involve implicit learning. We believe they can be modeled by variants of our model, or at least we should consider using the two principles above to think about their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for the skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.

      We will add more discussion on the possible broad implications of our model in the revision.

      Reviewer 3:

      1) Similar to Reviewer 1, the reviewer raised the concern about whether people’s fixation in typical motor adaptation settings is similar to the fixation that we instructed in our Exp1.

      Response: see above.

      2) Similar to Reviewer 2, the concern was raised about whether our new theory is applicable to a broad context. In particular, error clamp appears to be a strange experimental manipulation that has no real-life appeal: “[i]gnoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world”.

      Response: Regarding the broad impact of our model, please see the responses to Reviewer 2 above. We agree that ignoring errors (and thus “trying” to suppress adaptation) should not be a movement strategy for real-world intentional tasks. However, even in real life, we constantly attend to one thing while doing another; that is when implicit motor processes are in charge. Furthermore, it is this exact “ignoring” instruction that elicits the implicit adaptation that we can work on. In this sense, the error-clamp paradigm is a great vehicle to isolate implicit adaptation and allows us to unpack its cognitive mechanism.

      3) In Exp1, the 1s delay between the movement end and the presentation of the reference cursor might inflate the actual visual uncertainty.

      Response: The 1s delay of the reference cursor would not inflate the estimate of visual uncertainty. Our Exp1 used a paradigm similar to one from vision science (e.g., White, Levi, and Aitsebaomo, Vision Research, 1992), which shows that delay does not lead to an obvious increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figures 5-6). We will add more methodological justification in our revision.

      4) Our Fig4A used Tsay et al., 2021 data, which, in the reviewer’s view, is not an appropriate measure of proprioceptive bias. The reason is that in this dataset, “participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to.”

      Response: We agree that the Tsay et al., 2021 study used an unconventional way to measure the influence of implicit adaptation on proprioception. Their observed “proprioceptive changes” should not be called “proprioceptive bias”, which is conventionally a term reserved for the difference between the estimated and the actual hand location (preferably measured with a passively moved hand). However, we think their dataset is still subject to the same Bayesian cue combination principle and thus can be modeled. Our modeling of this dataset includes all relevant cues: the implicitly perceived hand position and the proprioceptive cue (given that the hand stays at the movement end). Both cues are in extrinsic coordinates, which happen to set the target position as zero. But where to set the zero (whether it is the target or the actual hand location) does not matter for the model fitting. Note that our Exp4 is also based on PEA modeling of proprioceptive bias, and this time the data are presented relative to the actual hand location.

      In the revision, we will keep the current Fig4A and refer to the data as proprioceptive change rather than proprioceptive bias, to follow the convention.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In no particular order:

      1. In Figs S3 and S4, can they also show gamma fit? (or rather corrected fit accounting for abundance conditioning?) The shapes look different, especially for the microbial mat.

      Author response: We have added gamma distribution fits to the rescaled AFD plots (Figs. S3, S4).

      1. Lines 170-176 seem like they should come before lines 164-166.

      Author response: In lines 166-170 we discuss empirical patterns in the data that motivate the introduction of the SLM as a model in lines 170-175. We have clarified these points in the revision.

      1. The wiggles in the gamma predictions in the occupancy-abundance plots are because occupancy depends not only on abundance but also on the shape parameter, right? Probably good to write a sentence or two explaining what's going on here.

      Author response: We agree with the reviewer that the variation in the prediction could be in-part driven by variation in the shape parameter across community members. We now include this observation in our revision (lines 209-211).

      1. In the predicted vs observed occupancy plots, it would be nice to add curves showing predicted standard deviation or similar to give a sense of how well the model is predicting the variability.

      Author response: In the revised manuscript we now include predictions for the variance of occupancy using the gamma distribution under both taxonomic and phylogenetic coarse-graining (Fig. S9; S10; lines 211-214).

      1. Covariance between sister groups: Figs S9 and S10 look very nice, but it's hard to see much because they're log-log plots over multiple decades, while even a several-fold difference from y = x would indicate a strong effect of correlations. It would be clearer if the y-axis showed the ratio of the coarse-grained variance to the sum of OTU variances and we were looking at how well it fit y = 1.

      Author response: We have included these plots in the revision (Fig. S14, S15).
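
      The ratio requested by the reviewer can be sketched as follows. This is an illustrative example only, not the analysis code behind the figures: the function name `variance_ratio` and the toy abundance values are hypothetical. The ratio equals 1 when the OTUs within a group are uncorrelated across samples, exceeds 1 when they covary positively, and falls below 1 when they covary negatively.

```python
from statistics import pvariance


def variance_ratio(otu_abundances):
    """Ratio of the coarse-grained variance to the sum of OTU variances.

    otu_abundances: list of per-OTU lists, each holding one abundance per sample.
    """
    # Coarse-grained abundance in each sample is the sum over member OTUs.
    group_totals = [sum(sample) for sample in zip(*otu_abundances)]
    var_group = pvariance(group_totals)
    sum_of_variances = sum(pvariance(otu) for otu in otu_abundances)
    return var_group / sum_of_variances


# Hypothetical toy data: two OTUs observed in four samples.
uncorrelated = variance_ratio([[1, 2, 1, 2], [1, 1, 2, 2]])
positively_correlated = variance_ratio([[1, 2, 3], [1, 2, 3]])
negatively_correlated = variance_ratio([[1, 2, 3], [3, 2, 1]])
print(uncorrelated, positively_correlated, negatively_correlated)
```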

      1. If the sum of gammas can be well-approximated by a gamma, does that mean that the gamma is just a fairly flexible distribution and we shouldn't take the quality of the gamma fits in general as a very specific indication of what's going on?

      Author response: While the sum of random variables that are drawn from gamma distributions with different parameters is often well-approximated by another gamma, this does not tell us why the gamma distribution holds for microbial communities at the finest-grain level (i.e., OTUs/ASVs). At present, the best explanation is that the gamma is a stationary distribution for certain stochastic differential equations which have ecological interpretations (Grilli, 2020; Shoemaker et al., 2023). Furthermore, alternative two-parameter distributions have been tested alongside the gamma and have done a comparatively poor job capturing observed macroecological patterns (Grilli, 2020). These results suggest that the utility of the gamma distribution is not simply an outcome of its flexible nature; it succeeds because it has captured core ecological properties of microbial communities. In the case of the SLM, gamma-like distributions arise when a community member is subject to self-limiting growth and environmental noise. On the other hand, the stability of the gamma distribution might explain why it can be detected as the shape of the AFD, as it does not fade out across coarse-graining levels.
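
      For reference, a commonly written form of the SLM and its stationary solution is sketched below. The symbols (timescale $\tau$, carrying capacity $K$, noise strength $\sigma$, white noise $\eta$) follow the convention of Grilli (2020) and are assumptions here, since this response does not define them; the noise term is interpreted in the Itô sense and $\sigma < 2$ is required for a normalizable solution.

```latex
% Stochastic logistic model: self-limiting growth plus multiplicative environmental noise
\frac{dx}{dt} = \frac{x}{\tau}\left(1 - \frac{x}{K}\right) + \sqrt{\frac{\sigma}{\tau}}\, x\, \eta(t)

% Stationary distribution: a gamma with mean \bar{x} and shape \beta
p(x) = \frac{1}{\Gamma(\beta)} \left(\frac{\beta}{\bar{x}}\right)^{\beta} x^{\beta - 1}
       \exp\!\left(-\frac{\beta x}{\bar{x}}\right),
\qquad \bar{x} = K\left(1 - \frac{\sigma}{2}\right),
\qquad \beta = \frac{2}{\sigma} - 1
```

      Under this parameterization the mean abundance is set mainly by the carrying capacity, while the shape of the distribution is set by the noise strength alone.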

      1. What's going on with the variance of diversity in Fig S12? Does this suggest that some of the problem in Figure 4 could be with the analytic approximation rather than the model? I had a hard time understanding the part of the Methods explaining the simulation details (lines 587-597). It would be worth expanding this. Is there some way to explain how the correlations were simulated in terms of the SLM, e.g., correlations in the noise term across OTUs?

      Author response: We believe that deviations in the variance of diversity in Fig. S16g,h are driven by small deviations in our predictions of the second moment $$\left\langle (x \ln x)^{2} \mid N_{m}, \bar{x}_{i}, \beta_{i}^{2} \right\rangle$$ (Eq. S16). Individually these deviations are slight, but their effects become noticeable when summed over hundreds or thousands of taxa. We have included this observation in the revised manuscript (lines 268-271). However, this deviation pales in comparison with the magnitude of covariance in the empirical data, suggesting that our inability to predict the variance of richness and diversity is primarily driven by our assumption of statistical independence.

      Regarding the source of the correlations, under the SLM correlations in abundances can be introduced either by adding deterministic interaction terms or through correlated environmental noise. Determining which of these two options drives empirical correlations is an active area of research (e.g., Camacho-Mateu et al., 2023). For the purpose of this study, we remain agnostic on the cause of the correlations, opting instead to emphasize that the inclusion of correlations is necessary to reproduce observed slopes of the fine vs. coarse-grained relationship for diversity.

      1. In Figure 5ab, is the idea that the correlation in richness is primarily driven by the number of samples from the environment? Line 390 seems to say so, but it would be good to make this explicit and put it right in that section of the Results.

      Author response: Our results suggest that sampling effort (# reads) plays a larger role in determining the correlations between fine and coarse-grained measures of richness. We now clarify this point in the revised manuscript (lines 429-435).

      1. I don't totally understand the contrast in lines 369-372. If fine-scale diversity within one group begets coarse-grained diversity in another group, couldn't that show up as correlations in the AFDs? Or is the argument that only including within-group correlations in AFDs is enough to reproduce the pattern? I'm not sure I see how that could be.

      Author response: The term “begets” implies both causation and direction. If we see a positive relationship between diversity estimates at two different scales of observation, the causal mechanism cannot be determined solely from correlations between samples obtained once from different sites. So, mechanisms consistent with niche construction/"DBD" can produce correlations, though the existence of correlations does not necessarily imply DBD.

      1. The discussion of niche construction on 429-431 doesn't match very well with 440-441. Basically, niche construction is a very broad concept, not a specific one, right?

      Author response: In lines 472-576 (formerly 429-431) we discuss how the existence of correlations between fine and coarse-grained scales does not point to a single ecological mechanism. Alternatively stated, observing a non-zero slope does not mean that niche construction is driving the relationship.

      In lines 476-487 (formerly 440-441) we discuss how the mechanism of cross-feeding has been shown to generate a positive relationship between fine and coarse-grained measures of diversity. This mechanism can be interpreted as a form of “niche construction”, so it is an instance of a tested ecological mechanism that aligns with the interpretation given in Madi et al. (2020).

      1. Isn't (8) just the negative binomial distribution?

      Author response: The convolution of the stationary solution of the SLM (i.e., a gamma distribution) and the Poisson limit of a multinomial sampling distribution returns a negative binomial distribution of read counts across hosts if samples have identical sampling depths. We now include this detail in the revision (lines 593-595). Note, however, that if different samples have different sampling depths, the distribution of reads across samples is not a negative binomial.
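
      The step from gamma-distributed abundances and Poisson sampling to a negative binomial can be written out explicitly. The sketch below assumes a gamma with mean $\bar{x}$ and shape parameter $\beta$, and a Poisson with rate $xN$ for a sample of depth $N$; this notation is chosen for illustration and may differ from the parameterization used in the manuscript.

```latex
P(n \mid N)
  = \int_{0}^{\infty}
      \underbrace{\frac{(xN)^{n} e^{-xN}}{n!}}_{\text{Poisson sampling}}
      \;
      \underbrace{\frac{1}{\Gamma(\beta)} \left(\frac{\beta}{\bar{x}}\right)^{\beta}
                  x^{\beta - 1} e^{-\beta x / \bar{x}}}_{\text{gamma AFD}}
      \, dx
  = \frac{\Gamma(\beta + n)}{n!\, \Gamma(\beta)}
    \left(\frac{\bar{x} N}{\beta + \bar{x} N}\right)^{n}
    \left(\frac{\beta}{\beta + \bar{x} N}\right)^{\beta}
```

      The right-hand side is a negative binomial in $n$ for a fixed depth $N$. When depths $N_{m}$ vary across samples, the distribution of reads across samples is a mixture of negative binomials with different parameters, which is not itself a negative binomial.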

      1. Missing 1/M in (9).

      Author response: We have fixed this omission in the revision.

      1. Schematic figures illustrating what the different statistics are intuitively capturing would really help this work be understandable to a broader audience, but they'd also be a ton of work.

      Author response: Richness and diversity are used in ecology to such an extent that we do not see the benefit of a conceptual diagram. Furthermore, we have included a conceptual diagram about our pipeline in our revision at the request of Reviewer 2 (Fig. S20).

      Reviewer #2:

      Major Recommendations

      If I were reviewing this manuscript for a regular journal, I believe the following issues would be important to address prior to publication.

      1. From my reading, the main points of this advance are that

      a. SLM models AFDs well at all levels of coarse-graining.

      b. This makes SLM a better null-model than UNTB for macroecological relationships.

      c. Using SLM on the EMP data, the richness slopes are well explained by SLM but not the diversity slopes. Therefore, any theory that hopes to explain the diversity slopes must include interactions.

      Argument B appears to be one of the key points yet is missing from the abstract, and should be made clearer. If these aren't the main points the authors intended, then other main points need to be highlighted more.

      Author response: In the revision we now explicitly mention argument b in the Abstract.

      1. The title should be more specific, so as to better reflect the content. (E.g. "UNTB is not a good null model for macroecological patterns" would seem more appropriate.)

      Author response: We would prefer to focus on the success of the SLM rather than the limitations of the UNTB in the title of this work. Therefore, we have modified our title as follows: “Investigating macroecological patterns in coarse-grained microbial communities using the stochastic logistic model of growth”.

      1. The manuscript would benefit from a clearer description of exactly what information the SLM retains about the data (perhaps even a cartoon panel in one of the figures). In particular, it is important to be explicit about the number of model parameters.

      Author response: The number of model parameters for the gamma AFD are now explicitly stated in the revision (Lines 579-580).

      1. The main point of Figures 2-4 seems to be that SLM is good at describing the data (and when it fails it is due to interactions) while UNTB fails to reproduce this behavior, in support of Argument B. This is not clear from the figure descriptions or titles, which focus on SLM's "predictive" power.

      Author response: Fig. 2a demonstrates that the gamma distribution predicted by the SLM explains the empirical distribution of abundances. This result provides motivation to predict the fraction of sites harboring a given community member (i.e., occupancy, Fig. 2c) as well as general measures of community composition including mean richness (Fig. 3a,c) and mean diversity (Fig. 3b,d) using parameters estimated from the data (not free parameters).

      This success led us to consider whether the gamma distribution could predict the variance of richness and diversity, which it could not because it does not capture covariance between community members (Fig. 4).

      In the revision we have identified opportunities to make these points clear throughout the Results. Furthermore, we have added additional detail to the legends of Figs. 2-4.

      1. The manuscript would benefit from clarifying the use of "prediction" related to the SLM. Since the gamma distributions predicted by SLM were fit to empirical data, it seems like the agreement between analytic means and empirical means (Fig. 3) is a statement on gamma distributions being a good fit for the AFD's more than SLM predicting richness and diversity. For example, from my reading, it seems like this analysis could be done numerically by shuffling species abundances across environments and seeing whether this changed the mean richness/diversity. I would not call this shuffling test a prediction, since it is more a statement on the relevance of interactions. SLM predicts gamma-distributed AFD's, but those distributions recovering the data they were trained on doesn't seem like a prediction.

      Author response: In this manuscript we identified the gamma distribution as an appropriate probability distribution to describe the distribution of relative abundances across samples over a range of coarse-grained scales. Motivated by this result, we performed a separate analysis where at each scale we estimated the mean and variance of relative abundance across sites for each community member. We then used these parameters to obtain the expected value of a community-level measure using an equation we derived by assuming that the gamma distribution was appropriate (e.g., richness, Eq. 13). We then compared the expected value of richness to the mean value from empirical data and assessed the similarity between the two values.

      The outcome of this procedure constitutes a prediction. While the mean and variance are parameters, estimating them from the empirical data has no connection with the operation of training a distribution on empirical data. We could have derived predictions such as Eq. 13 using any other probability distribution that can be parameterized using the mean and variance (e.g., Gaussian). Such a prediction would likely do a poor job even though it used the same means and variances used for our gamma predictions. This is because the choice of distribution would not have been a good descriptor of the distribution of abundances across hosts.

      To better explain this last -- perhaps the most significant -- issue, I'd like to ask the authors if the following recasting would be an accurate reflection of their conclusions, or if something is missing.

      1. "Focusing on the empirical relationship observed between diversity slopes by Madi 2020, we ask the question: does explaining these relationships require accounting for species-species correlations? Or could it be reproduced in a noninteracting model?" To address this question, one can perform a randomization test, shuffling abundances to preserve all single-OTU statistics but breaking any correlations. My reading of the authors' results is that (new result 1) the richness relationships would be preserved, while diversity relationships would not be preserved. [Note that this result 1 need not mention either SLM or UNTB.]

      Author response: The question of whether correlations between species are necessary to explain the observed slope of the fine vs. coarse-grained relationship was only one component of our research goals. Our first question was whether the SLM would prove to be a more appropriate null for evaluating the novelty of observed slopes. We believe that our results support the conclusion that the SLM is an appropriate null for this question, as it was able to capture observed slopes of the fine vs. coarse-grained relationship for estimates of richness, indicating that correlations, and the interactions ultimately responsible for them, are not necessary to explain this result.

      We then find that the SLM as a null model fails to capture observed slopes of the fine vs. coarse-grained relationship for estimates of diversity, and we simulate the SLM with correlations to return reasonable estimates of the slope. However, here the question about correlations is a direct follow-up from our question about a null model that excludes interactions, so it is unclear how a randomization test would relate to this result.
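
      For readers unfamiliar with the randomization test proposed by the reviewer, a minimal sketch is given below. It is not part of the authors' analysis: the function name `shuffle_within_otus`, the seed, and the toy table are hypothetical. Each OTU's abundances are permuted across samples independently, which preserves every single-OTU statistic (mean, variance, occupancy) while breaking correlations between OTUs.

```python
import random


def shuffle_within_otus(table, seed=0):
    """Permute each OTU's abundances across samples independently.

    table: list of rows, one row per OTU, one column per sample.
    Returns a new table; the input is left unchanged.
    """
    rng = random.Random(seed)
    shuffled = []
    for row in table:
        permuted = list(row)
        rng.shuffle(permuted)   # independent permutation for each OTU
        shuffled.append(permuted)
    return shuffled


# Hypothetical toy table: 3 OTUs (rows) by 4 samples (columns).
observed = [
    [10, 0, 3, 7],
    [9, 1, 2, 8],
    [0, 5, 5, 0],
]
null_table = shuffle_within_otus(observed, seed=42)
```

      One caveat of this kind of shuffling is that the per-sample totals (sampling depths) are no longer preserved, so statistics that depend on depth are not directly comparable between the observed and shuffled tables.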

      1. Instead of doing a randomization test (resampling the empirical distribution), one might insist on instead fitting a model to the AFD distributions, and sampling from that distribution rather than the empirical one.

      a. If doing it this way, one should of course ensure that the distribution being fit is a good description of the data.

      b. UNTB is a bad fit. SLM is a better fit, and in fact (new result 2) continues to be a good empirical fit even at coarse-grained levels.

      c. Can make statements on using SLM as a null model for these types of cross-scale relationships. Could try arguing that fitting an SLM model per-OTU (instead of resampling the empirical distribution) could offer some advantage if certain properties could be computed analytically from the fit parameters, instead of averaging over multiple computational rounds of resampling.

      Do these two points accurately summarize the manuscript? If so, this presentation avoids the confusion with "prediction". If my summary is missing some important point, the presentation should be revised to clarify the points I appear to have missed.

      Author response: In our manuscript we derive predictions from the gamma distribution, the stationary distribution of the SLM, that require parameters estimated from the data (i.e., mean and variance of relative abundance). These parameters are estimated from the data using standard procedures and then plugged into our predictions that assume the appropriateness of the gamma, returning values that are then compared to estimates from empirical data. Our estimation of the mean and variance does not assume that the empirical distribution follows a gamma distribution, but the value returned by our function derived from the gamma distribution (e.g., Eq. 13) does make that assumption.

      To address the reviewer’s broader comment, we believe that the following points summarize our manuscript:

      1. The gamma distribution as a stationary solution of the SLM captures macroecological patterns and predicts typical community-level properties (i.e., mean richness and diversity) across phylogenetic and taxonomic scales.

      2. The gamma distribution fails to predict variation in community-level properties (i.e., variance of richness and diversity) across phylogenetic and taxonomic scales. This occurs because the SLM is a mean-field model that does not explicitly include interactions between community members.

      3. Despite the inability to capture interactions, the gamma distribution succeeds at predicting the fine vs. coarse-grain slope for richness, a pattern that had previously been attributed to community member interactions. This result demonstrates that the novelty of a macroecological pattern hinges on one’s choice of null model.

      4. However, the gamma cannot capture the same relationship for diversity. Simulations of the gamma distribution that incorporate correlations between community members are capable of generating reasonable estimates of the slope.

      To address the reviewer’s comments regarding the appropriateness of fitted gamma distributions, in our revision we have added fitted gamma distributions to plots of AFDs so that the reader can visually assess the ability of the gamma to describe empirical patterns (Fig. S3, S4).

      We have also obtained predictions for the slope of the fine vs. coarse-grained relationship for community richness using the same form of UNTB used by Madi et al. (2020). In our revised manuscript we establish a procedure to infer the single parameter of this model, generate predictions of richness at fine and coarse-grained scales, and then evaluate whether the UNTB is capable of predicting the slope of the fine vs. coarse-grained relationship for richness (Supplementary Information; Figs. S18, S24-S28; lines 277-278; 370-380).

      Other/minor comments

      1. The manuscript would be improved with more consistent terminology ("fine vs. coarse-grained relationship"/"the relationship" vs. "diversity slope"). Also, many readers may be used to OTUs referring to the rather fine level of description, as opposed to any chosen level; and could interpret indexing over groups as being in contrast with indexing over OTU's (coarse vs fine). The authors' use is perfectly correct, but keeping a consistent terminology would help.

      Author response: We have revised our manuscript to specify the “slope” as the “slope of the fine vs. coarse-grained relationship” (e.g., Line 318). We also specify in the Results and in the Methods that we use “fine” and “coarse” as relative terms, keeping with the sliding-scale approach used in Madi et al (2020).

      1. While I appreciate this "slope" is something borrowed from other work, the clarity of the paper might benefit from a cartoon of how one goes from the raw data to the slopes at a particular coarse-graining level. (Optional).

      Author response: We have added a conceptual diagram to the revision (Fig. S20).

      1. The text often colloquially references "the gamma," "predictions of the gamma," etc. This phrasing comes across as sloppy, and the manuscript would be improved by being more specific.

      Author response: We now specify “gamma” as the “gamma distribution” throughout the manuscript.

      1. Equation 6 appears to be missing some subscripts on the x terms (included on the left of the equation).

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In "Simulating communities of correlated...AFDs", the acronym SAD is not defined.

      Author response: We thank the reviewer for noticing this error and we have corrected it in the revision.

      1. In Figure 2:

      a. Invariant is probably the wrong word for the title, since all the AFD's were rescaled by mean and variance before being compared. Data does support that the gamma distributions are good at describing the AFD's, but as stated in the description it's the general shape that is preserved, not the distribution itself.

      Author response: When we mention the invariance of the AFD we now specify that we mean that the shape of the distribution remained qualitatively invariant.

      b. I'd recommend changing the color coding to something with more contrast, since currently it's impossible to assess the claim that the shape of the distribution collapses.

      Author response: Our coarse-graining procedure is a sequential operation that has no intuitive point that would suggest the use of a contrasting colormap (e.g., if our scale ranged from -1 to 1 then there would be a natural point of contrast at zero).

      c. The legend is missing relevant technical details: How many OTU's were used to make plot a? How many samples?

      Author response: The number of samples was listed in the Materials and Methods (line 523). In the revision we now include a table with the average and total number of OTUs as well as the average number of reads for each environment (Table S1, S2).

      d. In plot b, is the mean relative abundance referring to "mean abundance when observed" or "mean across all samples"?

      Author response: The mean relative abundance is the mean abundance across all sites. This is stated in the main text (line 204) and in the legend of Fig. 2.

      e. Since one argument here is that SLM fits these distributions better than UNTB, if possible it would be nice to see UNTB's failed fits here.

      Author response: A major feature of the UNTB is that the demographic parameters of community members are indistinguishable. Under the SLM, the variation in the mean relative abundance we observe suggests that the carrying capacities of community members vary over multiple orders of magnitude, a result that is incompatible with most forms of the UNTB (x-axis of Fig. 2b). We now mention this point in the revised manuscript (lines 110; 229; 455-471).

      1. In Figure 3:

      a. It is not clear how coarse-graining is included in model fitting. The "Deriving biodiversity measure predictions" section would benefit from including how coarse-graining is incorporated.

      Author response: We predict measures of biodiversity separately at each coarse-grained scale. We now clarify this detail in the revised manuscript (Lines 624-627).
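
      As an illustration of what computing biodiversity measures separately at each coarse-grained scale involves on the empirical side, the sketch below sums read counts within groups and then computes richness and Shannon diversity at both scales. The function names, the OTU labels, and the group assignments are hypothetical and are not taken from the manuscript.

```python
import math
from collections import defaultdict


def coarse_grain(counts, groups):
    """Sum OTU read counts within each coarse-grained group."""
    totals = defaultdict(int)
    for otu, n_reads in counts.items():
        totals[groups[otu]] += n_reads
    return dict(totals)


def richness(counts):
    """Number of community members with at least one read."""
    return sum(1 for n_reads in counts.values() if n_reads > 0)


def shannon_diversity(counts):
    """Shannon diversity (natural log) of relative abundances."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


# Hypothetical sample: read counts for four OTUs assigned to three groups.
fine = {"otu1": 5, "otu2": 5, "otu3": 0, "otu4": 10}
groups = {"otu1": "A", "otu2": "A", "otu3": "B", "otu4": "C"}
coarse = coarse_grain(fine, groups)

print(richness(fine), shannon_diversity(fine))
print(richness(coarse), shannon_diversity(coarse))
```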

      b. Reference Shannon Diversity in Methods.

      Author response: We now cite Shannon’s diversity.

      c. What is the blue/white color coding in plots a & c? It doesn't have any color key.

      Author response: Figs. 3-6 use a uniform light-to-dark scale for all environments, with each environment having its own color. For example, Fig. 3a contains data from the human gut microbiome. Human gut data were assigned the color aquamarine, so the shade of aquamarine for a given datapoint in Fig. 3a indicates the phylogenetic scale.

      In the revision we now clarify the color scale in the legend of Fig. 3 and specify that the same scale is used in all subsequent figure legends.

      d. Re: earlier comments, why is richness considered a prediction? (Am I correct in my interpretation that panel b is almost a tautology - counting the number of zeros in the matrix either by rows or by columns - whereas panel d is nontrivial?)

      Author response: Mean richness as a measure of biodiversity depends on the fraction of sites where a given community member is present (i.e., occupancy). The mean relative abundance of a community member and its variation across sites (beta) is clearly related to occupancy, but those two statistics do not give you a prediction of occupancy. Obtaining a prediction of occupancy and, subsequently, richness, requires 1) a probability distribution of abundances (i.e., the gamma) and 2) a probability distribution of sampling (i.e., the Poisson). Using these two pieces of information, we derived a prediction for mean richness (Eq. 13). We then compare the value of richness obtained by plugging in the mean relative abundances, betas, and known number of reads to the observed mean richness obtained from the data.
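      The two ingredients named above (a gamma distribution of abundances and Poisson sampling) can be combined in a short numerical sketch. This is an illustration under stated assumptions and not a reproduction of Eq. 13: it assumes the gamma is parameterized by the mean relative abundance and a shape parameter beta, that every site has the same number of reads, and the function and variable names are hypothetical. Under those assumptions, the probability that a community member has zero reads at a site is (1 + mean * N / beta)^(-beta), occupancy is one minus that value, and mean richness is the sum of occupancies over community members.

```python
import numpy as np

def predicted_occupancy(mean_abundance, beta, n_reads):
    """Probability that a community member is detected at a site.

    Assumes relative abundance x ~ Gamma(shape=beta, mean=mean_abundance)
    and read counts ~ Poisson(n_reads * x). Names are illustrative only.
    """
    mean_abundance = np.asarray(mean_abundance, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # P(zero reads) = E[exp(-n_reads * x)] for the gamma distribution above
    p_absent = (1.0 + mean_abundance * n_reads / beta) ** (-beta)
    return 1.0 - p_absent

def predicted_mean_richness(mean_abundance, beta, n_reads):
    """Expected number of community members detected at a site."""
    return float(np.sum(predicted_occupancy(mean_abundance, beta, n_reads)))
```

      For example, a community member with mean relative abundance 0.001 and beta = 1, sampled at 1,000 reads, has a predicted occupancy of 0.5 under these assumptions; two such members give a predicted mean richness of 1.0.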

      e. The lettering of subplots in Figure 3 is not consistent with Figure 4. Figure 3 subplots are also cited incorrectly in paragraph two on page six (lines 251-254).

      Author response: We thank the reviewer for noticing the error and we have corrected it in the revision.

      f. Again, if possible show UNTB predictions in plots a & c.

      Author response: In our revised manuscript we provide extensive descriptions and predictions of mean richness and the slope of the fine vs. coarse-grained relationship for richness using the form of the UNTB used in Madi et al. (2020; Figs. S18, S24 - S29; lines 277-282; 370-380). We then compare the error of these slope predictions to those obtained from the SLM, finding that the SLM generally outperforms UNTB (Figs. S27-S29).

      1. In Figure 4:

      a. What are the color codings in plots a & b?

      Author response: The color scale used in Fig. 4 is identical to the color scale used in Fig. 3. This detail is now specified in the legend of Fig. 4.

      b. What are the two lines of empirical data in plots a & b, and why is one of them dashed?

      Author response: We now specify what the two lines mean in the key within the figure.

      c. Same comment as earlier on predictions and richness.

      Author response: We now specify what the two lines mean in the key within the figure.

      1. In Figure 5:

      a. It wasn't clear to me in the manuscript how the authors generated these plots from the raw data. The manuscript would benefit from a clear cartoon/description of the data pipeline, from raw data to empirical (and analytic) slopes.

      Author response: We have added a conceptual diagram to the revised manuscript (Fig. S20).

      b. Make the figure title more descriptive to better connect it to the figure's objective (the richness slopes relationship is not novel, but the diversity slopes relationship is).

      Author response: We have revised the figure title.

      References

      Camacho-Mateu, J., Lampo, A., Sireci, M., Muñoz, M. Á., & Cuesta, J. A. (2023). Species interactions reproduce abundance correlations patterns in microbial communities (arXiv:2305.19154). arXiv. https://doi.org/10.48550/arXiv.2305.19154

      Grilli, J. (2020). Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), 4743. https://doi.org/10.1038/s41467-020-18529-y

      Madi, N., Vos, M., Murall, C. L., Legendre, P., & Shapiro, B. J. (2020). Does diversity beget diversity in microbiomes? eLife, 9, e58999. https://doi.org/10.7554/eLife.58999

      Shoemaker, W. R., Sánchez, Á., & Grilli, J. (2023). Macroecological laws in experimental microbial systems (p. 2023.07.24.550281). bioRxiv. https://doi.org/10.1101/2023.07.24.550281

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thorough assessment of our study, their overall enthusiasm, and the helpful suggestions for clarifying the methods and results, additional analyses, and discussion points. We have made earnest efforts to address the weaknesses raised in the public review and other recommendations made by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Herein, Blaeser et al. explored the impact of migraine-related cortical spreading depression (CSD) on the calcium dynamics of meningeal afferents that are considered the putative source of migraine-related pain. Critically previous studies have identified widespread activation of these meningeal afferents following CSD; however, most studies of this kind have been performed in anesthetized rodents. By conducting a series of technically challenging calcium imaging experiments in conscious head fixed mice they find in contrast that a much smaller proportion of meningeal afferents are persistently activated following CSD. Instead, they identify that post-CSD responses are differentially altered across a wide array of afferents, including increased and decreased responses to mechanical meningeal deformations and activation of previously non-responsive afferents following CSD. Given that migraine is characterized by worsening head pain in response to movement, the findings offer a potential mechanism that may explain this clinical phenomenon.

      Strengths:

      Using head fixed conscious mice overcomes the limitations of anesthetized preps and the potential impact of anaesthesia on meningeal afferent function which facilitated novel results when compared to previous anesthetized studies. Further, the authors used a closed cranial window preparation to maximize normal physiological states during recording, although the introduction of a needle prick to induce CSD will have generated a small opening in the cranial preparation, rendering it not fully closed as suggested.

      Weaknesses:

      Although this is a well conducted technically challenging study that has added valuable knowledge on the response of meningeal afferents the study would have benefited from the inclusion of more female mice. Migraine is a female dominant condition and an attempt to compare potential sex-differences in afferent responses would undoubtedly have improved the outcome.

      Our study included only two females, largely reflecting the much higher success rate of AAV-mediated meningeal afferent GCaMP expression in males than in females. The reason for the lower yield in female mice is unclear to us at present but may involve, at least partly, sex-specific differences in the mechanisms responsible for efficient transduction with this AAV vector observed in peripheral tissues (Davidoff et al. 2003). While our study did not address sex differences, a recent study (Melo-Carrillo et al. 2017) reported CSD equally activating and sensitizing second-order dorsal horn neurons that receive input from meningeal afferents in male and female rats.

      The authors imply that the current method shows clear differences when compared to older anaesthetized studies; however, many of these were conducted in rats and relied on recording from the trigeminal ganglion. Inclusion of a subgroup of anesthetized mice in the current preparation may have helped to answer the outstanding question of whether the difference is species dependent or a result of the different technical approaches.

      We have tried to address the anesthesia issue by conducting imaging sessions in several isoflurane-anesthetized mice. However, during these experiments, we observed a substantial decrease in the GCaMP fluorescence signal with a much lower signal-to-noise ratio that made the analyses of the afferents’ calcium signal unreliable. Reduced GCaMP signal in meningeal axons during anesthesia may be related to the development of respiratory acidosis, since lower pH leads to decreased GCaMP signal, as also mentioned by Reviewer #3. Of note, urethane anesthesia, which was used in all previous rat experiments, also produces respiratory acidosis.

      The authors discuss meningeal deformations as a result of locomotion; however, despite referring to their previous work (Blaeser et al., 2022), the exact method of how these deformations were measured could be clearer. It is challenging to imagine that simple locomotion would induce such deformations, and the one reference in the introduction refers to straining, such as cough that may induce intracranial hypertension, which is likely a more powerful stimulus than locomotion.

      As part of the revision, we now provide a better description of the methodology (“Image processing and calcium signal extraction” section) used to determine meningeal deformations, including scaling, shearing, and Z-shift. In our previous paper (Blaeser et al. 2023), we provided an extensive description of the types of meningeal deformations occurring in locomoting mice. It should also be noted that locomotion drives cerebral vasodilation and intracranial pressure increases (Gao and Drew, 2016), which likely mediate, at least in part, the movement of the meninges towards the skull (positive Z-shift) and potentially other meningeal deformation parameters. We also agree with the reviewer that sudden maneuvers such as coughing and sneezing that lead to a larger increase in intracranial pressure are likely to be even more powerful drivers of endogenous intracranial mechanical stimulation than locomotion. Thus, our finding of increased responsiveness to locomotion-related meningeal deformation post-CSD may underestimate the increased afferent responsivity post-CSD during other behaviors such as coughing. We added this point to the discussion.

      More recently, several groups have used optogenetic triggering of CSD to avoid opening of the cranium for needle prick. Given the authors robustly highlight the benefit of the closed cranium approach, would such an approach not have been more appropriate?

      We agree with the reviewer that optogenetic methods used for CSD induction in non-craniotomized animals will further ensure accurate pressurization and, thus, will be an even better approach that avoids the burr hole used for pinprick. It should be noted, however, that the burr hole used for the pinprick likely had a minimal effect on intracranial pressure, as we minimized depressurization by plugging the burr hole throughout the experiments with a silicone elastomer. We have added this information to the revised Methods section.

      It is also worth noting that the optogenetic methodology used by others to provoke CSD was optimized only recently and relies on transgenic mice with a strong expression of YFP (Thy1.ChR2-YFP mice) within the superficial cortex that is not compatible with the afferent GCaMP imaging of meningeal afferents. Modifications using red-shifted opsins may allow the use of this strategy in the future.

      It was not clear how deformation predictors increased independent of locomotion (Figure 4D), as locomotion is essentially causing the deformations as noted in the study. This point was not so clear to this reviewer.

      As noted in our previous paper (Blaeser et al., 2023), deformation variables often exhibit different time courses than locomotion, even when a deformation is initially induced by the onset of locomotion. Most notably, the scaling-related deformation ramps up slowly and often persists for tens of seconds after the onset and termination of locomotion, which may be related to the recovery dynamics of the meningeal vascular response to locomotion. Overall, while locomotion serves as a predictor of meningeal deformation, we observed previously (Blaeser et al. 2023) many afferents whose responses were more closely associated with the moment-to-moment deformations than with the state of locomotion per se, suggesting that a unique set of stimuli is responsible for the activation of this deformation-sensitive afferent population. The increased sensitivity to deformation signals we observed following CSD suggests that the afferent population sensitive to deformation has unique properties that render it most susceptible to becoming sensitized following CSD. We now discuss this possibility.

      Reviewer #2 (Public Review):

      This is an interesting study examining the question of whether CSD sensitizes meningeal afferent sensory neurons leading to spontaneous activity or whether CSD sensitizes these neurons to mechanical stimulation related to locomotion. Using two-photon in vivo calcium imaging based on viral expression of GCaMP6 in the TG, awake mice on a running wheel were imaged following CSD induction by cortical pinprick. The CSD wave evoked a rise in intracellular calcium in many sensory neurons during the propagation of the wave but several patterns of afferent activity developed after the CSD. The minority of recorded neurons (10%) showed spontaneous activity while slightly larger numbers (20%) showed depression of activity, the latter pattern developed earlier than the former. The vast majority of neurons (70%) were unaffected by the CSD. CSD decreased the time spent running and the numbers of bouts per minute but each bout was unaffected by CSD. There also was no influence of CSD on the parameters referred to as meningeal deformation including scale, shear, and Z-shift. Using GLM, the authors then determine that there is an increase in locomotion/deformation-related afferent activity in 51% of neurons, a decrease in 12% of neurons, and no change in 37%. GLM coefficients were increased for deformation related activity but not locomotion related activity after CSD. There also was an increase in afferents responsive to locomotion/deformation following CSD that were previously silent. This study shows that unlike prior reports, CSD does not lead to spontaneous activity in the majority of sensory neurons but that it increases sensitivity to mechanical deformation of the meninges. This has important implications for headache disorders like migraine where CSD is thought to contribute to the pathology in unclear ways with this new study suggesting that it may lead to increased mechanical sensitivity characteristic of migraine attacks.

      1) It would be helpful to know what is meant by "post-CSD" in many of the figures where a time course is not shown. The methods indicate that four 30-min runs were collected after CSD, but this would span 2 hours and the data do not indicate whether there are differences across time following CSD nor whether data from all 4 runs are averaged.

      While we monitored time course changes in ongoing activity (see Figure 2), it was challenging to evaluate post-CSD changes in locomotion-related deformation responses at a fine temporal scale, as running bouts resumed at different time points post-CSD and occurred intermittently throughout the post-CSD analysis period. Our experiments were also not sufficiently powered to break out analyses at multiple different epochs post-CSD, partly because there wasn’t much locomotion. To allow comparisons using a sufficient number of bouts, we conducted our GLM analyses using all data collected during running bouts in the 2-hour post-CSD period (termed “post-CSD”) versus in the 1-hour pre-CSD period. We have now clarified this further in the main text and figure legends.

      2) Why is only the Z-shift data shown in Figures 4A-C? Each of the deformation values seems to contribute to the activity of neurons after CSD but only the Z-shift values are shown.

      In many afferents, only one deformation variable best predicted the activity at both the pre- and post-CSD epochs. However, at the population level, all deformation variables were equally predictive. In the examples provided, the afferent developed augmented sensitivity that could only be predicted by the Z-shift variable, and the other deformation variables were not included to keep the figure legible. This is now clarified in the figure legend.

      3) How much does the animal moving its skull against the head mount contribute to deformations of the meninges if the skull is potentially flexing during these movements? Even if mice are not locomoting, they can still attempt to move their heads thus creating pressure changes on the skull and underlying meninges. The authors mention in the methods that the strong cement used to bind the skull plates and headpost together minimize this, but how do they know it is minimized?

      We did not measure skull flexing during locomotion and its potential effect on meningeal deformation. However, we would like to point out several considerations. It is evident from numerous imaging studies across various brain regions in freely moving animals, utilizing brain motion registration, that brain motion of the same scale (a few microns) as that observed in our studies also occurs in the absence of head fixation (e.g., Glas et al., 2019; Zong et al., 2021). In our system, the head-fixed mouse is locomoting on a cantilevered (spring-like) running wheel (see also Ramesh et al., 2018), which dissipates most, albeit not all, upward and forward forces applied to the skull during locomotion. Furthermore, the position of the headpost, anterior to where the mouse's paws touch the wheel, makes it hard for the mouse to push straight up and apply forces to the skull. We have updated the text in the methods section (Running wheel habituation) to address this. In our previous work (see Figure 2B in Blaeser et al. 2023), we found a substantial subset of afferents showing an increase in calcium activity that began after each bout of locomotion had terminated, and that lasted for many seconds, suggesting that skull flexing during locomotion may not play a leading role. Finally, we proposed in that study that meningeal deformations play a major role in the afferent response, given our findings of (i) sigmoidal stimulus-response curves between afferent activity and meningeal deformation and (ii) different afferents that track scaling deformations along different axes. It is unlikely that all of these are related to any residual forces generated from skull deformations.

      4) What is the mechanism by which afferents initiate the calcium wave during the CSD itself? Is this mechanical pressure due to swelling of the cortex during the wave? If so, why does the CSD have no impact on the deformation parameters? It seems that this cortical swelling would have some influence on these values unless the measurements of these values are taken well after cortical swelling subsides. Related to point 1 above, it is not clear when these measurements are taken post-CSD.

      We provide, for the first time, evidence that CSD evokes local calcium elevation in meningeal afferent fibers in a manner that is incongruent with action potential propagation, as the activity gradually advances along individual afferents across many seconds during the wave. As indicated in Figure 1H, we measured these changes during the first 2 minutes post-CSD. Based on the reviewer’s question, we have now addressed whether mechanical changes occurring in the cortex in the wake of CSD might be responsible for the acute afferent activation we observed. We now include new data (Results, “Acute afferent activation is not related to CSD-evoked meningeal deformation” and Figure S2) showing an acute phase of meningeal deformation (as expected given the changes in extracellular fluid volume) lasting 40-80 seconds following the induction of CSD. Our data suggest, however, that these meningeal deformations are unlikely to be the main driver of the acute afferent calcium response. We propose that, based on the speed of the afferent calcium wave propagation and the distinct dynamics of calcium activity as compared to the dynamics of the deformations, the acute afferent response is more likely to be mediated by the spread of algesic mediators (e.g., glutamate, K+, ATP) and their diffusion into the overlying meninges.

      Because the peri-CSD meningeal deformations return to baseline soon after the cessation of the CSD wave, they are unlikely to affect our analyses of post-CSD changes in afferent sensitivity in the following 2 hours. This is also supported by our data (see Figure 3F-H) showing similar locomotion-related deformations pre- and post-CSD, which were measured after the deformations related to the CSD itself had subsided.

      5) How does CSD cause suppression of afferent activity? This is not discussed. It is probably a good idea in this discussion to reinforce that suppression in this case is suppression of the calcium response and not necessarily suppression of all neuronal activity.

      The mechanism underlying the suppression of afferent activity remains unclear. We now discuss the following points:

      First, the pattern of afferent responses resembles the rapid loss of cortical activity in the wake of a CSD, but its faster recovery points to a mechanism distinct from the pre- and post-synaptic changes responsible for the silencing of cortical activity (Sawant-Pokam et al., 2017; Kucharz and Lauritzen, 2018). Whether CSD drives the local release of mediators capable of reducing afferent excitability and spiking dynamics will require further studies.

      Second, the reviewer proposes that the suppressed calcium activity we observed in ~20% of the afferents immediately following CSD may reflect a decreased calcium response independent of afferent spiking activity. Such a process could theoretically involve factors influencing the GCaMP fluorescence (see also our response to Reviewer #3) and/or factors modifying the afferents’ spiking-to-calcium coupling. We note that if a CSD-related factor could modify the calcium response independent of afferent spiking, one would expect a more consistent effect across axons, reflected as a reduced signal in a larger proportion of the afferents, which we did not observe.

      6) How do the authors interpret the influence of CSD on locomotor activity? There was a decrease in bouts but the bouts themselves showed similar patterns after CSD. Is CSD merely inhibiting the initiation of bouts? Is this consistent with what CSD is known to do to motor activity? And again related to point 1, how long after CSD were these measurements taken? Were there changes in locomotor activity during the actual CSD compared to post-CSD?

      To the best of our knowledge, there is very little data on the effect of CSD on motor activity, making it challenging to engage in further speculation regarding the mechanisms underlying the preservation of running bouts patterns post-CSD. Houben et al. (2017) described a similar reduction in locomotion in mice, corresponding to decreased motor cortex (M1) activity, and preservation of intermittent locomotion bouts. In the revised Results section, we now provide information about the cessation of locomotor activity during the CSD wave and have added information regarding the measurement of locomotion following CSD.

      7) The authors mention the caveats of prior work where the skull is open and is thus depressurized. Is this not also the case here given there is a hole in the skull needed to induce CSD?

      Unlike previous electrophysiological studies, which involved several large openings (~2x2 mm), including at the site of the afferents’ receptive field, our study involved only a small burr hole located remotely (1.5 mm) from the frontal edge of our imaging window. As noted in our response to Reviewer #1, this burr hole (~0.5 mm diameter) was unlikely to produce inflammation at the imaging site or cause depressurization as it was sealed with a silicone plug throughout the experiment.

      8) The authors should check the %'s and the numbers in the pie chart for Figure 4. Line 224 says 53 is 22% but it does not look this way from the chart.

      The 22% reported is the percentage of afferents that developed sensitivity post-CSD among all the non-sensitive ones pre-CSD. The pie chart illustrates only afferents that were deemed sensitive before and/or after the CSD. We removed the % to clarify.

      9) Line 319 mentions that CSD causes "powerful calcium transients" in sensory neurons but it is not clear what is meant by powerful if there are no downstream effects of these transients being measured. The speculation is that these calcium transients could cause transmitter release, which would be an important observation in the absence of AP firing, but there are no data evaluating whether this is the case.

      We changed the term to “robust.”

      Reviewer #3 (Public Review):

      Summary:

      Blaeser et al. set out to explore the link between CSD and headache pain. How does an electrochemical wave in the brain parenchyma, which lacks nociceptors, result in pain and allodynia in the V1-3 distribution? Prior work had established that CSD increased the firing rate of trigeminal neurons, measured electrophysiologically at the level of the peripheral ganglion. Here, Blaeser et al. focus on the fine afferent processes of the trigeminal neurons, resolving Ca2+ activity of individual fibers within the meninges. To accomplish these experiments, the authors injected AAV encoding the Ca2+ sensitive fluorophore GCamp6s into the trigeminal ganglion, and 8 weeks later imaged fluorescence signals from the afferent terminals within the meninges through a closed cranial window. They captured activity patterns at rest, with locomotion, and in response to CSD. They found that mechanical forces due to meningeal deformations during locomotion (shearing, scaling, and Z-shifts) drove non-spreading Ca2+ signals throughout the imaging field, whereas CSD caused propagating Ca2+ signals in the trigeminal afferent fibers, moving at the expected speed of CSD (3.8 mm/min). Following CSD, there were variable changes in basal GCamp6s signals: these signals decreased in the majority of fibers, signals increased (after a 25 min delay) in other fibers, and signals remained unchanged in the remainder of fibers. Bouts of locomotion were less frequent following CSD, but when they did occur, they elicited more robust GCamp6s signals than pre-CSD. These findings advance the field, suggesting that headache pain following CSD can be explained on the basis of peripheral cranial nerve activity, without invoking central sensitization at the brain stem/thalamic level. This insight could open new pathways for targeting the parenchymal-meningeal interface to develop novel abortive or preventive migraine treatments.

      Strengths:

      The manuscript is well-written. The studies are broadly relevant to neuroscientists and physiologists, as well as neurologists, pain clinicians, and patients with migraine with aura and acephalgic migraine. The studies are well-conceived and appear to be technically well-executed.

      Weaknesses:

      1) Lack of anatomic confirmation that the dura were intact in these studies: it is notoriously challenging to create a cranial window in mouse skull without disrupting or even removing the dura. It was unclear which meningeal layers were captured in the imaging plane. Did the visualized trigeminal afferents terminate in the dura, subarachnoid space, or pia (as suggested by Supplemental Fig 1, capturing a pial artery in the imaging plane)? Were z-stacks obtained, to maintain the imaging plane, or to follow visualized afferents when they migrated out of the imaging plane during meningeal deformations?

      We agree that avoiding disruption of the dura is challenging. Indeed, it took many months of practice before conducting the experiments in this manuscript to master methods for a craniotomy that spared the dura.

      We addressed the issue of meningeal irritation due to cranial window surgery in our previous work (Blaeser et al., 2023). In brief, we conducted vascular imaging using the same cranial window approach and showed no leakage of macromolecules from dural or pial vessels anywhere within the imaging window at 2-6 weeks after the surgery (Figure S1D in Blaeser et al. 2022). This data suggested no ongoing meningeal inflammation below the window. The very low level of ongoing activity we observed at baseline also suggests a lack of an inflammatory response that could lead to afferent sensitization before CSD. This is now mentioned in the Discussion.

      We conducted volumetric imaging for three main reasons: 1) To capture the activity of afferents throughout the meningeal volume. In our volumetric imaging approach, including in this work, we observed afferent calcium signals throughout the meningeal thickness (see Figure 5 in Blaeser et al. 2022). However, the majority of afferents were localized to the most superficial 20 microns (Figure S1E in Blaeser et al. 2022), suggesting that we mostly recorded the activity of dural afferents; 2) to enable simultaneous quantification of three-dimensional deformation and the activity of afferents throughout the thickness of the meninges. This allowed us to determine whether changes in mechanosensitivity could involve augmented activity to intracranial mechanical forces that produced meningeal deformation along the Z-axis of the meninges (e.g., increased intracranial pressure); 3) to provide a direct means to confirm that the afferent GCaMP fluorescent changes we observed were not due to artifacts related to meningeal motion along the Z-axis. We have now added this information to the “Two-photon imaging” section of the Methods.

      2) Findings here, from mice with chronic closed cranial windows, failed to fully replicate prior findings from rats with acute open cranial windows. While the species, differing levels of inflammation and intracranial pressure in these two preparations may contribute, as the authors suggested, the modality of measuring neuronal activity could also contribute to the discrepancy. In the present study, conclusions are based entirely on fluorescence signals from GCamp6s, whereas prior rat studies relied upon multiunit recordings/local field potentials from tungsten electrodes inserted in the trigeminal ganglion.

      As a family, GCamp6 fluorophores are strongly pH dependent, with decreased signal at acidic pH values (at matched Ca2+ concentration). CSD induces an impressive acidosis transient, at least in the brain parenchyma, so one wonders whether the suppression of activity reported in the wake of CSD (Figure 2) in fact reflects decreased sensitivity of the GCamp6 reporter, rather than decreased activity in the fibers. If intracellular pH in trigeminal afferent fibers acidifies in the wake of CSD, GCamp6s fluorescence may underestimate the actual neuronal activity.

      Previous in vivo rodent studies observed a tissue acidosis transient that peaks during the DC shift corresponding to the wavefront of the spreading depolarization and lasts for ~10 min (Mutch and Hansen, 1984). Since we observed a massive increase in afferent calcium activity with a propagation pattern resembling the cortical wave, it is unlikely that the cortical acidosis during the CSD wave strongly affected the GCaMP signal in the overlying meninges. Furthermore, if cortical acidosis indiscriminately affected the GCaMP signal, one would expect a more consistent effect across axons, reflected as a reduced calcium signal in a larger proportion of the afferents, which we did not observe. Finally, the finding that in affected afferents, decreased calcium activity lasted for > 20 min (a time point when cortical acidosis has fully recovered) points to a distinct underlying mechanism. We also note that any residual acidosis would not confound our main finding of increased calcium responses to meningeal deformation at later periods post-CSD, as acidosis should, if anything, decrease calcium-related fluorescence.

      The authors might consider injecting an AAV encoding a pHi sensor to the trigeminal ganglion, and evaluating pHi during and after CSD, to assess how much this might be an issue for the interpretation of GCamp6s signals. Alternatively, experiments assessing trigeminal fiber (or nerve/ganglion) activity by electrophysiology or some other orthologous method would strengthen the conclusions.

      Please see our comment above regarding the short duration of the pH changes post-CSD.

      N's are generally reported as # of afferents, obscuring the number of technical/biological replicates (# of imaging sessions, # of locomotion bouts, # of CSDs induced, # of animals).

      We now report the number of replicates (# of afferents, # of CSD events, and # of mice).

      The trace over the heatmap in Fig 1F is not explained in the figure legend. Is this the speed of the running wheel? Is it the apparent propagation rate of the GCaMP6s transient through the imaging field?

      We have added to the legend of Figure 1 that the trace in panel F depicts locomotion speed.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. Male-biased genes evolve faster than female-biased and unbiased genes, which is frequently observed in animals, but this is the first report of such a pattern in plants. In spite of the limited sample size, the evidence is mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      I have reviewed this new version and find that it now addresses some of the shortcomings of the previous manuscript. However, several important limitations still remain:

      1) The conclusion that sex-linked genes contribute relatively little to the patterns described is important and would be worth including in the manuscript briefly (not just the response letter), focusing for instance on the overall comparable proportions of sex-linked genes among male-biased (3/343=0.87%), female-biased (19/1145=1.66%) and unbiased genes (36/2378=1.51%).

      Authors’ response: Thank you for your advice. We have added these sentences in “Discussion” section (Lines 492-499).
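
      The proportions discussed in this point can be recomputed directly from the counts. The following is only a minimal arithmetic sketch in Python; the dictionary layout and the helper name `percent` are illustrative assumptions, not part of the authors' pipeline.

```python
# Counts quoted in the reviewer's comment:
# (putatively sex-linked genes, total genes in the category).
counts = {
    "male-biased": (3, 343),
    "female-biased": (19, 1145),
    "unbiased": (36, 2378),
}


def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage (hypothetical helper)."""
    return 100.0 * part / total


for category, (linked, total) in counts.items():
    print(f"{category}: {linked}/{total} = {percent(linked, total):.2f}%")
```

      A formal comparison of these proportions (for example, an exact test on the underlying counts) would be a separate step and is not shown here.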

      2) The new sentence included in the results "we also found that most of them were members of different gene families generated by gene duplication" is too vague. The motivation of this analysis is not explained, leaving the intended message unclear.

      Authors’ response: In the previous revision, as stressed by reviewer #1 “(2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results”, we added the sentence “we also found that most of them were members of different gene families generated by gene duplication” in “Relaxed selection” paragraph of the results. Accordingly, in “Discussion” section, we discussed the associations between gene duplication and relaxed selection (Lines 461-473).

      Following your suggestion, we revised the results (Lines 304-307) to “Using the RELAX model, we detected that 18 out of 343 OGs (5.23%) showed significant evidence of relaxed selection (K = 0.0184–0.6497) (Tables S9). Most of the 18 OGs are members of different gene families generated by gene duplication (Table S13)”. This makes it more coherent with the discussion.

      3) The sentences "given that dN/dS values of sex-biased genes were higher due to codon usage bias..." are very confusing. I do not understand the argument being made here. I do not see why "lower dS rates would be expected in sex-biased genes ..."

      Authors’ response: We respectfully argue that codon usage bias is positively related to synonymous substitution rates. That is, stronger codon usage bias may be related to higher synonymous substitution rates (Parvathy et al., 2022). Lower ENC values represent stronger codon usage bias. So, if ω (dN/dS) values of sex-biased genes are higher due to codon usage bias, we expect lower dS rates (that is, higher ENC values). Please refer to the relevant papers (e.g. Darolti et al., 2018; Catalan et al., 2018; Schrader et al., 2021, cited in the references of the paper).

      4) The manuscript now reports the proportion of unitigs annotated by similarity with a number of species. While this is an interesting observation, the reviewer was actually asking for a comparison between the number of unitigs (59,051) and the number of genes annotated in a typical cucurbitaceae genome. This would give an indication of the level of redundancy of the de novo assembled transcriptome.

      Authors’ response: We admit that the number of transcripts in the final assembly may be overestimated. We respectfully suggest that it may be inappropriate to assess the redundancy of the de novo assembled transcriptome by comparing the transcriptome sequences with genomic sequences. A more appropriate approach is to compare transcriptome assemblies across species. For example, Hu et al., 2020 (reference cited in the paper) obtained 145,975 non-redundant unigenes from flower buds of female and male plants in Trichosanthes kirilowii. Mohanty et al. (2017) obtained 71,823 non-redundant unigenes from flower buds of female and male plants in Coccinia grandis.

      Reference:

      Mohanty JN, Nayak S, Jha S, Joshi RK. 2017. Transcriptome profiling of the floral buds and discovery of genes related to sex-differentiation in the dioecious cucurbit Coccinia grandis (L.) Voigt. Gene. 626: 395-406.

      5) From reading the text I could not understand the extent to which the permutation test actually agreed with the Wilcoxon rank sum test. The text says that the results were "almost consistent", which is too vague. This paragraph should be clarified.

      Authors’ response: We performed a permutation test for sex-biased genes in floral buds and in flowers at anthesis. However, the results of both tests (permutation test and Wilcoxon rank sum test) are significant only in floral buds. Taking your suggestions into consideration, we have revised the text to “Additionally, we found that only in floral buds, there were significant differences in ω values in the results of ‘free-ratio’ model (female-biased versus male-biased genes, P = 0.04282 and male-biased versus unbiased genes, P = 0.01114) and ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.01992 and male-biased versus unbiased genes, P = 0.02127, respectively) by permutation t test, which is consistent with the results of Wilcoxon rank sum test.” (Lines 273-280).
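
      For readers unfamiliar with the approach, the general shape of a two-group permutation test is sketched below in Python. This is an illustrative sketch, not the authors' code: it uses the absolute difference in group means as the test statistic (the permutation t test mentioned above uses a t statistic instead), and the function name, default parameters and ω values are all made up for the example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in group means.

    Hypothetical helper: labels are shuffled n_perm times and the p-value is
    the share of shuffles whose difference is at least as large as observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up omega (dN/dS) values, for illustration only.
male_biased = [0.31, 0.42, 0.27, 0.55, 0.38, 0.46]
unbiased = [0.22, 0.25, 0.19, 0.30, 0.24, 0.21]
print(permutation_test(male_biased, unbiased))
```

      Because a permutation test makes no distributional assumption beyond exchangeability of the labels, it is often reported alongside a rank-based test such as the Wilcoxon rank sum test, as in the response above.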

      6) The paragraph on the link between codon usage and dN/dS is very unclear and quite unnecessary. I would suggest to simply remove lines 312-323.

      Authors’ response: We respectfully argue that codon usage bias is one of the most important factors for higher rates of sequence evolution. Please refer to Darolti et al. (2018), Catalan et al. (2018) and Schrader et al. (2021) (cited in the references of the paper). We retain these lines here.

      7) The discussion contains many unnecessary repeats from the introduction and results section. I suggest shortening drastically at several places, including:

      • remove lines 367-369

      Authors’ response: Thank you for your suggestion. We revised these lines to “In this study, we compared the expression profiles of sex-biased genes between sexes and two tissue types, investigated whether sex-biased genes exhibited evidence of rapid evolutionary rates of protein sequences and identified the evolutionary forces responsible for the observed patterns in the dioecious Trichosanthes pilosa (Lines 369-373)”.

      We removed the sentence “We compared the expression profiles of sex-biased genes between sexes and two tissue types and examined the signatures of rapid sequence evolution for sex-biased genes, as well as the contributions of potential evolutionary forces. (Lines 374-376)”.

      • remove lines 395-410

      Authors’ response: Here we mainly discussed the possible associations between sex-biased genes, adaptation and sexual dimorphic traits. We retain them here for clarity.

      • remove lines 449-483, as they are almost entirely repetitions of elements already made clear in the results section.

      Authors’ response: In these paragraphs, we discussed reasons that lead to relaxed purifying selection for sex-biased genes. They are coherent with the results section. We retain them to make it clearer.

      Minor comments:

      • line 146: remove "However"

      Authors’ response: We have revised it.

      • line 187: "female flower buds tend to masculinize": the meaning is obscure

      Authors’ response: We revised them as “Using hierarchical clustering analysis, we evaluated different levels of gene expression across sexes and tissues (Fig. 2C). Gene expression for female floral buds clustered most distantly from expression in female flowers at anthesis. However, expression in male floral buds clustered with expression in female flowers at anthesis, suggesting that male floral buds maybe tend to feminization in the early stages of floral development.”.

      • line 226: "we sequenced transcriptomes of T. pilosa": rather say "we used the transcriptomes described above for T. pilosa"

      Authors’ response: We have revised it.

      • line 279: the meaning of "branch-site model A and branch site model null" is still not made clear.

      Authors’ response: We have revised it.

      • line 324: change to: "we also analysed whether female-biased and unbiased genes underwent... "

      Authors’ response: We have revised it.

    2. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable paper examines gene expression differences between male and female individuals over the course of flower development in the dioecious angiosperm Trichosanthes pilosa. The authors show that male-biased genes evolve faster than female-biased and unbiased genes. This is frequently observed in animals, but this is the first report of such a pattern in plants. In spite of the limited sample size, the evidence is mostly solid and the methods appropriate for a non-model organism. The resources produced will be used by researchers working in the Cucurbitaceae, and the results obtained advance our understanding of the mechanisms of plant sexual reproduction and its evolutionary implications: as such they will broadly appeal to evolutionary biologists and plant biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of non-synonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      Some aspects of the presentation have been improved in this new version of the manuscript.

      Specifically:

      • the link between sex-biased and tissue-biased genes is now slightly clearer,

      • the limitation related to the de novo assembled transcriptome is now formally acknowledged,

      • the interpretation of functional categories of the genes identified is more precise,

      • the legends of supplementary figures have been improved,

      • a large number of typos have been fixed.

      in response to this first round of reviews. As I detail below, many of the relevant and constructive suggestions by the previous reviewers were not taken into account in this revision.

      For instance:

      • Reviewer 2 made precise suggestions for trying to take into account the potential confounding factor of sex-chromosomes. This suggestion was not followed.

      For the question of reviewer 2:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Empirically, the analyses could be expanded by an attempt to distinguish between genes on the autosomes and the sex chromosomes. Genotypic patterns can be used to provisionally assign transcripts to XY or XX-like behavior when all males are heterozygous and all females are homozygous (fixed X-Y SNPs) and when all females are heterozygous and males are homozygous (lost or silenced Y genes). Comparing such genes to autosomal genes with sex-biased expression would sharpen the results because there are different expectations for the efficacy of selection on sex chromosomes. See this paper (Hough et al. 2014; https://www.pnas.org/doi/abs/10.1073/pnas.1319227111), which should be cited and does in fact identify faster substitution rates in Y-linked genes.

      Authors’ response: We have cited Hough et al. (2014) and Sandler et al. (2018) in the revised manuscript. We agree that the presence of sex chromosomes is potentially a confounding factor. By adopting the methods of Hough et al. (2014) and Sandler et al. (2018), we tried to distinguish transcripts on sex chromosomes from those on autosomes. Of a total of 2,378 unbiased genes, we found that 36 were putatively sex-chromosomal genes: 20 were exclusively heterozygous in males and exclusively homozygous in females, while the other 16 showed the opposite genotyping pattern. Of the 343 male-biased genes, only three exhibited a potentially sex-linked pattern. Of the 1,145 female-biased genes, we identified 19 that might be located on the sex chromosomes. Among these 19 genes, five were exclusively heterozygous in males and exclusively homozygous in females, while the other 14 showed the reverse pattern. So, sex-linked genes may contribute relatively little to the rapid evolution of male-biased genes. An alternative explanation is that the results could be unreliable due to small sample sizes. Thus, we did not describe them in the Results section. We will investigate the issue when whole genome sequences and population datasets become available in the near future.
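
      The genotype-pattern rule described by the reviewer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of Hough et al. (2014) or of the authors: the input format (one heterozygosity flag per individual), the function name and the category labels are all hypothetical, and real pipelines also filter on coverage and genotype quality.

```python
from typing import Sequence


def classify_snp(male_het: Sequence[bool], female_het: Sequence[bool]) -> str:
    """Provisionally classify one SNP from per-individual heterozygosity calls.

    male_het / female_het hold True for a heterozygous individual and False
    for a homozygous one (hypothetical input format).
    """
    if not male_het or not female_het:
        return "undetermined"  # need at least one individual of each sex
    if all(male_het) and not any(female_het):
        # All males heterozygous, all females homozygous: fixed X-Y difference.
        return "XY-like"
    if all(female_het) and not any(male_het):
        # All females heterozygous, all males homozygous: Y copy lost or silenced.
        return "X-hemizygous-like"
    return "autosomal-like"


print(classify_snp([True, True, True], [False, False, False]))
print(classify_snp([False, False, False], [True, True, True]))
print(classify_snp([True, False, True], [False, True, False]))
```

      With only a few individuals per sex, either sex-linked pattern can arise by chance at an autosomal SNP, which is consistent with the caution about small sample sizes expressed in the response above.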

      • Reviewer 1 & 3 indicated that results were mentioned in the discussion section without having been described before. This was not fixed in this new version.

      For the question of reviewer 1:

      2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results.

      Authors’ response: Following this suggestion, in the Results section, we have added a sentence, “We also found that most of them were members of different gene families generated by gene duplication (Table S13)” on line 310-311 in the revised manuscript (Rapid_evolution_of_malebiased_genes_Trichosanthes_pilosa_Tracked_change_2023_11_06.docx).

      For the question of reviewer 1:

      38- line 417-424. The discussion should not contain new results.

      Authors’ response: Thank you for pointing this out. In the Results section, we have added a few sentences as follows: “Similarly, given that dN/dS values of sex-biased genes were higher due to codon usage bias, lower dS rates would be expected in sex-biased genes relative to unbiased genes (Ellegren & Parsch, 2007; Parvathy et al., 2022). However, in our results, the median of dS values in male-biased genes were much higher than those in female-biased and unbiased genes in the results of ‘free-ratio’ (Fig. S4A, female-biased versus male-biased genes, P = 6.444e-12 and male-biased versus unbiased genes, P = 4.564e-13) and ‘two-ratio’ branch model (Fig. S4B, female-biased versus male-biased genes, P = 2.2e-16 and male-biased versus unbiased genes, P = 9.421e-08, respectively).” on lines 323-331, and consequently removed the following text, “female-biased vs male-biased genes, P = 6.444e-12 and male-biased vs unbiased genes, P = 4.564e-13” and “female-biased versus male-biased genes, P = 2.2e-16 and male-biased versus unbiased genes, P = 9.421e-08, respectively”, from the Discussion section.

      • Reviewer 1 asked for a comparison between the number of de novo assembled unigenes in this transcriptome and the number of genes in other Cucurbitaceae species. I could not see this comparison reported.

      Authors’ response: In the first revision, we described only percentages. We have now added the number of genes. We have modified this part as follows: “The majority of unigenes were annotated by homologs in species of Cucurbitaceae (61.6%, 36,375), including Momordica charantia (16.3%, 9,625), Cucumis melo (11.9%, 7,027), Cucurbita pepo (11.9%, 7,027), Cucurbita moschata (11.5%, 6,791), Cucurbita maxima (10.1%, 5,964) and other species (38.4%, 22,676) (Fig. S1C).”.

      • Reviewer 1 pointed out that permutation tests were more appropriate, but no change was made to the manuscript.

      Authors’ response: Thank you for your suggestion. In the first revision, we responded to this issue only indirectly. The Wilcoxon rank sum test is more commonly used for comparisons between sex-biased and unbiased genes in many papers. Additionally, we tested the datasets using permutation t-tests, and the results are consistent with those of the Wilcoxon rank sum test. For example, we found that only in floral buds are there significant differences in ω values in the results of the ‘free-ratio’ model (female-biased versus male-biased genes, P = 0.04282 and male-biased versus unbiased genes, P = 0.01114) and the ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.01992 and male-biased versus unbiased genes, P = 0.02127, respectively). We also described these results in the Results section accordingly (lines 278-284).

      • Reviewer 3 pointed out the small sample size (both for the RNA-seq and the phylogenetic analysis), but again this limitation is not acknowledged very clearly.

      Authors’ response: We acknowledge that our sample size is relatively small. In the revised version, we have added the following sentence to the Discussion section: “Additionally, our sample size is relatively small, and may provide low power to detect differential expression.”

      • Reviewers 1 & 3 pointed out that Fig 3 was hard to understand and asked for clarifications that I did not see in the text, and the figure is unchanged.

      Authors’ response: Thank you for your suggestions. We have revised the manuscript to clarify the meaning of the acronyms (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and presented the number of genes. We have added two labels to Fig. 3, indicating that panels A and B correspond to males and panels C and D to females.

      • Reviewer 3 suggested to combine all genes with sex-bias expression when evaluating the evolutionary rate, in addition to the analyses already done. This suggestion was not followed.

      For the question of reviewer 3: line 196 and following: In these analyses, I could not understand the rationale for keeping buds vs mature flowers as separate analyses throughout. Why not combine both and use the full set of genes showing sex-bias in any tissue? This would increase the power and make the presentation of the results a lot more straightforward.

      Authors’ response: Thank you for your suggestions. In the first revision, we tried to respond to these issues. First, we observed strong sexual dimorphism in floral buds, such as racemose versus solitary and early-flowering versus late-flowering. Second, as you pointed out earlier, “the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers)”; we totally agree with you on this point. Third, following your suggestion, we combined all genes with sex-biased expression to evaluate the evolutionary rates. Using the Wilcoxon rank sum test, we found significant differences (see Author response image 1) in ω values in the results of the ‘free-ratio’ model (female-biased versus male-biased genes, P = 0.005622 and male-biased versus unbiased genes, P = 0.001961) and the ‘two-ratio’ model (female-biased versus male-biased genes, P = 0.008546 and male-biased versus unbiased genes, P = 0.009831, respectively). However, the significance is lower than in the previous results for floral buds because sex-biased genes from mature flowers were included, especially in the results of the ‘free-ratio’ model. Additionally, we tested all combined genes with sex-biased expression using a permutation t-test. Unfortunately, there are no significant differences in ω values except for male-biased versus unbiased genes in the results of the ‘free-ratio’ model (P = 0.03034) and the ‘two-ratio’ model (P = 0.0376). To a certain extent, combining all genes with sex-biased expression may mask the signals of rapid evolution of sex-biased genes in floral buds. Therefore, these results are not described in our manuscript. In the near future, we would like to investigate further, using more developmental stages of flowers and new technologies (e.g. single-cell methods; see Murat et al., 2023) in each sex to consolidate the conclusion, and we hope to find more meaningful results.

      Author response image 1.

      • Reviewer 3 pointed out that hand-picking specific categories of genes was not statistically valid, and in fact not necessary in the present context. This was not changed.

      For the question of reviewer 3: removing genes on a post-hoc basis seems statistically suspicious to me. I don't think your analysis has enough power to hand-pick specific categories of genes, and it is not clear what this brings here. I suggest simply removing these analyses and paragraphs.

      Authors’ response: Thank you for your suggestions. We have changed them accordingly. We removed a part of the following paragraph, “To confirm the contributions of positive selection and relaxed selection to rapid rates of male-biased genes in floral buds, we generated three datasets of OGs by excluding different sets of genes. Specifically, we excluded 18 relaxed selective male-biased genes (5.23%), 98 positively selected male-biased genes (28.57%), and 112 male-biased genes (32.65%) under positive and relaxed selection from 343 OGs (Fig. S4). We observed that after excluding male-biased genes under relaxed purifying selection, the median (0.264) decreased by 0.34% compared to the median (0.265) of all OGs (Fig. S4A-B). However, after excluding positively selected male-biased genes, the median (0.236) was reduced by 11% (Fig. S4A, C) in the results of ‘free-ratio’ branch model. This pattern was consistent with the results of ‘two-ratio’ branch model as well (Fig. S4E-G).” on line 290 to 300.

      However, we kept the following paragraph, “We also analyzed female-biased and unbiased genes that underwent positive and relaxed selection in floral buds (Tables S6-S10). We identified 216 (18.86%) positively selected, and 69 (6.03%) relaxed selective female-biased genes from 1,145 OGs, respectively. Similarly, we found 436 (18.33%) positively selected, and 43 (1.81%) unbiased genes under relaxed selection from 2,378 OGs, respectively. Notably, male-biased genes have a higher proportion (10%) of positively selected genes compared to female-biased and unbiased genes. However, relaxed selective male-biased genes have a higher proportion (3.24%) than unbiased genes, but about 0.8% lower than that of female-biased genes.”. In this way, in the Discussion section we can compare the proportions of genes that have undergone positive selection and relaxed selection among female-biased, unbiased and male-biased genes in floral buds.

      • Reviewer 1 asked for all data to be public, but I could not find in the manuscript where the link to the data on ResearchGate was provided.

      Authors’ response: We have added a link in the Data Availability section.

      • Reviewers 1 & 3 pointed out that since only two tissues were compared, the claims on pleiotropy should have been toned down, but no change was made to the text.

      Authors’ response: Thank you for your suggestions. We revised “due to low pleiotropic constraints” to “due to low evolutionary constraints” and revised “low pleiotropy” to “low constraints”.

      • Reviewer 1 asked for a clarification on which genes are plotted on the heatmap of Fig3C and an explanation of the color scale. No change was made.

      Authors’ response: Sorry for the confusion. Actually, Reviewer 1 asked that “Fig. 2C, which genes are plotted on the heatmap and what is the color scale corresponding to?” In the previous revision, we have revised them (See Fig. 2 Sex-biased gene expression for floral buds and flowers at anthesis in males and females of Trichosanthes pilosa). Sex-biased genes (the union of sex-biased genes in F1, M1, F2 and M2) are plotted on the heatmap. The color gradient represents from high to low (from red to green) gene expression.

      • Reviewer 1 asked for panel B in Fig S5 and S6 to be removed. They are still there. They asked for abbreviations to be explained in the legend of Fig S8. This was not done. They asked for details about column headers. Such details were not added. They asked for more recent references on line 53-56: this was not done.

      Authors’ response: We have removed panel B in Fig. S5 and S6. We explained abbreviations in text and Fig. S8. We added more details about the column headers in Supplementary Table S4, S5, S6, S7, S8, S9 and S10. We also added more recent references on line 53-56.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Authors’ response: Thank you for your suggestions. We have revised/fixed these issues following your concerns and suggestions.

      Line 46-48 would be clearer as « Sexual dimorphism is the condition where sexes of the same species exhibit different morphological, ecological and physiological traits in gonochoristic animals and dioecious plants, despite male and female individuals sharing the same genome except for sex chromosomes or sex-determining loci »

      Authors’ response: Thanks. We have revised it accordingly.

      Line 50: replace «in both » by «between the two »

      Authors’ response: We have revised it.

      Line 51: « genes exclusively » -> « genes expressed exclusively »

      Authors’ response: We have revised it.

      Line 58: « in many animals » -> « in several animal species »

      Authors’ response: We have revised it to “in some animal species”.

      Line 58: « to which » -> « of this bias »

      Authors’ response: We have revised it.

      Line 64: « Most dioecious plants possess homomorphic sex-chromosomes that are roughly similar in size when viewed by light microscopy. » : a reference is missing

      Authors’ response: We have added the reference.

      Line 67: remove « that »

      Authors’ response: We have revised it.

      line 96: change to: « only the five above-mentioned studies »

      Authors’ response: We have revised it.

      Line 97: remove « the »

      Authors’ response: We have revised it.

      Line 111: « Drosophia » -> Drosophila

      Authors’ response: We have revised it.

      Line 114: exhibiting -> « exhibited »

      Authors’ response: We have revised it.

      Line 115: suggest -> « suggesting »

      Authors’ response: We have revised it.

      Line 117: « studies in plants have rarely reported elevated rates of sex-biased genes » : is it « rarely » or « never » ?

      Authors’ response: We have revised it to “never”.

      Line 143: « It’s » -> « Its »

      Authors’ response: We have revised it.

      Line 143-146: say whether the male parts (e.g. anthers) are still present in females flowers, and the female parts (pistil+ ovaries) in the male flowers, or whether these respective organs are fully aborted.

      Authors’ response: We have added the following sentence, “The male parts (e. g., anthers) of female flowers, and the female parts (e. g., pistil and ovaries) of male flowers are fully aborted” in lines 148-150 of the Introduction section.

      Line 158: this is now clearer, but please specify whether you are talking about 12 floral buds in total, or 12 per individual (i.e. 72 buds in total).

      Authors’ response: We have revised it to “Using whole transcriptome shotgun sequencing, we sequenced floral buds and flowers at anthesis from female and male of dioecious T. pilosa. We set up three biological replicates from three female and three male plants, including 12 samples in total (six floral buds and six flowers at anthesis)”.

      Line 194-198: These sentences are unclear and hard to link to the figure. Consider changing for « In male plants, the number of tissue-biased genes in flowers at anthesis (M2TGs: n = 2795) was higher than that in floral buds (M1TGs: n = 1755, Fig. 3A and 3B). » Figure 3 is also very hard to read. Adding a label on the side to indicate that panels A and B correspond to male-biased genes and C and D to female-biased genes could be useful.

      Authors’ response: Thank you for your suggestions. We have revised the text to clarify the meaning of the acronym (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and presented the number of genes. We have added two labels, indicating that panels A and B correspond to males and C and D to females in Figure 3.

      Line 208: explain the approach: e.g. « We then compared rates of protein evolution among male-biased, female-biased and unbiased genes. To do this, we sequenced floral bud transcriptomes from the closely related T. anguina, as well as two more distant outgroups, T. kirilowii and Luffa cylindrica. T. kirilowii is a dioecious species like T. pilosa, and the other two are monoecious. We identified one-to-one orthologous groups (OGs) for 1,145 female-biased, 343 male-biased, and 2,378 unbiased genes. »

      Authors’ response: We have revised this paragraph to the following, “We compared rates of protein evolution among male-biased, female-biased and unbiased genes in four species with phylogenetic relationships (((T. anguina, T. pilosa), T. kirilowii), Luffa cylindrica), including dioecious T. pilosa, dioecious T. kirilowii, monoecious T. anguina in Trichosanthes, together with monoecious Luffa cylindrica. To do this, we sequenced transcriptomes of T. pilosa. We also collected transcriptomes of T. kirilowii, as well as genomes of T. anguina and Luffa cylindrica.”

      Line 220: « the same ω value was in all branches » -> « all branches are constrained to have the same ω value ».

      Authors’ response: We have revised it.

      Line 221: « results of the 'two-ratio' branch model ... »

      Authors’ response: We have revised it.

      Line 235: add a few words to explain why the effect size is bigger than for buds, but still is not significant: e.g. «possibly because of limited statistical power due to the low number of sex-biased genes in flowers at anthesis »

      Authors’ response: We have revised this to “However, there is no statistically significant difference in the distribution of ω values using Wilcoxon rank sum tests for female-biased versus male-biased genes (P = 0.0556), female-biased versus unbiased genes (P = 0.0796), and male-biased versus unbiased genes (P = 0.3296) possibly because of limited statistical power due to the low number of sex-biased genes in flowers at anthesis.” in line 260-261.
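      The rank-sum comparison described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code: the function name `rank_sum_test` and the toy ω values are hypothetical, and the sketch uses a normal approximation with a tie correction but no continuity correction, so its P values need not match those quoted in the response.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Rank sum and U statistic for the first sample.
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value

# Hypothetical omega values for two small gene sets (illustration only).
toy_male_biased = [0.31, 0.42, 0.28, 0.55, 0.37]
toy_unbiased = [0.22, 0.30, 0.25, 0.41, 0.19]
u_stat, p = rank_sum_test(toy_male_biased, toy_unbiased)
```

      With only a handful of genes per group, the test can return a large P value even when the medians differ visibly, which is the limited-power situation the revised sentence refers to.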

      Line 255: explain in plain English what the « A model » is. This was already requested in the previous version.

      Authors’ response: We have revised “A model” to “classical branch-site model A”.

      Line 258: explain in plain English what the « foreground 2b ω value » corresponds to

      Authors’ response: We have revised “foreground 2b ω value” to “foreground ω > 1”. Additionally, we added the sentence “The classical branch-site model assumes four site classes (0, 1, 2a, 2b), with different ω values for the foreground and background branches. In site classes 2a and 2b, the foreground branch undergoes positive selection when there is ω > 1.” in line 624-627.
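      For readers unfamiliar with the four site classes, the table below sketches how branch-site model A is commonly parameterized. It is given here as an assumption about the standard formulation; the response does not show the manuscript's exact settings, and the symbols p_0, p_1, ω_0, ω_1 and ω_2 are introduced only for this illustration.

```latex
\begin{array}{c|c|c|c}
\text{Site class} & \text{Proportion} & \text{Background } \omega & \text{Foreground } \omega \\ \hline
0  & p_0 & 0 < \omega_0 < 1 & 0 < \omega_0 < 1 \\
1  & p_1 & \omega_1 = 1 & \omega_1 = 1 \\
2a & (1 - p_0 - p_1)\,p_0 / (p_0 + p_1) & 0 < \omega_0 < 1 & \omega_2 \ge 1 \\
2b & (1 - p_0 - p_1)\,p_1 / (p_0 + p_1) & \omega_1 = 1 & \omega_2 \ge 1
\end{array}
```

      In this formulation, positive selection on the foreground branch is typically assessed by comparing the model against a null in which ω_2 is fixed at 1.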

      Line 259: explain how these different approaches complement each other rather than being redundant. This was also already requested in the previous version.

      Authors’ response: Sorry. We have now revised it as follows, “As a complementary approach, we utilized the aBSREL and BUSTED methods that are implemented in HyPhy v.2.5 software, which avoids false positive results by classical branch-site models due to the presence of rate variation in background branches, and detected significant evidence of positive selection.” in line 292-295.

      Line 270: remove « dramatically », and also remove « or eliminated at both gene-wide and genome-wide levels », as well as « relative to positive selection »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Line 290-309: remove this section - this was already pointed out in the previous reviews as a « ad hoc » procedure, and this point has already been made clear with the RELAX analysis.

      Authors’ response: Thank you for your suggestions. We revised this section accordingly. We removed the following paragraph, “To confirm the contributions of positive selection and relaxed selection to rapid rates of male-biased genes in floral buds, we generated three datasets of OGs by excluding different sets of genes. Specifically, we excluded 18 relaxed selective male-biased genes (5.23%), 98 positively selected male-biased genes (28.57%), and 112 male-biased genes (32.65%) under positive and relaxed selection from 343 OGs (Fig. S4). We observed that after excluding male-biased genes under relaxed purifying selection, the median (0.264) decreased by 0.34% compared to the median (0.265) of all OGs (Fig. S4A-B). However, after excluding positively selected male-biased genes, the median (0.236) was reduced by 11% (Fig. S4A, C) in the results of ‘free-ratio’ branch model. This pattern was consistent with the results of ‘two-ratio’ branch model as well (Fig. S4E-G).” on lines 334-344.

      However, we kept the other parts “We also analyzed female-biased and unbiased genes that underwent positive and relaxed selection in floral buds (Tables S6-S10). We identified 216 (18.86%) positively selected, and 69 (6.03%) relaxed selective female-biased genes from 1,145 OGs, respectively. Similarly, we found 436 (18.33%) positively selected, and 43 (1.81%) unbiased genes under relaxed selection from 2,378 OGs, respectively. Notably, male-biased genes have a higher proportion (10%) of positively selected genes compared to female-biased and unbiased genes. However, relaxed selective male-biased genes have a higher proportion (3.24%) than unbiased genes, but about 0.8% lower than that of female-biased genes.”. In this way, we can compare the proportions of sex-biased genes that have undergone positive selection and relaxed selection among female-biased, unbiased and male-biased genes in floral buds in the Discussion section.
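      The proportions compared in the two quoted paragraphs can be recomputed from the stated counts. The sketch below is illustrative only: the dictionary layout and the helper names `percent` and `point_gap` are hypothetical, and values recomputed from the rounded counts may differ from the quoted percentages in the second decimal place.

```python
# Counts quoted in the response: (genes in category, total OGs for that bias class).
counts = {
    ("male-biased", "positive"): (98, 343),
    ("male-biased", "relaxed"): (18, 343),
    ("female-biased", "positive"): (216, 1145),
    ("female-biased", "relaxed"): (69, 1145),
    ("unbiased", "positive"): (436, 2378),
    ("unbiased", "relaxed"): (43, 2378),
}

def percent(k, n):
    """Share of k out of n, expressed as a percentage."""
    return 100.0 * k / n

# Percentage of OGs in each bias class that fall under each selection regime.
pct = {key: percent(k, n) for key, (k, n) in counts.items()}

def point_gap(a, b, kind):
    """Difference in percentage points between bias classes a and b."""
    return pct[(a, kind)] - pct[(b, kind)]

for (bias, kind), value in pct.items():
    print(f"{bias:14s} {kind:9s} {value:6.2f}%")

print(f"male vs female (positive):   {point_gap('male-biased', 'female-biased', 'positive'):+.2f} points")
print(f"male vs unbiased (positive): {point_gap('male-biased', 'unbiased', 'positive'):+.2f} points")
```

      Reporting the gaps as percentage points makes it explicit that figures such as "10%" in the quoted text are differences between two proportions, not relative changes.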

      Line 348: Here you talk about « Numerous studies », but then only report three studies. Please clarify.

      Authors’ response: Thank you for your suggestions. We have revised it to “Several studies”.

      Line 352: Cut the sentence: « In contrast, the wind-pollinated dioecious plant Populus balsamifera ... »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Line 357: « In contrast to the above studies... »: If I understand correctly, this is not in contrast to the observation in Populus balsamifera. Please clarify.

      Authors’ response: Thank you for your suggestions. We have revised it to “Similar to the above study of Populus balsamifera.”.

      Line 420: « our results » -> « we »; « that underwent » -> « undergoing »

      Authors’ response: Thank you for your suggestions. We have revised it.

      Figure 3 is very hard to read and poorly labeled (see my comments on line 194 above). It is also hard to link to the text, since the numbers reported in the text are actually not present in the figure unless readers make some calculations themselves. This should be improved. Also, the use of acronyms (e.g. M1BG, F2TG etc.) contributes to making the text very difficult to read. The acronyms should at least be explained very clearly in the text when they are used.

      Authors’ response: Thank you for your suggestions. We have revised the text to clarify the meaning of the acronym (F1TGs, F2TGs, M1TGs, M2TGs, F1BGs, F2BGs, M1BGs and M2BGs) and give the number of genes. We have added two labels, indicating that panels A and B correspond to males and C and D to females in Figure 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The apicoplast, a non-photosynthetic vestigial chloroplast, is a key metabolic organelle for the synthesis of certain lipids in apicomplexan parasites. Although it is clear metabolite exchange between the parasite cytosol and the apicoplast must occur, very few transporters associated with the apicoplast have been identified. The current study combines data from previous studies with new data from biotin proximity labeling to identify new apicoplast resident proteins including two putative monocarboxylate transporters termed MCT1 and MCT2. The authors conduct a thorough molecular phylogenetic analysis of the newly identified apicoplast proteins and they provide compelling evidence that MCT1 and MCT2 are necessary for normal growth and plaque formation in vitro along with maintenance of the apicoplast itself. They also provide indirect evidence for a possible need for these transporters in isoprenoid biosynthesis and fatty acid biosynthesis within the apicoplast. Finally, mouse infection experiments suggest that MCT1 and MCT2 are required for normal virulence, with MCT2 completely lacking at the administered dose. Overall, this study is generally of high quality, includes extensive quantitative data, and significantly advances the field by identifying several novel apicoplast proteins together with establishing a critical role for two putative transporters in the parasite. The study, however, could be further strengthened by addressing the following aspects:

      Response: We thank the reviewer very much for the positive evaluation of our work. To address the detailed function of the transporters, over the past three months we re-constructed plasmids (with codon-optimized DNA sequences of the genes) to express the transporters in a standard E. coli expression strain (BL21DE3) and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), in order to examine transport capability in vitro. We also constructed a new plasmid containing a new leader peptide to target the pyruvate sensor PyronicSF to the apicoplast in the parasite, in order to probe pyruvate as a possible substrate. However, we did not observe expression of the transporters in these E. coli strains, and we were unable to target the sensor to the correct location (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect and attempt to determine the exact transport function of the transporters in the future. In the current manuscript, we discuss the limitations of our study in the last part of the manuscript.

      Main comments

      1) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of IPP is only supported by indirect measurements (effects on host GFP uptake or trafficking, possibly due to effects on IPP dependent proteins such as rabs, and mitochondrial membrane potential, possibly due to effects on IPP dependent ubiquinone). This conclusion would be more strongly supported by directly measuring levels of IPP. If there are technical limitations that prevent direct measurement of IPP then the authors should note such limitations and acknowledge in the discussion that the conclusion is based on indirect evidence.

      Response: We thank the reviewer very much for the suggestions. We have tried to establish the measurement of IPP using a commercial company in recent months, yet we have not been successful in making the assay work. Considering the problem of indirect evidence, we have discussed this limitation in the discussion.

      2) The conclusion that conditional depletion of AMT1 and/or AMT2 affects apicoplast synthesis of fatty acids is also poorly supported by the data. The authors do not distinguish between the lower fatty acid levels being due to reduced synthesis of fatty acids, reduced salvage of host fatty acids, or both. Indeed, the authors provide evidence that parasite endocytosis of GFP is dependent on AMT1 and AMT2. Host GFP likely enters the parasite within a membrane bound vesicle derived from the PVM. The PVM is known to harbor host-derived lipids. Hence, it is possible that some of the decrease in fatty acid levels could be due to reduced lipid salvage from the host. Experiments should be conducted to measure the synthesis and salvage of fatty acids (e.g., by metabolic flux analysis), or the authors should acknowledge that both could be affected.

      Response: We thank the reviewer very much for the comments and suggestions. We partially agree with the comment that depletion of the transporters could affect lipids scavenged from the host cells, as endocytic vesicles are indeed derived from the parasite plasma membrane at the micropore and potentially from the host cell endo-membrane system, as demonstrated with the micropore endocytosis in our previous study (pmid: 36813769). Our latest study has addressed this by showing that the endocytic trafficking of GFP vesicles is regulated by prenylation of proteins (e.g. Rab1B and YKT6.1), depletion of which resulted in diffusion of GFP vesicles, but not disappearance of GFP vesicles in the parasites (pmid: 37548452), indicating that the vesicles (containing lipids) enter the parasites. In the current manuscript, the percentage of parasites containing GFP foci was significantly reduced in AMT1/AMT2-depleted parasites, and instead, parasites containing diffuse GFP appeared, at a percentage almost equal to the reduction in parasites with GFP foci. These results suggested that endocytic vesicles (e.g. GFP vesicles) were continuously generated by the micropore in the parasites depleted of AMT1/AMT2, and that the vesicle trafficking was regulated by proteins modified by IPP derivatives that were derived from the apicoplast. Based on these observations, we considered that lipids in endocytic vesicles should not contribute to the reduced level of fatty acids and other lipids in parasites depleted of AMT1/AMT2. We have added a short discussion of the reduced fatty acids and lipids in the parasites.

      Reviewer #2 (Public Review):

      In this study Hui Dong et al. identified and characterized two transporters of the monocarboxylate family, which they called Apicomplexan monocarboxylate 1 and 2 (AMC1/2) that the authors suggest are involved in the trafficking of metabolites in the non-photosynthetic plastid (apicoplast) of Toxoplasma gondii (the parasitic agent of human toxoplasmosis) to maintain parasite survival. To do so they first identified novel apicoplast transporters by conducting proximity-dependent protein labeling (TurboID), using the sole known apicoplast transporter (TgAPT) as a bait. They chose two out of the three MFS transporters identified by their screen based on protein sequence similarity and confirmed apicoplast localisation. They generated inducible knock down parasite strains for both AMC1 and AMC2, and confirmed that both transporters are essential for parasite intracellular survival, replication, and for the proper activity of key apicoplast pathways requiring pyruvate as carbon sources (FASII and MEP/DOXP). Then they show that deletion of each protein induces a loss of the apicoplast, more marked for AMC2, and affects its morphology both at the level of its four surrounding membranes and through accumulation of material in the apicoplast stroma. This study is very timely, as the apicoplast holds several important metabolic functions (FASII, IPP, LPA, Heme, Fe-S clusters...), which have been revealed and studied in depth, but no further respective transporters have been identified thus far. Hence, new studies that could reveal how the apicoplast can acquire and deliver all the key metabolites it deals with will have strong impact for the parasitology community as well as for the plastid evolution communities. The current study is well initiated with appropriate approaches to identify two new putatively important apicoplast transporters, and showing how essential those are for parasite intracellular development and survival. However, in its current state, this is all the study provides at this point (i.e. essential apicoplast transporters disrupting apicoplast integrity, and indirectly its major functions, FASII and IPP, as any essential apicoplast protein disruption does). The study fails to deliver further message or function regarding AMC1 and 2, and thus validate their study. Currently, the manuscript just describes how AMC1/2 deletion impacts parasite survival without answering the key question about them: what do they transport? The authors have yet to perform key experiments that would reveal their metabolic function. I would thus recommend the authors work further and determine the function of AMC1 and 2.

      Response: We thank the reviewer very much for the positive evaluation of our work. To address the detailed function of the transporters, over the past three months we re-constructed plasmids (with codon-optimized DNA sequences of the genes) to express the transporters in a standard E. coli expression strain (BL21DE3) and in a pyruvate import knockout E. coli strain (a gift from Prof. Kirsten Jung), in order to examine transport capability in vitro. We also constructed a new plasmid containing a new leader peptide to target the pyruvate sensor PyronicSF to the apicoplast in the parasite, in order to probe pyruvate as a possible substrate. However, we were unable to observe expression of the transporters in these E. coli strains, and we were unable to target the sensor to the correct location (the apicoplast) in the parasite. As a result, the functional identification of the transporters remains as presented in the current version of the manuscript. We will keep working on this aspect and attempt to determine the exact transport function of the transporters in the near future. In the current manuscript, we discuss the limitations of our study in the last part of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Line 35: ...appears to have evolved...

      Line 67: remove first comma

      Line 105: thereafter or therefore?

      Line 130: define ACP

      Line 131: define TMD

      Response: We thank the reviewer very much for the suggestions, and we have revised these points in the current manuscript.

      Figure 1: more information on APT1 would be helpful for readers to interpret the results from turboID e.g., consider showing an illustration showing, according to Karnataki et al 2007 that APT1 likely occupies all 4 membranes of the apicoplast. Also, according to DeRocher et al 2012, APT1 N-term and C-term are both cytosolically exposed, at least in the outermost membrane. The orientation in the other membranes is not known.

      Response: We thank the reviewer very much for the suggestions. We analyzed the localization information of APT1 in T. gondii, based on the studies the reviewer cited (Karnataki, et al., 2007; DeRocher et al., 2012). The HA tag at the C-terminus of APT1 was distributed at the four membranes of the apicoplast, indicating that the topology of APT1 at the membranes may be difficult to define. Considering this information, we were hesitant to depict the topology of APT1 in a schematic diagram. Nevertheless, the TurboID tagging at the C-terminus of APT1 was an excellent model for identification of potential transporters localized at membranes of the apicoplast. We have put more information about the topology of APT1 in the manuscript, thus providing a better understanding of the proteomic results.

      Figure 2: add a space between "T." and "gondii"

      Figure 2: remove period between "Fitness" and "scores"

      Figure 2: different fonts are used within the figure. Consider using only one font such as arial. Same for Figure 4.

      Figure 2: "Fitness scores" is not bold in panel A but is bold in panel B.

      Response: We thank the reviewer very much for the suggestions. We have revised these points in the current version of the manuscript.

      Line 187: superscript -7

      Line 249: Caution should be used in interpreting two bands as being a precursor and mature product without additional experiments to establish such a relationship. Consider using the term "might" rather than "appear to". The presence of multiple bands could be due to phenomena other than proteolytic processing e.g., alternative splicing, alternative initiator codons, etc.

      Response: We thank the reviewer very much for the suggestions. We have revised the sentences in the current version of the manuscript.

      Line 291: define IPP

      Figure 3E. The data points for KD strains appear to be positioned above the zero value on the y-axis. Is this correct?

      Response: We thank the reviewer very much for the suggestions. We have rechecked the figure and replaced it with the correct one.

      Figure 3 G/H legend. Please describe what a single data point represents e.g., the average of one field of view, the average of a certain number of fields of view, or something else? Are the data combined from three experiments or from a representative experiment?

      Response: We thank the reviewer very much for the suggestions. Three independent experiments were performed with at least three replicates each. At least 150 vacuoles were scored in each replicate, thus resulting in at least 9 data points in total. Each data point shows the result from one replicate.

      Line 325: define MEP and explain how it is connected to IPP

      Response: We thank the reviewer very much for the suggestions. We have provided the information in the current version of the manuscript.

      Lines 351-355: The authors refer to Figure 4D to support this statement, but presumably they mean 4E. Also, the authors use the terms C14, C16, and C18. They should more precisely use the terms myristic acid, palmitoleic acid, and trans_oleic acid if this is what they are referring to. Finally, the authors should determine if there is a statistically significant difference between levels of these fatty acids between AMT1 KD and AMT2 KD. If not, they should suggest there is an overall trend toward lower levels of these fatty acids in AMT2 KD parasites compared to AMT1 KD parasites.

      Response: We thank the reviewer very much for the suggestions. We have revised the information in the current version of the manuscript.

      Lines 363-364: The basis of this comment is unclear. Please clarify.

      Lines 369-370: the authors have not shown that the observed lower levels of fatty acids are due to synthesis, as noted above

      Response: We thank the reviewer very much for the suggestions. We have revised the information accordingly in the current version of the manuscript.

      Line 383: Should be Figure S6D

      Line 386: An entire section of the results is used to describe data that are entirely in a supplemental figure. Consider moving this data to a main figure.

      Response: We thank the reviewer very much for the suggestions. We have moved the data to the main figure in the current version of the manuscript.

      Line 391: Consider using the term virulence instead of growth since no experiments were performed to specifically assess parasite growth in the infected mice.

      Response: We thank the reviewer very much for the suggestions. We have revised the terms in the Results section.

      Line 427: Perhaps the authors mean "...strong growth defect..." or "...strong growth impairment..."

      Line 460-461: This statement is unclear. Please explain how strong backgrounds in proteomics have made it difficult to identify apicoplast transporters. Because they are low abundance? Because they are membrane proteins?

      Response: We thank the reviewer very much for the suggestions. We have revised the corresponding sentences in the current version. The strong backgrounds in the proteomics resulted from the high activity and nonspecific labeling of biotin ligase fused with the apicoplast proteins.

      518-521: It would be helpful for non-specialists if the authors explained how pyruvate is connected to IPP biosynthesis.

      523: delete period after "Escherichia"

      548-549: "We observed similar decreases in level of the MEP biosynthesis activity upon depletion of AMT1 and AMT2..." Reword this since no experiments were done to measure MEP biosynthesis activity.

      Response: We thank the reviewer very much for the suggestions. We have revised the relevant sentences in the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      • The metabolomic data on fatty acid synthesis and isoprenoid levels is relevant but cannot inform about the function of the transporter, since any protein causing loss of the apicoplast would behave in such a manner, i.e. block the apicoplast pathways.

      Response: We thank the reviewer very much for the comment, with which we agree. We have therefore discussed these points in a subsection of the Discussion, pointing out some of the limitations of the study.

      • Currently, the manuscript fails to directly prove what AMC1 and AMC2 transport, potentially pyruvate as suggested to putatively fuel FASII and MEP/DOXP. Further experimental approaches using exogenous complementation and/or metabolomic analyses using stable isotope labelling (for example) should potentially bring light to the putative functions of AMC1/2.

      Response: We thank the reviewer very much for the comments. As described above, we attempted several approaches to identify the substrates that AMT1 and AMT2 transport. However, we could not express the proteins in E. coli strains, and we were unable to generate a T. gondii strain in which a pyruvate sensor was properly targeted to the apicoplast. At the end of the Discussion, we have a subsection that discusses the limitations of this study. We hope that our future approaches will be able to tackle these difficulties in substrate identification.

      Furthermore, the authors have not considered other pathways of interest, like heme or lysophosphatidic acid (LPA) synthesis, which are two other key pathways that may be related to AMC1/2 function. Those proposed experiments represent an important body of work, required to bring light to their metabolic functions.

      Response: We thank the reviewer very much for the comments. We considered this, but finally decided to mainly discuss the two pathways that the transporters might participate in, since the transporters contain specific domains in their protein sequences that are potentially associated with pyruvate.

      Further, the authors might have partially missed some referencing and data about the apicoplast in their introduction (and potentially to address other facets of the apicoplast metabolic functions/capacities with regard to AMC1/2 function): the referencing and explanations in the introduction are not fully exact/precise for the part on the apicoplast and its pathways: the references about the apicoplast, its discovery and origin do not cite the original work (that should be Wilson et al. 1996, McFadden et al. 1996, Kohler et al. 1997), and the same holds for the discovery of FASII and MEP/DOXP (Waller 1998, Jomaa et al...). The introduction (and the study?) lacks information about other key functions of the apicoplast: heme synthesis, lysophosphatidic acid synthesis (using FASII products). The explanations about the roles of FASII/DOXP are partial and do not fully cite important references: Krishnan et al. 2020 and Amiar et al. 2020 are also key to understanding how the role of FASII is metabolically flexible depending on nutrient content. A whole part on the fact that FASII is not only dispensable but can also become essential under metabolic adaptation conditions is missing (Botté et al. 2013, Amiar et al. 2020, Primo et al. 2021). These novel important facets of parasite biology should be mentioned as well as directly linked to the authors' topic. This is more minor but could bring new ideas to the authors.

      Response: We thank the reviewer very much for the suggestions. We have revised the relevant part of the introduction.

      We are grateful for the suggestions to improve the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a valuable conceptual advance of how Vitamin A and its derivatives contribute to atherosclerosis. There is solid evidence invoking the contributions of specialized populations of T cells in atherosclerosis resolution, including use of multiple in vivo models to validate the functional effect. The significance of the study would be strengthened with more detailed interrogation of lesion composition and consolidation with previous work on the topic from human studies.

      Answer: We thank the reviewers and editorial office for their comments and constructive criticism. Below we provide point by point responses to the comments and concerns, which include the issues of lesion composition and consolidation with human studies. We also proofread the manuscript and included information about the immunostaining procedures that were previously missing (Lines 199 – 206).

      Public Reviews

      REVIEWER #1:

      This is an interesting study by Pinos and colleagues that examines the effect of beta carotene on atherosclerosis regression. The authors have previously shown that beta carotene reduces atherosclerosis progression and hepatic lipid metabolism, and now they seek to extend these findings by feeding mice a diet with excess beta carotene in a model of atherosclerosis regression (LDLR antisense oligo plus Western diet followed by LDLR sense oligo and chow diet). They show some metrics of lesion regression are increased upon beta carotene feeding (collagen content) while others remain equal to normal chow diet (macrophage content and lesion size). These effects are lost when beta carotene oxidase (BCO) is deleted. The study adds to the existing literature that beta carotene protects from atherosclerosis in general, and adds new information regarding regulatory T-cells. However, the study does not present significant evidence about how beta-carotene is affecting T-cells in atherosclerosis. For the most part, the conclusions are supported by the data presented, and the work is completed in multiple models, supporting its robustness. However, there are a few areas that require additional information or evidence to support their conclusions and/or to align with the previously published work.

      Specific additional areas of focus for the authors:

      1. The premise of the story is that b-carotene is converted into retinoic acid, which acts as a ligand of the RAR transcription factor in T-regs. The authors measure hepatic markers of retinoic acid signaling (retinyl esters, Cyp26a1 expression) but none of these are measured in the lesion, which calls into question the conclusion that Tregs in the lesion are responsible for the regression observed with b-carotene supplementation.

      Answer: We agree with the Reviewer’s comment, which prompted us to quantify the expression of the retinoic acid-sensitive marker Cyp26b1 in the atherosclerotic lesions. Cyp26b1, like Cyp26a1 and Cyp26c1, contains retinoic acid response elements (RAREs) in its promoter and is therefore highly sensitive to retinoic acid. Indeed, the mRNA/protein expression of the Cyp26 enzymes is widely considered a surrogate marker for retinoic acid levels in cells or tissues.

      We typically use Cyp26a1 as a surrogate marker for retinoic acid signaling in the adipose tissue and the liver, as we did in this study. However, our RNA seq data in murine bone-marrow derived macrophages (mBMDMs) exposed to retinoic acid revealed that Cyp26b1 is the only Cyp26 family member responsive to retinoic acid (PMID: 36754230). In fact, neither Cyp26a1 nor Cyp26c1 was expressed in our mBMDMs (data not shown). Unlike the M2 marker arginase 1, Cyp26b1 did not respond to IL-4 (Figure iA). Hence, Cyp26b1 is an adequate marker to evaluate retinoic acid signaling in the macrophage-rich lesions of mice.

      Before staining the lesions, we validated the Cyp26b1 antibody by staining mBMDMs exposed to retinoic acid (Figure iB).

      Author response image 1.

      (A) mBMDMs were divided into M0 or M2 (exposed to IL-4 for 24 h) groups, and then treated with either DMSO or retinoic acid for 6 h before harvesting for RNA seq analysis. Exploring the RNA seq dataset, we identified Cyp26b1 as an RA-sensitive gene in mBMDMs (PMID: 36754230). (B) Validation of the Cyp26b1 antibody in mBMDMs exposed to retinoic acid confirms the suitability of this antibody for measuring retinoic acid signaling in our experimental settings.

      In the current version of the manuscript, we include the results of the Cyp26b1 quantifications (Figure 5H, I; Lines: 362 - 366). To put these findings in perspective with human studies, we discuss these results in relation to the role human CYP26B1 plays in the atherosclerotic lesion (Lines: 450 - 464).

      2. There does not appear to be a strong effect of Tregs on the b-carotene induced pro-regression phenotype presented in Figure 5. The only major CD25+ cell dependent b-carotene effect is on collagen content, which matches with the findings in Figures 1 and 2. This mechanistically might be very interesting and novel, yet the authors do not investigate this further or add any additional detail regarding this observation. This would greatly strengthen the study and the novelty of the findings overall as it relates to b-carotene and atherosclerosis.

      Answer: As the Reviewer points out, the effects of β-carotene on collagen content are more pronounced than those on CD68 content in the lesion. Indeed, we have observed this in the majority of the experiments in this manuscript.

      Collagen accumulation in the lesion is a complex process, where smooth muscle cells secrete collagen and plaque macrophages (typically) degrade it. Matrix metalloproteases produced by macrophages contribute to the degradation of collagen, and studies show that retinoic acid regulates the expression of metalloproteinases in various cell types (PMID: 2324527, 24008270). We explored the expression of metalloproteases in macrophages exposed to retinoic acid in our mBMDM RNA seq, but we did not observe any significant result (data not shown).

      Interestingly, M2 macrophages can secrete collagen by upregulating arginase 1 expression. In the current version of the manuscript, we acknowledge this in the results (Lines: 358-359) and in the discussion section (Lines: 443-449).

      3. The title indicates that beta-carotene induces Treg 'expansion' in the lesion, but this is not measured in the study.

      Answer: Following the suggestion by the Reviewer, we have re-worded the title to “β-carotene accelerates the resolution of atherosclerosis in mice”

      REVIEWER #2:

      Pinos et al present five atherosclerosis studies in mice to investigate the impact of dietary supplementation with b-carotene on plaque remodeling during resolution. The authors use either LDLR-ko mice or WT mice injected with ASO-LDLR to establish diet-induced hyperlipidemia and promote atherogenesis during 16 weeks, and then they promote resolution by switching the mice for 3 weeks to a regular chow, either deficient or supplemented with b-carotene. Supplementation was successful, as measured by hepatic accumulation of retinyl esters. As expected, chow diet led to reduced hyperlipidemia, and plaque remodeling (both reduced CD68+ macs and increased collagen contents) without actual changes in plaque size. But, b-carotene supplementation resulted in further increased collagen contents and, importantly, a large increase in plaque regulatory T-cells (TREG). This accumulation of TREG is specific to the plaque, as it was not observed in blood or spleen. The authors propose that the anti-inflammatory properties of these TREG explain the atheroprotective effect of b-carotene, and found that treatment with anti-CD25 antibodies (to induce systemic depletion of TREG) prevents b-carotene-stimulated increase in plaque collagen and TREG.

      1. An obvious strength is the use of two different mouse models of atherogenesis, as well as genetic and interventional approaches. The analyses of aortic root plaque size and contents are rigorous and included both male and female mice (although the data was not segregated by sex). Unfortunately, the authors did not provide data on lesions in en face preparations of the whole aorta.

      Answer: We appreciate the positive comments on rigor. We considered displaying our data segregated by sex, although for some experiments, we did not have matching numbers of male and female mice, which could be distracting for the reader. The goal of our study was to analyze changes in plaque composition. Therefore, our experimental approach was designed to study atherosclerosis resolution (plaque composition changes, but not plaque size) instead of atherosclerosis regression (both plaque composition and size change). As expected, we did not observe differences in plaque size at the level of the aortic root in any of our experiments, which deterred us from quantifying plaque content by en face analysis of the aorta.

      2. Overall, the conclusion that dietary supplementation with b-carotene may be atheroprotective via induction of TREG is reasonably supported by the evidence presented. Other conclusions put forth by the authors (e.g., that vitamin A production favors TREG production or that BCO1 deficiency reduces plasma cholesterol), however, will need further experimental evidence to be substantiated.

      Answer: We apologize for the lack of clarity in the presentation of our results and overstating our conclusions. We have rephrased some of these conclusions in the results and discussion sections.

      3. The authors claim that b-carotene reduces blood cholesterol, but data shown herein show no differences in plasma lipids between mice fed b-carotene-deficient and -supplemented diets (Figs. 1B, 2A, and S3A).

      Answer: As Reviewer 2 points out, we did not observe changes in plasma cholesterol between mice undergoing Resolution in response to β-carotene. For clarity, we rephrased our plasma lipid results for each of our experimental designs (Lines: 230 – 236, 270 – 272, and 288 - 290). We also include a clarification in the discussion section about the differential effects of β-carotene on plasma lipids when mice undergo atherosclerosis progression and resolution (Lines: 419 - 430).

      4. Also, the authors present no experimental data to support the idea that BCO1 activity favors plaque TREG expansion (e.g., no TREG data in Fig 3 using Bco1-ko mice).

      Answer: We appreciate the suggestion by Reviewer 2. In the current version of the manuscript, we stained the aortic roots from Bco1-/- mice for FoxP3. We did not observe differences between Control and β-carotene resolution groups, in agreement with the results in plaque composition (CD68 and collagen contents). These new data strengthen our manuscript, and we have now included these results as Supplementary Figure 3D, E (Lines: 465 - 471).

      5. As the authors show, the treatment with anti-CD25 resulted in only partial suppression of TREG levels. Because CD25 is also expressed in some subpopulation of effector T-cells, this could potentially cloud the interpretation of the results. Data in Fig 4H showing loss of b-carotene-stimulated increase in numbers of FoxP3+GFP+ cells in the plaque should be taken cautiously, as they come from a small number of mice. Perhaps an orthogonal approach using FoxP3-DTR mice could have produced a more robust loss of TREG and further confirmation that the loss of plaque remodeling is indeed due to loss of TREG.

      Answer: We agree with the reviewer, and we rephrased the results and discussion to avoid overstating our findings. We now acknowledge that a second experimental approach would help us confirm the findings we obtained with a blocking antibody targeting CD25. We favored the use of anti-CD25 infusions over other depletion methods based on the experimental protocol carried out by our collaborators, in which they examined the effect of Tregs on atherosclerosis regression (PMID: 32336197). The utilization of FoxP3-DTR mice would nicely complement our findings. In the current version of the manuscript, we discuss this alternative approach (Lines: 491 - 501).

      Recommendations for the Authors

      All reviewers agreed that despite the claims of the title, there is no direct interrogation of Tregs or vitamin A signaling in lesions.

      The work does not consolidate well with the role of B-carotene in human heart disease. Additional discussion and synthesis are required to elaborate on the significance of the findings. For example, the idea of beta carotene supplementation for cardiovascular prevention has attracted attention for years but recent meta-analysis showed no benefit, and, if anything, an increase in cardiovascular events. The U.S. Preventive Services Task Force (USPSTF) went as far as to recommend AGAINST the use of beta-carotene for the prevention of cardiovascular disease.

      In light of the above point and eLife editorial policies, please revise the title to include species.

      Answer: Thanks for your feedback. Carotenoid metabolism in mammals is complex, and establishing direct parallelisms between humans and rodents must be done with caution. For example, β-carotene supplementation in humans inevitably results in the accumulation of this compound in plasma, while in rodents, β-carotene is quickly metabolized to vitamin A. Our findings over the years reveal that the effects of β-carotene in mice derive exclusively from its role as vitamin A precursor.

      In the current study, we confirm our previous work utilizing Bco1-/- mice, which are unable to produce vitamin A when fed β-carotene. Then, we observe that vitamin A promotes atherosclerosis resolution in mice independently of alterations in plasma cholesterol in two independent mouse models. Lastly, we utilized anti-CD25 blocking antibodies to deplete Tregs to establish a direct connection between dietary β-carotene/vitamin A and Tregs in the lesion. While this experimental approach failed to completely deplete Tregs, our morphometric assays indicate that these infusions were sufficient to partially mitigate the effect of β-carotene on atherosclerosis resolution.

      Regardless, in the discussion section of our manuscript, we attempt to consolidate our preclinical studies with clinical data (Lines: 374 – 376, and 461 – 464).

      We have also revised the title, as suggested by Reviewer 1. We also included “mice” in the title to align with the editorial policies of eLife.

      Reviewer #1:

      1.1. The authors need to measure retinoic acid signaling directly in the lesion and in Tregs to be able to draw the conclusion that b-carotene is directly activating Tregs to promote regression.

      Answer: Please see comments above.

      1.2. The authors should investigate the role of beta carotene on collagen production by T-regs.

      Answer: Please see comments above.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      2.1. If the authors still have frozen sections of the aortas from their Bco1-ko experiment, it should be trivial to look at plaque TREG contents to confirm that vitamin A production is indeed needed for the effect of b-carotene on plaque remodeling.

      Answer: Please see comments above.

      Minor:

      2.2. This reviewer wonders if the axis for lesion size in all figures is off by an order of magnitude. Most studies show aortic root lesions in the 10^5 um2 range, not in the 10^6 um2.

      Answer: We apologize for this error. We have corrected the units in all our quantifications.

      2.3. FPLC lipoprotein profiles would enhance the manuscript.

      Answer: We have run FPLCs for the plasmas and included them in the results (Lines: 233 – 236). Data are presented in Figure 1C, D.

      2.4. This reviewer could not cope with the thought that mice that are fed 16+ weeks a diet that is vitamin A-deficient did not become vit A-deficient (e.g., Fig. 1E). Perhaps the authors could elaborate a little on this in their discussion.

      Answer: Mice are extremely resistant to vitamin A deficiency. A common protocol to achieve deficiency in mice requires feeding a vitamin A deficient diet to dams during their pregnancy and lactation to deplete new-born pups of vitamin A stores. Even in that situation, pups retain enough vitamin A stores to sustain circulating vitamin A at levels comparable to those observed in wild-type mice. In the current version of the manuscript, we have included a paragraph in the discussion to cover this “interesting” aspect (Lines: 476 – 483).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The evolution of transporter specificity is currently unclear. Did solute carrier systems evolve independently in response to a cellular need to transport a specific metabolite in combination with a specific ion or counter metabolite, or did they evolve specificity from an ancestral protein that could transport and counter-transport most metabolites? The present study addresses this question by applying selective pressure to Saccharomyces cerevisiae and studying the mutational landscape of two well-characterised amino acid transporters. The data suggest that AA transporters likely evolved from an ancestral transporter and then specific sub-families evolved specificity depending on specific evolutionary pressure.

      Strengths:

      The work is based on sound logic and the experimental methodology is well thought through. The data appear accurate, and where ambiguity is observed (as in the case of citrulline uptake by AGP1), in vitro transport assays are carried out to verify transport function.

      Weaknesses:

      Although the data and findings are well described, the study lacked additional contextual information that would support a clear take-home message.

      We appreciate the reviewer’s positive assessment of the work, and the helpful comment to summarize the findings into a short take-home message. We chose not to discuss protein evolution theories in detail to keep the text as concise as possible. However, we do acknowledge the fact that the reader might want to see our results embedded in more context. In a revised version, we will integrate our findings more with the pertinent literature, which will show how our results align with theoretical models for protein evolution towards novel functions. We will also discuss in more detail how our laboratory results could be translated into a “natural” setting of evolution.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes evolution experiments performed on yeast amino acid transporters aiming at the enlargement of the substrate range of these proteins. Yeast cells lacking 10 endogenous amino acid transporters and thus being strongly impaired to feed on amino acids were again complemented with amino acid transporters from yeast and grown on media with amino acids as the sole nitrogen source.

      In the first set of experiments, complementation was done with seven different yeast amino acid transporters, followed by measuring growth rates. Although most of them have been described before in other experimental contexts, the authors could show that many of them have a broader substrate range than initially thought.

      Moving to the evolution experiments, the authors used the OrthoRep system to perform random mutagenesis of the transporter gene while it is actively expressed in yeast. The evolution experiments were conducted such that the medium would allow for poor/slow growth of cells expressing the wt transporters, but much better/faster growth if the amino acid transporter would mutate to efficiently take up a poorly transported (as in the case of citrulline and AGP1) or non-transported (as in case of Asp/Glu and PUT4) amino acid.

      This way and using Sanger sequencing of plasmids isolated from faster-growing clones, the authors identified a number of mutations that were repeatedly present in biological replicates. When these mutations were re-introduced into the transporter using site-directed mutagenesis, faster growth on the said amino acids was confirmed. Growth phenotype data were attempted to be confirmed by uptake experiments using radioactive amino acids; however, the radioactive uptake data and growth-dependent analyses do not fully match, hinting at the existence of further parameters than only amino acid uptake alone to impact the growth rates.

      When mapped to Alphafold prediction models on the transporters, the mutations mapped to the substrate permeation site, which suggests that the changes allow for more favourable molecular interactions with the newly transported amino acids.

      Finally, the authors compared the growth rates of the evolved transporter variants with those of the wt transporter and found that some variants exhibit a somewhat diminished capacity to transport their original range of amino acids, while other variants were as fit as the wt transporter in terms of uptake of their original range of amino acids.

      Based on these findings, the authors conclude that transporters can evolve novel substrates through generalist intermediates, either by increasing a weak activity or by establishing a new one.

      Strengths:

      The study provides evidence in favour of an evolutionary model, wherein a transporter can "learn" to translocate novel substrates without "forgetting" what it used to transport before. This evolutionary concept has been proposed for enzymes before, and this study shows that it also can be applied to transporters. The concept behind the study is easy to understand, i.e. improving growth by uptake of more amino acids as nitrogen source. In addition, the study contains a large and extensive characterization of the transporter variants, including growth assays and radioactive uptake measurements.

      Weaknesses:

      The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. While this has worked out for two transporters/substrate combinations, I wonder how comprehensive and general the insights are. In such approaches, it is difficult to know which mutation space is finally covered/tested. And information that can be gained from loss-of-function analyses is missed. The entire conclusions are grounded on a handful of variants analyzed. Accordingly, the outcome is somewhat anecdotal; in some cases, the fitness of the variants was changed and in others not. Highlighting the amino acid changes in the context of the structural models is interesting, but does not fully explain why the variants exhibit changed substrate ranges. Two important technical elements have not been studied in detail by the authors, but may well play a certain role in the interpretation of the results. Firstly, the authors did not quantify the amount of transporter being present on the cell surface; altered surface expression can impact uptake rates and thus growth rates. Secondly, the authors have not assessed whether overexpressing wt versus variant transporters has an impact on the growth rate per se. Overexpressing transporters from plasmids is quite a burden for the cells and often impacts growth rates. Variants may be more or less of a burden, an effect that may (or may also not) go hand in hand with increased/decreased surface production levels.

      And finally, I was somewhat missing an evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions.

      First of all, we thank the reviewer for the attention to detail with which they have read the manuscript, and the very helpful comments on how to improve it. We will indeed take on some of the suggestions in a revised version of the text:

      Regarding the match of growth rate and uptake rate measurements, we plan to plot their correlation in a graph.

      Regarding the amount of transporter on the plasma membrane, we acknowledge that the visual representation of the fluorescence micrographs already in the text might not be enough. We therefore will quantify expression levels from said micrographs and include the information in the manuscript.

      On a similar note, we had already measured the growth rates of all transporter variant cultures in the absence of selection for amino acid uptake (i.e., in medium with ammonium as the nitrogen source; Figure 4 - Supplement figure 1). We will include the measured growth rates in the text to give an indication of what the impact of transporter overexpression is on the growth rate per se.

      Regarding the proposed analysis of natural transporter sequences, we do see the possible value in such an analysis. However, it is currently out of scope for the present study. The reasons are 1) that preliminary analyses show that the sequence similarity of functionally verified/annotated transporters is too low to reliably pinpoint a phenotype to a single residue, and 2) that we do not envision that the variants that we discovered are necessarily beneficial in a natural setting, where fine-grained regulation of amino acid transport may be more important than a broad substrate range. Regarding the generality of the insights, we do agree with the reviewer’s comment that we “only” analyzed a relatively small number of variants. However, the target of the study was not to generate high-throughput data on a large set of variants (e.g., by NGS of the whole culture) but to provide in-depth data for characterized and verified variants in a clean genetic background (i.e., verified phenotype and fitness measurements on all native and novel substrates).

      As to the mutation space, we will include an estimate in a revised version of the text. We estimate that a majority of all possible single mutants is covered in the first and second passages of the selection experiment, which is corroborated by the fact that we repeatedly find the same mutants in biological replicates.
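      To illustrate how such a coverage estimate could be approached, the following is a minimal back-of-the-envelope sketch and not the calculation used in the study. It assumes that every single-nucleotide substitution is equally likely, that mutations arise independently, and that selection and drift are ignored. The function name and all parameter values (per-base mutation rate, number of cells, number of generations) are hypothetical placeholders.

```python
import math


def single_mutant_coverage(per_base_rate, n_cells, generations):
    """Expected fraction of all single-nucleotide substitutions that
    arise at least once in the population.

    Simplifying assumptions (illustrative only): substitutions are
    uniform over the three alternative bases at each position and
    occur independently (Poisson approximation).
    """
    # Expected number of times one *specific* substitution arises:
    # each cell mutates a given base with probability per_base_rate
    # per generation, and one of three alternative bases is chosen.
    expected_hits = per_base_rate * n_cells * generations / 3.0
    # Poisson probability of observing that substitution at least once.
    return 1.0 - math.exp(-expected_hits)


if __name__ == "__main__":
    # Hypothetical placeholder values, not measurements from the study.
    fraction = single_mutant_coverage(
        per_base_rate=1e-5, n_cells=1e6, generations=10
    )
    print(f"Estimated single-mutant coverage: {fraction:.3f}")
```

      Under this simple model the gene length cancels out: a longer gene has more possible single mutants, but also accumulates proportionally more mutation events. A real estimate would additionally need to account for mutational bias and for the bottleneck imposed at each passage.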

      Regarding the mentioned loss-of-function analyses, we are unsure about what the reviewer intends with this statement at this point. To briefly summarize, we feel that our results are a good indication that transporters can evolve new functions analogously to enzymes. We explicitly do not imply that this is the only way to evolve novelty.

      Reviewer #3 (Public Review):

      The goal of the current manuscript is to investigate how changes in transporter substrate specificity emerge through experimental evolution. The authors investigate the APC family of amino acid transporters, a large family with many related transporters that together cover the spectrum of amino acid uptake in yeast.

      The authors use a clever approach for their experimental evolutions. By deleting 10 amino acid uptake transporters in yeast, they develop a strain that relies on amino acid import by introducing APC transporters under nitrogen-limiting conditions. They can thus evolve transporters towards the transport of new substrates if no other nitrogen source is available. The main takeaway from the paper is that it is relatively easy for the spectrum of substrates in a particular transporter of this family to shift, as a number of single mutants are identified that modulate substrate specificity. In general, transporters evolved towards gain-of-function mutations (better or new activities) and also confer transport promiscuity, expanding the range of amino acids transported.

      The data in the paper support the conclusions, in general, and the outcomes (evolution towards promiscuity) agree with the literature available for soluble enzymes. However, it is also a possibility that the design of these experiments selects for promiscuity among amino acids. The selections were designed such that yeast had access to amino acids that were already transported, with a greater abundance of the amino acid that was the target of selection. Under these conditions, it seems probable that the fittest variants will provide the yeast access to all amino acid substrates in the media, and unlikely that a specificity swap would occur, limiting the yeast to only the new amino acid.

      The authors also examine the fitness costs of mutants, but only in the narrow context of growth on a single (original) amino acid under conditions of nitrogen limitation. Amino acid uptake is typically tightly controlled because some amino acids (or their carbon degradation products) are toxic in excess. This paper does not address or discuss whether there might be a fitness cost to promiscuous mutants in conditions where nitrogen is not limiting.

      We are grateful for the reviewer’s insightful comments on the paper.

      Regarding the design of our experiments, we followed the concept of directed evolution as described by pioneers of the field, in which the starting point for evolving a protein is to have a basic level of that activity. In the case of AGP1, the promiscuous activity is Cit uptake. We recognize that elimination of all the already transported amino acids from the evolution media could also yield very insightful results. However, we aimed to simulate the effect of the evolutionary pressure acting in a “natural” environment, where the uptake of the specific amino acid is not initially crucial for its survival. In the case of PUT4, the experimental design was chosen to ensure the initial survival of the culture (since neither Glu nor Asp support the growth of the strain) by providing a low level of already transported amino acids. In the revised manuscript, we will state this more clearly.

      Regarding the second point, we agree that a short discussion about the potentially detrimental effects of promiscuous transporters would be beneficial for the reader. We will touch on this aspect in the revised version of the text. Indeed, our system is intentionally simplified, as we try to take regulation of transport out of the equation (e.g., by using the constitutive ADH1 promoter as opposed to a nitrogen-regulated one). In a natural setting, microorganisms encounter fluctuations of nutrient availability, necessitating tight control of nutrient transport. This is probably a major reason why microorganisms typically encode transporters with redundant specificities (i.e., promiscuous and specific ones). Otherwise, one very broad-range nutrient transporter would suffice. In our system, we artificially select for broad-range transport, which is reflected in the observed phenotypes of the evolved transporters. We expect that in a natural setting, a broad-range transporter would be a stepping stone to evolve a narrow-range transporter with a new specificity (which is actually what we see in the double-mutant AGP1-NV, with lowered fitness in original substrates and increased fitness in Cit).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of the ways in which different types of communication signals differentially affect mouse behaviors and amygdala cholinergic/dopaminergic neuromodulation. Researchers interested in the complex interaction between prior experience, sex, behavior, hormonal status, and neuromodulation should benefit from this study. Nevertheless, the data analysis is incomplete at this stage, requiring additional analysis and description, justification, and - potentially - power to support the conclusions fully. With the analytical part strengthened, this paper will be of interest to neuroscientists and ethologists.

      GENERAL COMMENTS ON REVIEWS AND REVISIONS

      Experimental design

      Here we address questions from several reviewers regarding our periods of neuromodulator and behavioral analysis. First, we recognize that the text would benefit from an overview of the experimental structure different from the narrative we provide in the first paragraphs of the Results. We now include this near the beginning of the Materials and Methods (page 17). We further articulate that the 10-minute time periods were dictated by the sampling duration required to perform accurate neurochemical analyses (and to reserve half of the sample in the event of a catastrophic failure of batch-processing samples). Since neurochemical release may display multiple temporal components (e.g., ACh: Aitta-aho et al., 2018) during playback stimulation, and since these could differ across neurochemicals of interest, we decided to collect, analyze, and report in two stimulus periods as well as one Pre-Stim control. We now clarify this in additional text in the Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question: the change in neuromodulator release DURING vocal playback.

      We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13) to clarify these periods.

      For behavioral analyses, observation periods were much shorter than 10 mins, but the main purpose of behavioral analyses in this report is to relate to the neurochemical data. As a result, we matched the temporal features of the behavioral and neurochemical analyses (p. 22, lines 17-22). We plan a separate report, focused exclusively on a broader set of behavioral responses to playback, that may examine behaviors at a more granular level.

      Data and statistical analyses

      Reviewers 1 and 3 expressed concerns about our normalization of neurochemical data, suggesting that it diminishes statistical power or is not transparent. We note that normalization is a very common form of data transformation that does not diminish statistical power. It is particularly useful for data forms in which the absolute value of the measurement across experiments may be uninformative. Normalization is routine in microdialysis studies, because data can be affected by probe placement and factors affecting neurochemical recovery and processing. Recent examples include:

      Li, Chaoqun, Tianping Sun, Yimu Zhang, Yan Gao, Zhou Sun, Wei Li, Heping Cheng, Yu Gu, and Nashat Abumaria. "A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice." Neuron (2023).

      Gálvez-Márquez, Donovan K., Mildred Salgado-Ménez, Perla Moreno-Castilla, Luis Rodríguez-Durán, Martha L. Escobar, Fatuel Tecuapetla, and Federico Bermudez-Rattoni. "Spatial contextual recognition memory updating is modulated by dopamine release in the dorsal hippocampus from the locus coeruleus." Proceedings of the National Academy of Sciences 119, no. 49 (2022): e2208254119.

      Holly, Elizabeth N., Christopher O. Boyson, Sandra Montagud-Romero, Dirson J. Stein, Kyle L. Gobrogge, Joseph F. DeBold, and Klaus A. Miczek. "Episodic social stress-escalated cocaine self-administration: role of phasic and tonic corticotropin releasing factor in the anterior and posterior ventral tegmental area." Journal of Neuroscience 36, no. 14 (2016): 4093-4105.

      Bagley, Elena E., Jennifer Hacker, Vladimir I. Chefer, Christophe Mallet, Gavan P. McNally, Billy CH Chieng, Julie Perroud, Toni S. Shippenberg, and MacDonald J. Christie. "Drug-induced GABA transporter currents enhance GABA release to induce opioid withdrawal behaviors." Nature neuroscience 14, no. 12 (2011): 1548-1554.

      However, since all reviewers requested raw values of neurochemicals, we provide these in supplementary tables 1-3. The manuscript references these tables early in the Results (p. 6, lines 18-19) and in the Materials and Methods (p. 27, lines 3-4).
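
      The baseline normalization discussed above can be illustrated with a minimal sketch. It assumes the common percent-change-from-baseline form, in which the Pre-Stim sample is the reference and therefore maps to 0; the exact formula used in the manuscript is not reproduced in this response, and the function name and concentrations below are hypothetical.

```python
def percent_change_from_baseline(baseline, samples):
    """Express each sample as a percent change relative to one baseline value.

    Assumed form (not taken from the manuscript): 100 * (sample - baseline) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline concentration must be positive")
    return [100.0 * (sample - baseline) / baseline for sample in samples]


# Hypothetical concentrations in arbitrary units; these are NOT data from the study.
pre_stim = 2.0
stim_1, stim_2 = 2.5, 1.5

# The baseline sample is included to show that it normalizes to 0.
normalized = percent_change_from_baseline(pre_stim, [pre_stim, stim_1, stim_2])
print(normalized)
```

      Under this assumed form, every animal has a Pre-Stim value of exactly 0, which is the property that Reviewer 1 raises in connection with the figures.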

      All reviewers commented on correlation analyses that we presented, with different perspectives. Reviewer 2 questioned the validity of such analyses, performed across experimental groups, while Reviewer 1 pointed out that the analyses were redundant with the GLM. We agree with these criticisms, and note the challenges associated with correlations involving behaviors for which there is a “floor” in the number of observations. As a result, we have removed most correlation analyses from the manuscript. The text and figures have been modified accordingly. Due to these changes, we have to decline requests of Reviewer 3 to include many more such analyses. While correlation analyses could still be performed between neurochemicals and behaviors for each group, the statistical power would be very low given the relatively small size of each experimental group, the large number of groups, and the even larger number of pairings between neurochemicals and behaviors. The only correlations we utilize in the manuscript concern the interpretation of our increased acetylcholine levels.

      As part of this revision, we re-ran our statistical analyses on neuromodulators because of a calculation error in 3 animals (regarding baseline values). In a few instances, a significance level changed, but none of these changed a conclusion regarding neuromodulator changes under our experimental conditions.

      Other revisions

      INTRODUCTION: We modified the Introduction to provide both a more general framework and specific gaps in our understanding relating neuromodulators with vocal communication.

      DISCUSSION: We have added material in the first two pages of the Discussion to provide more framework to our conclusions, to address the issues of the temporal aspects of neurochemical release and behavioral observations, and to identify limitations that should be addressed in future studies.

      FIGURES: All figures are now in the main part of the manuscript. We modified most figures in response to reviewer comments. We removed neuromodulator – behavior correlations from several figures. We modified all box plots to ensure that all data points are visible. The visible data points match the numbers reported in figure captions. We brought 5-HIAA data into the main figures reporting on neuromodulator results.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain states and neurochemistry. In addition, the manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized. The authors are thoughtful about their quantitative approaches and interpretations of the data.

      That being said, the authors need to work on justifying some of their analytical approaches (e.g., normalization of neurochemical data, dividing the experimental period into two periods (as opposed to just analyzing the entire experimental period as a whole)) and should provide a greater discussion of how their data also demonstrate dissociations between neurochemical release in the basolateral amygdala and behavior (e.g., neurochemical differences during both of the experimental periods but behavioral differences only during the first half of the experimental period). The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power.

      Please see our general responses to structure of observation periods and normalization of neuromodulator data. Normalization is a common and appropriate procedure in microdialysis studies that does not alter statistical power.

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback (p. 12, lines 3-17). We note where the linkage is particularly strong (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      The Introduction could benefit from a priori predictions about the differential release of specific neuromodulators based on previous literature.

      We added some material to the Introduction to provide additional rationale for the study. However, we did not attempt to develop predictions for the range of neuromodulators that we sought to test. The literature can lead to opposite predictions for a given neuromodulator. For example, acetylcholine could be associated with both positive and negative valence. Instead, we note in the Introduction the association of both DA and ACh with vocalizations.

      The manuscript would also benefit from a description of space use and locomotion in response to different valence vocalizations.

      We have provided additional descriptions of space use and video tracking data in Materials and Methods (p. 23, lines 1-6). We now report a few correlations based on these data in the Results to demonstrate that increased ACh in Restraint males and Mating estrus females was not related to the amount of locomotion (p. 9, lines 8-14).

      Nevertheless, the current manuscript seems to provide some compelling support for how positive and negative valence vocalizations differentially affect behavior and the release of acetylcholine and dopamine in the basolateral amygdala. The research is relevant to broad fields of neuroscience and has implications for the neural circuits underlying social behavior.

      Reviewer #2 (Public Review):

      Ghasemahmad et al. report findings on the influence of salient vocalization playback, sex, and previous experience, on mice behaviors, and on cholinergic and dopaminergic neuromodulation within the basolateral amygdala (BLA). Specifically, the authors played back mice vocalizations recorded during two behaviors of opposite valence (mating and restraint) and measured the behaviors and release of acetylcholine (ACh), dopamine (DA), and serotonin in the BLA triggered in response to those sounds.

      Strength: The authors identified that mating and restraint sounds have a differential impact on cholinergic and dopaminergic release. In male mice, these two distinct vocalizations exert an opposite effect on the release of ACh and DA. Mating sounds elicited a decrease of ACh release and an increase of DA release. Conversely, restraint sounds induced an increase in ACh release and a trend to decrease in DA. These neurotransmission changes were different in estrus females for whom the mating vocalization resulted in an increase of both DA and ACh release.

      Weaknesses: The behavioral analysis and results remain elusive, and although addressing interesting questions, the study contains major flaws, and the interpretations are overstating the findings.

      Although Reviewer 2 raises several valid issues that we have addressed in our response and revision, we believe that none represent “major flaws” in the study that challenge the validity of our central conclusions. In brief, we will:

      --provide enhanced description of behaviors (pp. 22-23 and Table 1)

      --clarify / modify box-plot representations of data (p. 28, lines 3-9)

      --point to our methods that describe corrections for multiple comparisons (p. 27; lines 15-16)

      --revise figures to clarify sample size (Figs. 3-6)

      Reviewer #3 (Public Review):

      Ghasemahmad et al. examined behavioral and neurochemical responses of male and female mice to vocalizations associated with mating and restraint. The authors made two significant and exciting discoveries. They revealed that the affective content of vocalizations modulated both behavioral responses and the release of acetylcholine (ACh) and dopamine (DA) but not serotonin (5-HIAA) in the basolateral amygdala (BLA) of male and female mice. Moreover, the results show sex-based differences in behavioral responses to vocalizations associated with mating. The authors conclude that behavior and neurochemical responses in male and female mice are experience-dependent and are altered by vocalizations associated with restraint and mating. The findings suggest that ACh and DA release may shape behavioral responses to context-dependent vocalizations. The study has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the BLA while an animal listens to social vocalizations; however, multiple concerns must be addressed to substantiate their conclusions.

      Major concerns:

      1) The authors normalized all neurochemical data to the background level obtained from a single pre-stimulus sample immediately preceding playback. The percentage change from the background level was calculated based on a formula, and the underlying concentrations were not reported. The authors should report the sample and background concentrations to make the results and analyses more transparent. The authors stated that NE and 5-HT had low recovery from the mouse brain and hence could not be tracked in the experiment. The authors could be more specific here by relating the concentrations to ACh, DA, and 5-HIAA included in the analyses.

      Please see our general statement regarding normalization of neurochemical data. We have added supplementary tables that show concentrations of dopamine, acetylcholine, and 5-HIAA. We do not report serotonin or noradrenalin since these were below the detection threshold.

      2) For the EXP group, the authors stated that each animal underwent 90-min sessions on two consecutive days that provided mating and restraint experiences. Did the authors record mating or copulation during these experiments? If yes, what was the frequency of copulation? What other behaviors were recorded during these experiences? Did the experiment encompass other courtship behaviors along with mating experiences? Was the female mouse in estrus during the experience sessions?

      In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing. Since the session lasted 90 minutes, more general courtship behavior was likely. However, we did not record detailed behaviors or track estrous stage for the mating experience. See p. 21, lines 20-22.

      3) For the mating playback, the authors stated that the mating stimulus blocks contained five exemplars of vocal sequences emitted during mating interactions. The authors should clarify whether the vocal sequences were emitted while animals were mating/copulating or when the male and female mice were inside the test box. If the latter was the case, it might be better to call the playback "courtship playback" instead of "mating playback".

      We have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      4) Since most differences that the authors reported in Figure 3 were observed in Stim 1 and not in Stim 2, it might be better to perform a temporal analysis - looking at behaviors and neurochemicals over time instead of dividing them into two 10-minute bins. The temporal analysis will provide a more accurate representation of changes in behavior and neurochemicals over time.

      Please see our general response to the structuring of experimental periods. The 10-min periods are the minimum for the neurochemical analyses, and we adopted the same periods for behavioral analyses to match the two types of observations. Our repeated measures analysis is a form of temporal analysis, since it compares values in three observation periods.

      5) In Figures 2 and 3, the authors show the correlation between Flinching behavior and ACh concentration. The authors should report correlations between concentrations of all neurochemicals (not just ACh) and all behaviors recorded (not just Flinching), even if they are insignificant. The analyses performed for the stim 1 data should also be performed on the stim 2 data. Reporting these findings would benefit the field.

      Please see general comments regarding correlation analyses. We removed almost all such analyses and references to them from the manuscript based on concerns of the other reviewers.

      6) The mice used in the study were between p90 - p180. The mice were old, and the range of ages was considerable. Are the findings correlated with age? The authors should also discuss how age might affect the experiment's results.

      Our p90-p180 mice are not “old”. CBA/CaJ mice display normal hearing for at least 1 year (Ohlemiller, Dahl, and Gagnon, JARO 11: 605-623, 2010) and adult sexual and social behavior throughout our observation period. They are sexually mature adults, appropriate for this study. We decline to perform correlation analyses with age, both because this was not a question for this study and because the very large number of correlations, for each experimental group (as requested by reviewer #2), renders this approach statistically problematic.

      7) The authors reported neurochemical levels estimated as the animals listened to the sounds played back. What about the sustained effects of changes in neurochemicals? Are there any potential long-term effects of social vocalizations on behavior and neurochemical levels? The authors might consider discussing long-term effects.

      We have not included discussion of long-term effects of neuromodulatory release, both because our data analysis doesn’t address it (see response to Comment #10) and because we desired to keep the Discussion focused on topics more closely related to the results.

      8) Histology from a single recording was shown in supplementary figure 1. It would benefit the readers if additional histology was shown for all the animals, not just the colored schematics summarizing the recording probe locations. Further explanation of the track location is also needed to help the readers. Make it clear for the readers which dextran-fluorescein labeling image is associated with which track in the schematic.

      Based on the recent publications cited in our overall response to reviewer comments about statistical methods, our reporting of histological location of microdialysis exceeds the standard. We believe that the inclusion of all histology is unnecessary and not particularly helpful. Raw photomicrographs do not always illustrate boundaries, so interpretation is required. However, we added a second photomicrograph example and we identified which tracks correspond to these photomicrographs (see Figure 2; now in main body of manuscript).

      9) The authors did not control for the sounds being played back with a speaker. This control may be necessary since the effects are more pronounced in Stim 1 than in Stim 2. Playing white noise rather than restraint or courtship vocalizations would be an excellent control. However, the authors could perform a permutation analysis and computationally break the relationship between what sound is playing and the neurochemical data. This control would allow the authors to show that the actual neurochemical levels are above or below chance.

      We considered a potential “control” stimulus in our experimental design. We concluded, based on our previous work (e.g., Grimsley et al., 2013; Gadziola et al., 2016), that white noise is not necessarily a neutral stimulus and therefore the results would not clarify the responses to the two vocal stimuli. Instead, we opted to use experience as a type of control. This control shows very clearly that temporal patterns and across-group differences in neurochemical response to playback disappear in the absence of experience with the associated behavior.

      10) The authors indicated that each animal's post-vocalization session was also recorded. No data in the manuscript related to the post-vocalization playback period was included. This omission was a missed opportunity to show that the neurochemical levels returned to baseline, and the results were not dependent on the normalization process described in major concern #1. The data should be included in the manuscript and analyzed. It would add further support for the model described in Figure 6.

      We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.

      11) The authors could use a predictive model, such as a binary classifier trained on the CSF sampling data, to predict the type of vocalizations played back. The predictive model could support the conclusions and provide additional support for the model in Figure 6.

      We recognize that a binary classifier could provide an interesting approach to support conclusions. However, we do not believe that the sample size per group is sufficient to both create and test the classifier.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • Introduction: It would be useful to set up an experimental framework before delving into the results. What are the predictions about specific neuromodulators based on previous literature?

      Because this narrative is laid out in the first two paragraphs of the Results, which immediately follow the Introduction, we believe that additional text in the Introduction on the experimental framework is redundant. As stated above, detailing predictions for a range of neuromodulators would make for a long and not particularly illuminating Introduction. We instead have related our findings to more general understanding of DA and ACh in the Discussion.

      • There really isn't a major difference in stimuli during the "Stim 1" and "Stim 2" phases, and it's not clear why the authors divided the experimental period into two phases. Therefore, the authors need to justify their experimental approach. For example, the authors could first anecdotally mention that behavioral responses to playbacks seem to be larger in the first half of the playbacks than during the second half, therefore they individually analyzed each half of the experimental period. Or adopt a different approach to justify their design. Overall, the analytical approach is reasonable but it is currently not justified.

      See general comment for analysis periods. As noted, we clarified these issues in several locations within Materials and Methods (p. 24, lines 20-22; p. 26, lines 17-19). We also sought to clarify the meaning of the periods “Stim 1” and “Stim 2”; they are two data collection periods, using the same exemplar sequences in the same order. We have added statements in the Materials and Methods (p. 18, lines 4-7; Fig. caption, p. 39, lines 11-13).

      • The normalization of neurochemical data seems problematic and unnecessary. By normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) and this has implications for statistical power. Because the analysis is a within-subjects analysis, this normalization is not necessary for the analysis itself. It can be useful to normalize data for visualization purposes, but raw data should be analyzed. Indeed, behavioral data are qualitatively similar to the neurochemical data, and those data are not normalized to baseline values.

      Please see our general comment on this issue. We believe normalization does not affect statistical power and is both the standard way and an appropriate way to analyze microdialysis results. We include concentrations of ACh, DA, and 5-HIAA in supplementary tables 1-3.

      • The authors should include a discussion (in the Discussion section) of how behavior and neurochemical release are associated during the first half of the experimental session but not in the second half (e.g., differences in ACh and DA release between mating and restraint groups during stim 1 and 2, but behavioral differences only during stim 1).

      We have included a section in the Discussion concerning the temporal relationship between behavioral responses and neurochemical changes in response to vocal playback. We note that the linkage is particularly strong in some cases (e.g., ACh release and flinching). This points to a need to examine these phenomena with finer temporal resolution, but also with the recognition that the brain circuits driving a behavioral response may extend beyond the BLA.

      Minor comments:

      • Keywords: add "serotonin" (even though there are no significant differences on 5-HIAA, people interested in serotonin would find this interesting).

      Added to keywords list.

      • Do the authors collect data on the vocalizations of mice in response to these playbacks?

      We monitored vocalizations during playback, noting that vocalizations, especially “Noisy” vocalizations, were common. However, we did not record vocalizations and are therefore unable to quantify our observations.

      • First line of page 7: readers do not know about "stim 1" and "stim 2". Therefore, the authors need to describe their approach to analyzing behavior and neurochemical release.

      We first introduce these terms earlier, citing Figure 1D,E. We have added some additional wording for further clarification (page 7, lines 4-5).

      • Make sure citations are uniformly formatted (e.g., Inconsistencies in: "As male and female mice emit different vocalizations during mating (Finton et al., 2017; J. M. S. Grimsley et al., 2013; Neunuebel et al., 2015; Sales (née Sewell), 1972)").

      We have reviewed and corrected citations throughout the manuscript.

      • Last paragraph of page 7: "attending behavior" has not been defined yet.

      Table 1 contains our description of the behaviors analyzed in this study. We have now inserted a reference to Table 1 earlier in the Results (p. 6, line 12).

      • Figure 2E and 3G: I find these correlations to be redundant with the GLMs. This is because the significant relationship is likely to be driven by group differences in behavior and in neurochemical release.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • Page 2, 2nd paragraph, 2nd sentence: this paragraph seems to be rooted in comparing and contrasting experienced and inexperienced mice, so there should be explicit comparisons in each sentence. For example, the 2nd sentence should read: "Whereas EXP estrus females demonstrated increased flinching behaviors in response to mating vocalizations, INEXP ....". This paragraph overall could use some refining.

      We believe this refers to page 9. We have revised the paragraph to clarify our findings (Beginning p. 9, line 23).

      • Page 9: "Further, there were no significant differences across groups during Stim 1 or Stim 2 periods. These results contrast sharply with those from all EXP groups, in which both ACh and DA release changed significantly during playback (Figs. 2C, 2D, 3E, 3F)." While I understand their perspective, this is misleading because changes were only observed during the Stim 1 period.

      We have slightly revised the wording in this paragraph, because the restraint males did not show significant ACh decreases. However, we do not believe our statements mislead readers just because some changes are observed in only one of the stimulation periods (p. 10, lines 13-16).

      • Last paragraph of page 14: it would be useful to mention the increase in flinching in experienced females in response to mating vocalizations.

      We have added a sentence in this paragraph relating flinching in estrus females to increased ACh (p. 15, lines 18-20).

      • Was there a full analysis of locomotion in response to playbacks? I see that locomotion was correlated with neurochemical release but was it different in response to different stimuli? Were there changes to the part of the arena that mice occupied in response to restraint vs. mating vocalizations? Given their methods section, it would be useful for the authors to mention the results of the analyses of these aspects of movement.

      We have provided additional descriptions of space use and video tracking data in Materials and Methods (p. 23, lines 1-6). We now report additional results associated with these analyses (p. 8, lines 13-15; p. 9, lines 8-14).

      • I believe that each experimental mouse only heard one of the stimuli (given the analytical approach). Because it is plausible to measure neurochemical release in response to both types of stimuli, I encourage the authors to be more explicit about this aspect of the experimental design (e.g., mention in Results section).

      Sentence modified to read: “Each mouse received playback of either the mating or restraint stimuli, but not both: same-day presentation of both stimuli would require excessively long playback sessions, the condition of the same probe would likely change on subsequent days, and quality of a second implanted probe on a subsequent day was uncertain.” (p. 7, lines 5-9).

      • Figure 1A and 1B: add labels to the panels so readers don't have to read the legend to know what spectrogram is associated with what context.

      We added these labels to Figure 1.

      • Table 1: in the definition of "still and alert", should this mention "abrupt attending" instead of "abrupt freezing"? The latter isn't described.

      Yes, we intended “abrupt attending”, and have now indicated that in Table 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      • The authors report they performed manual behavioral analysis, and provide a table defining the different behaviors. However, it remains unclear how some of these behaviors were detected (such as still-and-alert events). A thorough description of the criteria used to define these events needs to be provided.

      We have modified some descriptions of manually analyzed behaviors in Table 1, and have added additional description of how we developed this set of behaviors for analysis in the study (pp. 22-23).

      • The box plots do not appear to represent the "minimum, first quartile, median, third quartile, and maximum values." as specified on page 24 (Methods). Indeed, the individual data points sometimes do not reach the max or min of the bar plot, and sometimes are way beyond them.

      We used the “inclusive median” function in Excel to generate final boxplots. These boxplots will sometimes result in a data point being placed outside of the whiskers. SPSS considers these to be “outliers”, but our GLM analysis includes these values. We describe this in the Data Analysis section of Materials and Methods (p. 28, lines 3-9).
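      The point about data points falling outside the whiskers can be illustrated with a minimal sketch. It assumes that the inclusive option corresponds to linearly interpolated quartiles (as in Python's `statistics.quantiles` with `method="inclusive"`) and that whiskers stop at the most extreme values within 1.5 × IQR of the quartiles. Both are assumptions for illustration; the response does not state the exact rule, and the function name `box_summary` is hypothetical.

```python
from statistics import quantiles

def box_summary(values):
    """Quartiles, whisker ends, and points beyond the whiskers.

    Assumption: inclusive (linearly interpolated) quartiles and the
    common 1.5 * IQR fence rule; not taken from the manuscript.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that lie within the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    # Points beyond the fences are drawn separately but remain in the data.
    beyond = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "beyond_whiskers": beyond,
    }

summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

      Under these assumptions the value 30 is plotted beyond the upper whisker, yet it is still part of the data set that a downstream model would analyze.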

      • Some of the data are replicated in different Figures: Figure 2A and Figure 3C. While this is acceptable, the authors did not correct for multiple comparisons (dividing the p value by the number of comparisons).

      Our analysis included corrections for multiple comparisons, as we have indicated on p. 27, lines 15-16.
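      For readers unfamiliar with the adjustment the reviewer describes, a Bonferroni-style correction can be sketched as follows. This is only an illustration of the general idea; the response does not say which correction the authors applied, and the function name and the default significance level of 0.05 are assumptions.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant after a Bonferroni adjustment.

    The significance level is divided by the number of comparisons,
    so each test is held to the stricter threshold alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Three comparisons: only p-values at or below 0.05 / 3 remain significant.
flags = bonferroni_significant([0.01, 0.02, 0.04])
```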

      • Overall, the sample sizes are too small (for example in Figure 3, non-estrus females are at n=3), and are different in experiments where they should be equal (Figure 2B: mating stim 1 is at n=5 and mating stim 2 is at n=3).

      We apologize that sample sizes were not properly displayed in figures. Please note that sample sizes are identified in the figure captions. For neuromodulator data, all sample sizes are at least 7. For behavioral data, the minimum sample size is 5. We have revised Figures 3-6 to ensure that all data points are visible.

      • It remains unclear why the impact of mating vocalizations has been tested only in males.

      We assume the reviewer meant that only males were tested in restraint. We now indicate that our preliminary evidence indicated no difference in behavioral responses to restraint vocalization between males and females, so we opted to perform the neurochemical analysis for restraint only in males (page 22 lines 4-5). If there were no limitations to time and cost, we would have preferred to test responses to restraint in females as well. We note that such inclusion would have added up to 4 experimental groups (estrus and non-estrus groups in both EXP and INEXP groups).

      • The correlation between the number of flinching and ACh release changes (Figure 2E) visually appears to be opposite between mating and restraint playbacks. The authors should perform independent correlations for these 2 playbacks.

      Please see general comments regarding correlation analyses. We removed such analyses and references to them from the manuscript.

      • The authors state that their findings "indicate that behavioral responses to salient vocalizations result from interactions between sex of the listener or context of vocal stimuli with the previous behavioral experience associated with these vocalizations.". However, in male mice, they do not report any difference in previous experience on flinching for both restraint and mating sounds, as well as no difference in rearing for the restraint sounds (Figure 4A-B). Thus, the discussion of these results should be completely revisited.

      We revised the paragraph in question (p. 9, line 22 through p. 10, line 9). For instance, we note that significant differences between EXP male-mating and male-restraint flinching do not exist between the INEXP groups. We believe that the last sentence correctly summarizes findings described in this paragraph.

      • For serotonin experiments in Figure S2 there are strong outliers (150% increase in 5HIAA release). Did the authors correlate these levels with the behavior of the animals?

      Outliers are identified by the Excel function that generated the boxplots, but we have no reason to consider these as outliers and exclude them. As noted above, we have clarified that these “outliers” are the result of the Excel function in the Materials and Methods (p. 28, lines 3-9) and we have revised the plotting of data points.

      Minor comments:

      • Mating vocalization playback is mainly emitted by males, thus, instead of a positive valence signal, this could also be interpreted as a competitive signal to other males.

      There is support in the literature for viewing our mating stimulus as having positive valence. Gaub et al., 2016 describe the emission of stepped calls, lower frequency harmonics, and increased sound level as indicators of “positive emotion”. We have shown (Grimsley et al., 2013) that the female LFH vocalization can be highly attractive to male mice, under the right conditions, indicating something like “sex is happening”. The inclusion of both the male and female vocalizations in our stimuli was a key piece of our experimental design, based on our understanding of the contributions of both vocalizations to the meaning of the overall acoustic experience.

      • Figure 1 should include panel titles.

      No change. This information is available in the Figure caption.

      • n=31 should be indicated in the EXP group.

      We’re not sure where the reviewer is referring to this value.

      • The color legend of Figure 1E is absent, making the Figure not understandable.

      We added text in the Figure 1 caption to indicate that each color represents a different exemplar. We don’t think a legend provides additional useful information.

      • The point of making two blocks (stim 1 and stim2) should be stated more clearly.

      Please see general statement regarding experimental blocks. We have modified our description of these in an Experimental overview section in the Materials and Methods.

      • Including raw data of micro-dialysis in the supplementary figures would allow assessment of the variability and quality of the measurements.

      We have added concentrations of neurochemicals in supplemental tables 1-3.

      • Baseline (prestimulus) number of flinch and rearing should systematically be indicated (missing in Figure 4).

      The focus in this figure is on the differences that occur in Stim 1 values. There are no differences between EXP and INEXP animals of any group during the Pre-Stim period. We now state that in the Figure 4 caption.

      • Discussion: "increase in AMPA/NMDA currents". We believe the authors are referring to the ratio of AMPA to NMDA currents. This sentence should be reformulated.

      These are modified to refer to “… the AMPA/NMDA current ratio…” in two locations in the Discussion (p. 14, lines 8-9; p. 15, line 4).

      • Overall the discussion is very speculative and should rely more on the data.

      We believe that the Discussion provides appropriate speculation that is based on our experimental data and previous literature. We have added a paragraph to identify limitations of our findings and recommendations of future experiments to resolve some issues (p. 12, lines 3-17).

      Reviewer #3 (Recommendations For The Authors):

      Minor concerns:

      1) The authors stated that USVs are most likely to be emitted by males, and LFH are likely to be emitted by females. However, Oliveira-Stahl et al. 2023, Matsumoto et al. 2022, Warren et al. 2018, Heckman et al. 2017, Neunuebel et al., 2015 showed that females also emit USVs. The authors should mention that USVs are emitted by both males and females and discuss how the sex of the vocalizing animal (both males and females) can influence neuromodulator release.

      The reviewer slightly mis-stated the wording of our text, changing the meaning significantly. Our wording is “These sequences included ultrasonic vocalizations (USVs) with harmonics, steps, and complex structure, mostly emitted by males, and low frequency harmonic calls (LFHs) emitted by females (Fig. 1A,C)…” This phrasing is correct and carefully chosen. The Discussion in Oliveira-Stahl et al 2023 (p. 10-11) supports our statement: “The exact fraction of USVs emitted by females as concluded in all previous studies on dyadic courtship has varied, ranging from 18%, 17.5%, and 16% to 10.5% in the present study…”.

      2) The authors should explain why ECF from BLA was collected unilaterally from the left hemisphere.

      p. 23, lines 9-11: We inserted a sentence to explain why we targeted the BLA unilaterally. “Since both left and right amygdala are responsive to vocal stimuli in human and experimental animal studies (Wenstrup et al., 2020), we implanted microdialysis probes into the left amygdala to maintain consistency with other studies in our laboratory.” Beyond that, the choice was arbitrary.

      3) The authors said each animal recovered in its home cage for four days before the playback experiment. A 4-day period may not be sufficient for every animal to recover from surgery, so the authors should describe how a mouse's recovery was assessed.

      p. 23, lines 20-23: We provide more description about the recovery and how it was assessed. Except for a few animals that were not included in the experiments, all animals recovered within 4 days.

      4) The authors stated that each animal was exposed to 90-min sessions with mating and restraint behaviors in a counterbalanced design. This description for Figure 1D should also include the duration of the mating and restraint experience.

      The Results that immediately precede citation to this figure include this information.

      5) The authors stated, "Data are reported only from mice with more than 75% of the microdialysis probe implanted within the BLA". What are the implications of having 25% of the probe outside the BLA? The authors should shed more light on this by discussing this issue as it relates to the findings and commenting on where the other 25% of the probe was located.

      We inserted a sentence to explain the rationale for this inclusion criterion. “We verified placement of microdialysis probes to minimize variability that could arise because regions surrounding BLA receive neurochemical inputs from different sources (e.g., cholinergic inputs to putamen and central amygdala).” (p. 25, lines 21-23).

      All brain regions that surround BLA, dorsal, medial, ventral, or lateral, could have been sampled by the “other” 25%. Some of these, e.g., the central amygdala or caudate-putamen, have different sources of cholinergic input that may not have the same release pattern. We do not think it is worthy of further speculation in the Discussion. Due to the high cost of the neurochemical analysis, we often did not process the neurochemistry data if histology indicated that a probe missed the BLA target.

      6) The authors confirmed that the estrus stage did not change during the experiment day by evaluating and comparing estrus prior to and after data collection. This strategy was a fantastic experimental approach, but the authors should have discussed the results. How did the results the authors included change when the females were in estrus before but not after data collection? What percentage of females started in estrus but ended in metestrus? Assuming that some females changed estrus state, were these animals excluded from the analyses?

      All animals were in the same estrus state at the beginning and end of the playback session.

      7) Authors cite Neunuebel et al., 2015 for the sentence "As male and female mice emit different vocalizations during mating". However, Neunuebel et al., 2015 showed vocalizations emitted during chasing--not mating. If mating is a general term for courtship, then this reference is appropriate, but see major concern #3.

      In the Results (p. 8, line 5), we changed the phrasing to “courtship and mating” to include the Neunuebel et al. study.

      As we indicate in our response to Public Comment #3, we have modified the Results (p. 5, lines 18-20) and Materials and Methods (p. 21, lines 8-15) to clarify our meaning. We continue to use the term “mating” because this refers to a specific set of behaviors associated with mounting and copulation, rather than the more general term “courtship”. We also indicate that we based these behaviors on previous work (e.g., Gaub et al., 2016).

      8) Authors interpret Figure 3F as DA release showed a "consistent" increase during mating playback across all three experimental groups. However, the increase in the estrus female group is inconsistent, as seen in the graph. This verbiage should be reworded to describe the data more accurately.

      p. 8, line 23 “consistent” was deleted.

      9) In all the box plots, multiple data points overlay each other. A more transparent way of showing the data would be adding some jitter to the x value to make each data point visible. The mean (X's) in Figure 3D (pre-stim mating and mating estrus) are difficult to see, as are all the data points in mating non-estrus. Adding all the symbols to the figure legend or a key in the figure instead of the method section would aid the reader and make the plots easier to interpret.

      We have revised the boxplots to ensure that all data points are visible.

      10) Some verbiage used in the discussion should be toned down. For example, "intense" experiences and "emotionally charged" vocalizations should be removed.

      We have not changed these terms, which we believe are appropriate to describe these experiences and vocalizations.

      11) The authors include "Emotional Vocalizations" in the title. It would be beneficial if the authors included more detail and references in the introduction to help set up the emotional content of vocalizations. It may benefit a broader readership as typically targeted by eLife.

      We now cite Darwin and some more recent publications that articulate the general understanding that social vocalizations carry emotional content.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The authors' interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis, and writing that is unclear to non-experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript.

      Based on your suggestions, we have provided controls, performed statistical analysis, and rewritten our manuscript. The revised manuscript is significantly improved and more accessible to non-experts in the field.

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X tracks occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, are well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (1). (page 1, lines 11-14)

      Second, in our preprint manuscript, we have already shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) also exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD (Figure 2D, Figure 4A and Figure 4C). We have highlighted this point in our revised manuscript (page 9, lines 19-21).

      Third, as revealed by the results of Figure 4, it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (68-70). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless a residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these AcN degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (71).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus. (Page 12, line 3 to page 13, line 2)

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.
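      The kind of null expectation the reviewer asks for can be sketched in a few lines. The sketch assumes the simplest possible null model, in which residues are drawn independently at the proteome-wide background frequency, so a window of length k is all-X with probability f^k. That model, and all function names, are illustrative assumptions and are not taken from the manuscript or the review.

```python
from collections import Counter

def background_frequency(proteins, residue):
    """Proteome-wide usage frequency of one residue."""
    counts = Counter("".join(proteins))
    total = sum(counts.values())
    return counts[residue] / total if total else 0.0

def expected_pure_windows(proteins, residue, k):
    """Expected number of length-k windows made only of `residue`,
    assuming residues are independent draws at the background frequency."""
    f = background_frequency(proteins, residue)
    windows = sum(max(len(seq) - k + 1, 0) for seq in proteins)
    return windows * f ** k

def observed_pure_windows(proteins, residue, k):
    """Observed number of length-k windows made only of `residue`."""
    target = residue * k
    return sum(
        1
        for seq in proteins
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] == target
    )

toy_proteome = ["QQQQAAAA"]
expected = expected_pure_windows(toy_proteome, "Q", 4)
observed = observed_pure_windows(toy_proteome, "Q", 4)
```

      Comparing the observed count with the expected count per species is one way to ask whether runs are enriched beyond what the background Q content alone would produce. A real analysis would likely need a more careful null, for example one based on shuffled sequences.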

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (5, 43) and that polyX prevalence differs among species (79-82).

      We will cite a reference by Kiersten M. Ruff in our revised manuscript (38).

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis. Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined the ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively. The results of four different polyX motifs are shown in our revised manuscript, i.e., polyQ (Figure 7), polyN (Figure 8), polyS (Figure 9) and polyT (Figure 10). Thus, polyX prevalence differs among species and the overall X contents of polyX motifs often but not always correlate with the X usage frequency in entire proteomes (43).
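      The ratio described above can be sketched as follows, under one possible reading of the thresholds: a window of a given length counts as a polyX motif when it contains at least a minimum number of X residues, and the ratio is the number of X residues covered by such windows divided by all X residues in the proteome. The window definition and the function name are assumptions for illustration, not the authors' pipeline.

```python
def polyx_residue_fraction(proteins, residue, window, min_count):
    """Fraction of all `residue` occurrences that fall inside at least one
    window of length `window` containing >= `min_count` copies of `residue`."""
    total = 0
    in_motif = 0
    for seq in proteins:
        total += seq.count(residue)
        covered = [False] * len(seq)
        for start in range(len(seq) - window + 1):
            if seq[start:start + window].count(residue) >= min_count:
                # Mark every position of a qualifying window as covered.
                for i in range(start, start + window):
                    covered[i] = True
        # Count only the X residues themselves, not other covered residues.
        in_motif += sum(
            1 for i, aa in enumerate(seq) if covered[i] and aa == residue
        )
    return in_motif / total if total else 0.0

# "4X (4/4)" on a toy proteome: 4 of the 6 Q residues lie in a pure QQQQ run.
fraction = polyx_residue_fraction(["MQQQQA", "MAQAQ"], "Q", window=4, min_count=4)
```

      Dividing by the proteome-wide count of X means that a species with high Q usage but few runs yields a low ratio, which is the distinction the normalization is meant to expose.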

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 7). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 8). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 9). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 10).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed.

      The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors occur, yet without context, this does not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (48). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B.S. et al. 2007; Toombs, T. et al. 2011).

      B. S. Cox, L. Byrne, M. F. Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (4). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem Phys. 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem Phys. 148, 123305 (2018).

      3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that the addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Although this reviewer’s concern regarding our use of a nuclear localization signal on the tag is understandable, we are confident that this signal does not bias our findings for two reasons. First, the negative control LacZ-NV also possesses the same nuclear localization signal (Figure 1A, lane 2). Second, another fusion target, Rad51-ΔN, does not harbor the NVH tag (Figure 1D, lanes 3-4). Compared to wild-type Rad51, Rad51-ΔN is highly labile. In our previous study, removal of the NTD from Rad51 reduced by ~97% the protein levels of corresponding Rad51-ΔN proteins relative to wild-type (1).

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package (Figure 11, Figure 12 and DS7-DS32).

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma Nat Comms 2019).

      As pointed out by the first reviewer, we present evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected in translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn. That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      We thank this reviewer for raising this point, however her/his comments are not supported by the results in Figure 7.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address the concerns about our GO enrichment analysis by both reviewers, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the family-wise error rate (FWER). The same method had been applied to GO enrichment analysis of human genomes (89).
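
      The candidate vs. background test described above can be illustrated with a minimal sketch. It is not the GOfuncR implementation: the function names are hypothetical, and a Bonferroni correction is used here only as a simple stand-in for family-wise error control (GOfuncR's own FWER procedure may differ).

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of background proteins
    K: background proteins annotated with the GO term
    n: number of candidate proteins (e.g. polyQ proteins)
    k: candidate proteins annotated with the GO term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(p_values):
    """Illustrative family-wise correction: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

      For a single GO term, `hypergeom_enrichment_p(k, n, K, N)` gives the raw p-value; collecting the raw p-values for all tested terms and passing them to `bonferroni` gives conservatively adjusted values.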

      The results presented in Figure 11 and Figure 12 (DS7-DS32) support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (78). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      Recommendations for the authors:

      Please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      The order of paragraphs in the introduction was very difficult to follow. Each paragraph was clear and easy to understand, but the order of paragraphs did not make sense to this reader. The order of events in the abstract matches the order of events in the results section. However, the order of paragraphs in the introduction is completely different and this was very confusing. This disordered list of facts might make sense to an expert reader but makes it hard for a non-expert reader to understand.

      Apologies. We endeavored to improve the flow of our revised manuscript to make it more readable.

      The section beginning on pg 12 focused on figures 4 and 5 was very interesting and highly promising. However, it was initially hard for me to tell from the main text what the experiment was. Please add to the text an explanation of the experiment, because it is hard to figure out what was going on from the figures alone. Figure 4 is fantastic, but would be improved by adding error bars and scaling the x-axis to be the same in panels B,C,D.

      Thank you for this recommendation. We have now scaled both the x-axis and y-axis equivalently in panels B, C and D of Figure 4. Error bars are too small to be included.

      It is hard to tell if the key variable is the number of S/T/Q/N residues or the number of phosphosites. I think a good control would be to add a regression against the number of putative phosphosites. The sequences are well designed. I loved this part but as a reader, I need more interpretation about why it matters and how it explains the PEE.

      As described above, we have shown that the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities.

      I believe that the prevalence of polyX runs is not meaningful without normalizing for the background abundance of each amino acid. The proteome-wide abundance and the assumption that amino acids occur independently can be used to form a baseline expectation for which runs are longer than expected by chance. I think Figures 6 and 7 should go into the supplement and be replaced in the main text with a figure where Figure 6 is normalized by Figure 7. For example in P. falciparum, there are many N-runs (Figure 6), but the proteome has the highest fraction of N’s (Figure 7).

      Thank you for these suggestions. The three figures in our preprint manuscript (Figures 6-8) have been moved into the supplementary information (Figures S1-S3). For normalization, we have provided four new figures (Figures 7-10) in our revised manuscript.
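
      The baseline the reviewer describes can be sketched as follows. This is an illustration under the stated assumption that residues occur independently at their proteome-wide frequency; it is not the normalization used for the revised figures, and the function names are hypothetical.

```python
import re


def count_runs(seq, aa, min_len):
    """Count maximal runs of residue `aa` that are at least `min_len` long."""
    pattern = re.escape(aa) + "+"
    return sum(1 for m in re.finditer(pattern, seq) if len(m.group()) >= min_len)


def expected_runs(length, freq, min_len):
    """Expected number of maximal runs of length >= min_len in a random
    sequence of `length` residues, each being `aa` with probability `freq`.

    A run starts at position 0 with probability freq**min_len, and at each
    later position with probability (1 - freq) * freq**min_len, because the
    preceding residue must differ.
    """
    if length < min_len:
        return 0.0
    return freq ** min_len * (1 + (length - min_len) * (1 - freq))


def run_enrichment(seq, aa, min_len, freq):
    """Observed / expected run count; values above 1 suggest more runs than chance."""
    expected = expected_runs(len(seq), freq, min_len)
    return count_runs(seq, aa, min_len) / expected if expected > 0 else float("nan")
```

      Summing observed and expected counts over all proteins in a proteome, rather than taking per-protein ratios, would give a proteome-wide enrichment value for each amino acid.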

      The analysis of ciliate proteomes was fascinating. I am particularly interested in the GO enrichment for “peptidyl-glutamic acid modification” (pg 20) because these enzymes might be modifying some of Q’s in the Q-runs. I might be wrong about this idea or confused about the chemistry. Do these ciliates live in Q-rich environments? Or nitrogen rich environments?

      Polymeric modifications (polymodifications) are a hallmark of C-terminal tubulin tails, whereas secondary peptide chains of glutamic acids (polyglutamylation) and glycines (polyglycylation) are catalyzed from the γ-carboxyl group of primary chain glutamic acids. It is not clear if these enzymes can modify some of the Q’s in the Q-runs.

      To our knowledge, ciliates are abundant in almost every liquid water environment, i.e., oceans/seas, marine sediments, lakes, ponds, and rivers, and even soils.

      I think you should include more discussion about how the codons that code for Q’s are prone to slippage during DNA replication, and thus many Q-runs are unstable and expand (e.g. Huntington’s Disease). The end of pg 24 or pg 25 would be good places.

      We thank the reviewer for these comments.

      PolyQ motifs have a particular length-dependent codon usage that relates to strand slippage in CAG/CTG trinucleotide repeat regions during DNA replication. In most organisms having standard genetic codons, Q is encoded by CAGQ and CAAQ. Here, we have determined and compared proteome-wide Q contents, as well as the CAGQ usage frequencies (i.e., the ratio between CAGQ and the sum of CAGQ, CAAQ, TAAQ, and TAGQ).
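
      The usage frequency defined above (CAG as a fraction of all Q-encoding codons) can be computed along these lines. This is a minimal sketch: the function name and the `reassigned_stops` flag are hypothetical, and the input is assumed to be an in-frame coding sequence without its terminal stop codon.

```python
from collections import Counter


def cag_usage_frequency(cds, reassigned_stops=False):
    """Fraction of Q codons that are CAG in an in-frame coding sequence.

    With reassigned_stops=True, TAA and TAG are also counted as Q codons,
    as in ciliates that have reassigned these stop codons to glutamine.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    q_codons = ["CAG", "CAA"]
    if reassigned_stops:
        q_codons += ["TAA", "TAG"]
    total_q = sum(codons[c] for c in q_codons)
    return codons["CAG"] / total_q if total_q else 0.0
```

      Applied to the concatenated coding sequences of a proteome, this yields one value per organism; a low value indicates that Q is mostly encoded by codons other than CAG.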

      Our results reveal that the likelihood of forming long CAG/CTG trinucleotide repeats is higher in five eukaryotes due to their higher CAGQ usage frequencies, including Drosophila melanogaster (86.6% Q), Danio rerio (74.0% Q), Mus musculus (74.0% Q), Homo sapiens (73.5% Q), and Chlamydomonas reinhardtii (87.3% Q) (orange background, Table 2). In contrast, another five eukaryotes that possess high numbers of polyQ motifs (i.e., Dictyostelium discoideum, Candida albicans, Candida tropicalis, Plasmodium falciparum and Stentor coeruleus) (Figure 1) utilize more CAAQ (96.2%, 84.6%, 84.5%, 86.7% and 75.7%) than CAGQ (3.8%, 15.4%, 15.5%, 13.3% and 24.3%), respectively, to avoid the formation of long CAG/CTG trinucleotide repeats (green background, Table 2). Similarly, all five ciliates with reassigned stop codons (TAAQ and TAGQ) have low CAGQ usage frequencies (i.e., from 3.8% Q in Pseudocohnilembus persalinus to 12.6% Q in Oxytricha trifallax) (red font, Table 2). Accordingly, the CAG-slippage mechanism might operate more frequently in Chlamydomonas reinhardtii, Drosophila melanogaster, Danio rerio, Mus musculus and Homo sapiens than in Dictyostelium discoideum, Candida albicans, Candida tropicalis, Plasmodium falciparum, Stentor coeruleus and the five ciliates with reassigned stop codons (TAAQ and TAGQ).

      Author response table 1.

      Usage frequencies of TAA, TAG, TAAQ, TAGQ, CAAQ and CAGQ codons in the entire proteomes of 20 different organisms.

      Pg 7, paragraph 2 has no direction. Please add the conclusion of the paragraph to the first sentence.

      This paragraph has been moved to the “Introduction” section of the revised manuscript.

      Pg 8, I suggest only mentioning the PFDs used in the experiments. The rest are distracting.

      We have addressed this concern above.

      Pg 12. Please revise the "The relationship...." text to explain the experiment.

      We apologize for not explaining this topic sufficiently well in our preprint manuscript.

      SCDs are often structurally flexible sequences (4) or even IDRs. Using IUPred2A (https://iupred2a.elte.hu/plot_new), a web-server for identifying disordered protein regions (88), we found that Rad51-NTD (1-66 a.a.) (1), Rad53-SCD1 (1-29 a.a.) and Sup35-NPD (1-39 a.a.) are highly structurally flexible. Since a high content of serine (S), threonine (T), glutamine (Q), and asparagine (N) is a common feature of IDRs (17-20), we applied an alanine scanning mutagenesis approach to reduce the percentages of S, T, Q or N in Rad51-NTD, Rad53-SCD1 or Sup35-NPD, respectively. As shown in Figure 4 and Figure 5, there is a very strong positive relationship between STQ and STQN amino acid percentages and β-galactosidase activities. (Page 13, lines 5-10)

      Pg 13, first full paragraph, "Functionally, IDRs..." I think this paragraph belongs in the Discussion.

      This paragraph is now in the “Introduction” section (Page 5, Lines 11-15).

      Pg. 15, I think the order of paragraphs should be swapped.

      These paragraphs have been removed or rewritten in the “Introduction” section of our revised manuscript.

      Pg 17 (and other parts) I found the lists of numbers and percentages hard to read and I think you should refer readers to the tables.

      Thank you. In the revised manuscript, we have avoided using lists of numbers and percentages, unless we feel they are absolutely essential.

      Pg. 19 please add more interpretation to the last paragraph. It is very cool but I need help understanding the result. Are these proteins diverging rapidly? Perhaps this is a place to include the idea of codon slippage during DNA replication.

      Thank you. The new results in Table 2 indicate that the CAG-slippage mechanism is unlikely to operate in ciliates with reassigned stop codons (TAAQ and TAGQ).

      Pg 24. "Based on our findings from this study, we suggest that Q-rich motifs are useful toolkits for generating novel diversity during protein evolution, including by enabling greater protein expression, protein-protein interactions, posttranslational modifications, increased solubility, and tunable stability, among other important traits." This idea needs to be cited. Keith Dunker has written extensively about this idea as have others. Perhaps also discuss why Poly Q rich regions are different from other IDRs and different from other IDRs that phase-separate.

      Agreed, we have cited two of Keith Dunker’s papers in our revised manuscript (73, 74).

      Minor notes:

      Please define Borg genomes (pg 25).

      Borgs are long extrachromosomal DNA sequences in methane-oxidizing Methanoperedens archaea, which display the potential to augment methane oxidation (101). They are now described in our revised manuscript. (Page 15, lines 12-14)

      Reviewer #2 (Recommendations For The Authors):

      The authors dance around disorder but never really quantify or show data. This seems like a strange blindspot.

      We apologize for not explaining this topic sufficiently well in our preprint manuscript. We have endeavored to do so in our revised manuscript.

      The authors claim the expression enhancement is "autonomous," but they have not ruled things out that would make it not autonomous.

      Evidence of the “autonomous” nature of expression enhancement is presented in Figure 1, Figure 4, and Figure 5 of the preprint manuscript.

      Recommendations for improving the writing and presentation.

      The title does not recapitulate the entire body of work. The first 5 figures are not represented by the title in any way, and indeed, I have serious misgivings as to whether the conclusion stated in the title is supported by the work. I would strongly suggest the authors change the title.

      Figure 2 could be supplemental.

      Thank you. We think it is important to keep Figure 2 in the text.

      Figures 4 and 5 are not discussed much or particularly well.

      This reviewer’s opinion of Figure 4 and Figure 5 is in stark contrast to those of the first reviewer.

      The introduction, while very thorough, takes away from the main findings of the paper. It is more suited to a review and not a tailored set of minimal information necessary to set up the question and findings of the paper. The question that the authors are after is also not very clear.

      Thank you. The entire “Introduction” section has been extensively rewritten in the revised manuscript.

      Schematics of their fusion constructs and changes to the sequence would be nice, even if supplemental.

      Schematics of the fusion constructs are provided in Figure 1A.

      The methods section should be substantially expanded.

      The method section in the revised manuscript has been rewritten and expanded. The six Javascript programs used in this work are listed in Table S4.

      The text is not always suited to the general audience and readership of eLife.

      We have now rewritten parts of our manuscript to make it more accessible to the broad readership of eLife.

      In some cases, section headers really don't match what is presented, or there is no evidence to back the claim.

      The section headers in the revised manuscript have been corrected.

      A lot of the listed results in the back half of the paper could be a supplemental table; listing %s in a paragraph (several of them in a row) is never nice.

      Acknowledged. In the revised manuscript, we have removed almost all sentences listing %s.

      Minor corrections to the text and figures.

      There is a reference to table 1 multiple times, and it seems that there is a missing table. The current table 1 does not seem to be the same table referred to in some places throughout the text.

      Apologies for this mistake, which we have now corrected in our revised manuscript.

      In some places it's not clear where new work is and where previous work is mentioned. It would help if the authors clearly stated "In previous work...."

      Acknowledged. We have corrected this oversight in our revised manuscript.

      Not all strains are listed in the strain table (KO's in figure 3 are not included)

      Apologies, we have now corrected Table S2, as suggested by this reviewer.

      Author response table 2.

      S. cerevisiae strains used in this study

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Recommendations For The Authors):

      While the details are mostly well-explained, I think that the authors could better bring forth the goals and potential usages of hippocampome.org overall.

      I think that this is a great and helpful tool that can leverage various and detailed cellular experimental studies that are out there in the literature to garner potential insights, direct future experimental studies, observe/classify experimental 'differences' (e.g., the deep and superficial pyramidal studies they mention) and so on. Say that one gets some mechanistic insight from more abstract theoretical models, hippocampome can be used to determine whether the experimental data where available is supportive of the theory. They also describe CA3 model and grid cells. While I am not suggesting that the authors completely re-organize the manuscript, I did feel that the last section 'potential applications...' could have perhaps been brought forth earlier (in a summarized form) for the reader/user to better appreciate hippocampome - indeed it is line 288 that should be near the beginning of the paper I thought.

      We thank the Reviewer for the suggestion. We have now included a summary of the simulation readiness of Hippocampome.org in the Introduction.

      I thought the 'application' paragraph (starting line 288) needed expansion to appreciate - I did not have a chance to look at the cited papers in that section - but maybe 2 paragraphs, one on CA3 and the other on grid cells, with a few more sentences of goal/context and tool usage details could be provided?

      We thank the Reviewer for the suggestion. We have added expanded paragraphs describing the simulation work on CA3 and grid cells.

      The authors start their Discussion by mentioning other resources (e.g. blue brain) in comparison. I thought that this was not too helpful without a bit more expansion about these other resources and what in particular is comparable. For example, the blue brain project is different in that it does not mine the literature per se (I think)? But then I am not sure of the extent of the comparison that the authors intend with blue brain and the other mentioned resources.

      Thank you for the helpful suggestion. We have now expanded upon the paragraph to draw more explicit parallels and contrasts among the various projects, in particular between the Blue Brain Project and Hippocampome.org.

      Minor comments

      • Fig 3D caption missing

      Thank you for pointing this out. We have now amended the figure caption.

      • Fig 5A line 211-12 refers to v2.0 but Fig 5 caption says v1.0?

      We apologize for the confusion. We have now added text clarifying the V1.X relevant descriptions around Figure 5.

      • Fig 6A confusing with thin and thick arrows and direction?

      We apologize for the confusion. We have re-colored the thick arrows orange to emphasize the fact that they are feeding directly into the spiking neural simulations.

      • Line 260 - not sure what this means - how is importance defined?

      We apologize for the confusion. We have now added text clarifying that “importance” refers to the role the neuron type plays in the functioning circuitry of the hippocampal formation.

      • CARLsim vs Brian/NEST in choosing - maybe a sentence or two for rationale

      Thank you for the suggestion. We have now added a sentence explaining the selection of CARLsim. CARLsim was selected due to its ability to run on collections of GPUs. CARLsim was the only simulator with this capability at the time the simulation work was being planned, and the power of a GPU supercomputer was needed to simulate the millions of neurons that comprise a full simulation of the complete hippocampal formation.

      • Fig 9 mv should be mV, and the voltage values specified there refer to which dash?

      Thank you for pointing these situations out. We have amended the millivolts label and have made changes to the figure to help clarify which specific tick marks are being labeled.

      Reviewer #3 (Recommendations For The Authors):

      Compliments to the authors on this nicely organized and structured presentation of V 2.0 of hippocampome.org. The paper is well prepared giving a useful short summary of the history of hippocampome for the newcomers and refreshing the memory of users, switching to highlighting the new data additions, why these are relevant and how these complement the existing database, and opening up to new applications. The added potential is well illustrated and in addition, the authors provide numerical information on the usage of this amazing resource. I enjoyed roaming around in the new version, which was made available for reviewers, and although it has been a while since I worked with the system, the new version is easy to work with. I have not had the time to use it extensively so cannot comment in detail but based on the long experience of the authors and their support team, I trust that version 2 will be almost if not completely flawless; however that will for sure become clear when it is released.

      One could always wish for more, disagree, or even criticize choices made to cluster neurons, divide areas, and so forth, though in my view that does not contribute to what the resource has to offer. Having said this, the authors might consider addressing briefly issues about differences in the nomenclature used in original descriptions and how they handled the translation into their nomenclature. To mention one that is constantly being debated: how does one define the border between SMo and SMi?

      Thank you for the suggestion. We have added text to the Introduction that addresses the nomenclature issue, as presented in Hamilton et al. (2017), and provide a definition for SMo and SMi.

      Another confusing issue is presented by layers in the entorhinal cortex or its subdivisions (how many and how are these defined). So, some remarks for newcomers in the field who might use the database without spending too much energy to read the original data, might be useful.

      Thank you for the suggestion to clarify this situation pertaining to the entorhinal cortex. Often, we have assumed the authors’ own definitions of the layers and subdivisions (medial and lateral), when naming neuron types. When our name is a hybrid of two published names that include both medial and lateral neurons, our name is prefixed by a simple EC, rather than by MEC or LEC.

      As noted, the authors present version 2 nicely and comprehensibly and I have only a few additional comments, meant to further improve the already high quality of the paper.

      1) The figures, nice as they are, are incredibly information-dense, so they require serious study to get the details; the legends do help, but the many abbreviations coming from totally different fields make it challenging to keep track of them while reading. This is a pity since there is a lot of new information in this version of the dataset, compared to previous versions and the authors overall succeed in emphasizing what is new and why this might be of use/importance.

      So a few suggestions: i) add relevant/most important abbreviations to the legends of the individual figures; ii) introduce all abbreviations upon first use and do not simply refer to the table in the methods. Interestingly, even the authors lose track in the introduction where they use BICCN in line 43 and refer to the abbreviation list, though the full name is given two lines below.

      We apologize for the confusion. We have amended the main text to clarify abbreviations. We have added the abbreviation definitions to the captions of the figures, and in some instances, removed the abbreviations from the figures altogether where space allowed.

      2) Figure 3 and even more so figure 5 depend strongly on the color differences red/green; please change since generally red/green is no longer used for obvious reasons.

      Thank you for pointing this out. We have switched the fonts in Figure 3 to black (excitatory) and gray (inhibitory) to match our previous publication. We have also changed the color schemes in Figure 5 to avoid red and green.

      Reviewer #3 commented on the complexity of our figures and how the figures are information dense. To partially address this, we have decided to remove panel A2 of Figure 3. It was originally meant to emphasize where the information came from to add new axonal projections to two v1.0 neuron types; however, it is not necessary to make the point in the illustration. Thus, we have removed the panel and amended the caption for Figure 3A to include the cited reference.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for The Authors):

      1) While the specificity of the observed muscle phenotypes seems clear, the subsequent molecular analysis of Numb protein interactors does not seem to consider the potential involvement of Numb-like. The authors should demonstrate the relative expression levels of Numb and Numb-like in the models used, and establish the specificity of the antibodies used in IP, western and staining experiments.

      Response: Perhaps the most convincing evidence that the anti-Numb antibody did not pull down Numb-like is that this protein was not detected among immunoprecipitated protein complexes pulled down by the anti-Numb antibody used. The antibody used in the immunoprecipitation was validated by the supplier and was previously reported to immunoprecipitate Numb [1, 2]. We previously demonstrated that a morpholino against Numb mRNA almost completely eliminated the band detected by this antibody and that this band was at the expected molecular weight [ref]. In our hands, mRNA levels for Numb-like in skeletal muscle are 5-10-fold lower than those for Numb [3]. We have been unable to detect Numb-like protein in healthy adult skeletal muscle by immunoblotting or immunofluorescence staining. Taking all of these findings together, it seems unlikely that the antibodies used for immunoprecipitating Numb-protein complexes pull down Numb-like.

      2) The authors use PCR to investigate Numb isoform expression and conclude that p65 is likely the dominant protein isoform expressed. While this agrees with the single band observed in Supp Figure 4A, a positive control for exon 9 excluded and included isoforms in the PCR reactions would strengthen this conclusion.

      Response: The amplicons shown in Supplemental Figure 4 were sequenced. The clones corresponded to the isoforms with exon 3 present or removed. No amplicons containing exon 9 were detected. The following sentence was added to the Analysis of Splice Variants section of Methods to address this point: “PCR products were cloned using the TOPO TA cloning system (ThermoFisher) and multiple resulting clones were sequenced to confirm that the expected products were generated.”

      3) PCR analysis of total Numb and Numb-like expression levels are not shown. This is important given the specificity of the Numb antibodies used for AP-MS experiments are not described and some Numb antibodies are well known to also recognize Numb-like. Two different Numb antibodies were used for Western and immunoprecipitation but the specificity for Numb and Numb-like is not described. In particular, does the antibody used in the AP-MS experiment recognize both Numb and Numb-like? Supplementary Table 1 does not list Numb or Numb-like, but presumably peptides were identified?

      Response: As noted above, the specificity of anti-Numb antibodies was confirmed in previous studies [3]. Importantly, Numb-like mRNA levels are 5-10-fold lower than Numb mRNA, and NumbL protein is undetectable in healthy adult skeletal muscle by Western. The physiology data reported in this manuscript support the conclusion that a single KO of Numb is sufficient to recapitulate the physiological phenotype of Numb/Numb-like KO. We therefore reason that the majority, if not all, of the physiological contribution of these proteins to muscle contractility is due to Numb (Fig. 1).

      4) The validation experiment used the same Numb antibody for immunoprecipitation, immunoblotted with Septin 7. A reciprocal IP of Septin 7 and blotted with Numb should be performed. In addition, a Numb-like IP or immunoblot would also be useful to demonstrate the specificity of the interaction. Efforts to map the interaction between Numb and Septin 7 would be useful to demonstrate specificity of the interaction and strategies to establish the biological relevance of the interaction.

      Response: We agree with the reviewer and attempted several IPs with anti-Septin7 antibodies. These were unsuccessful. In a new collaboration, Dr. Italo Cavini (University of Sao Paulo) has used machine-learning-based approaches to model binding between Numb and several septins, including Septin 7. The analysis suggests that binding of Numb with septins involves a domain of Numb that has not yet been ascribed a function in protein-protein interactions. These computational predictions require experimental validation but provide a rational starting point for experiments to define the domains responsible for these interactions. Such experiments were included in our recent NIH R01 renewal application. We hope to be able to report on results of confirmatory experiments of these computational models in the future.

      5) Other septins were identified in the AP-MS experiment and might have been anticipated to also be disrupted by Numb/Numb-like deletion. Are these septins known to interact in a complex?

      Response: This is an excellent question. Septins have conserved motifs providing a clear reason to imagine that many different mammalian septins could directly interact with Numb. Septins form heterooligomers consisting of complexes formed by 3, 6 or 8 septins [4]. It is likely that when Numb binds to one septin, antibodies against Numb pull down other septins present in the septin oligomer to which Numb is bound. The following paragraph was added to the discussion: “Our findings suggest that Numb may also interact with other septins such as septins 2, 9 and 10, which were also identified with a high level of confidence as Numb interacting proteins by our LC/MS/MS analysis. Our data do not allow us to determine if Numb binds directly to these septins. Septins contain highly conserved regions, and, consequently, if one such region of septin 7 interacts with Numb, then many septins would be expected to directly bind Numb through the same domain. However, because septins self-oligomerize, it is possible that when Numb binds to one septin, antibodies against Numb could also pull down other septins present in the septin oligomer to which Numb is bound regardless of whether or not they are also bound by Numb.”

      6) The text for Figure 5 describes analysis of Septin localization in inducible Numb/Numb-like cKO muscle, but the figure indicates only Numb is knocked out. Please clarify.

      Response: We apologize for this oversight on our part. The Legend to Figure 5 has been corrected.

      7) Supplementary Figure 2 seems to show that TAM treatment increases Numb expression. Please clarify. Also, please correct reference 9.

      Response: The figure was incorrectly labeled. We apologize for this oversight and have corrected the figure in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Overall, the manuscript is well written. I do have a few minor issues/concerns, which are detailed below.

      Abstract: Please be a little more specific regarding where the tissue came from (i.e. humans, mice, cells) when referring to your previous studies.

      Response: The abstract has been revised as requested.

      Introduction: Please be more specific regarding the technique used for detecting ultrastructural changes. I assume it was done with TEM, but the reference is listed as an "invalid citation" in your reference list.

      Response: The introduction was revised as requested and the invalid citation was replaced with a valid reference.

      Methods / Numb Co-Immunoprecipitation: Please indicate the level of confluency of the C2C12 cells as this will alter gene expression.

      Response: As indicated in the updated Methods section, confluent C2C12 cells were switched to differentiation media (low serum) for seven days. When harvested, the cells had differentiated and fused into myotubes.

      Methods / Immunohistochemical Staining: The first sentence needs to be edited regarding plurality and grammar.

      Response: Thank you for this comment. The text was revised accordingly.

      Results / GWAS and WGS Identify...: Please spell out phosphodiesterase (I assume) for PDE4D

      Response: This change was incorporated in the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports jAspSnFR3, a biosensor that enables high spatiotemporal resolution of aspartate levels in living cells. To develop this sensor, the authors used a structurally guided amino acid substitution in a glutamate/aspartate periplasmic binding protein to switch its specificity towards aspartate. The in vitro and in cellulo functional characterization of the biosensor is convincing, but evidence of the sensor's effectiveness in detecting small perturbations of aspartate levels and information on its behavior in response to acute aspartate elevations in the cytosol are still lacking.

      We thank the reviewers and editors for the detailed assessment of our work and for their constructive feedback. Most comments have now been experimentally addressed in the revised manuscript, which we feel is substantially improved from the initial draft.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Davidsen and coworkers describe the development of a novel aspartate biosensor, jAspSnFR3. This collaborative work supports and complements what was reported in a recent preprint by Hellweg et al. (bioRxiv; doi: 10.1101/2023.05.04.537313). In both studies, the newly engineered aspartate sensor was developed from the same glutamate biosensor previously developed by the authors of this manuscript. This coincidence is not accidental but is the result of the need to find tools capable of measuring aspartate levels in vivo. Therefore, it is undoubtedly a relevant and timely work carried out by groups experienced in aspartate metabolism and in the generation of metabolite biosensors.

      Reviewer #2 (Public Review):

      In this work the iGluSnFR3 sensor, recently developed by Marvin et al. (2023), is mutated at position S72, which was previously reported to switch the specificity from Glu to Asp. They made 3 mutations at this position, selected a S72P mutant, then made a second mutation at S27 to generate an Asp-specific version of the sensor. This was then characterized thoroughly and used in some test experiments, where it was shown to detect and allow visualization of aspartate concentration changes over time. It is an incremental advance on the iGluSnFR3 study, where 2 predictable mutations are used to generate a sensor that works on a close analog of Glu, Asp. It is shown to have utility and will be useful in the field of Asp-mediated biological effects.

      Reviewer #3 (Public Review):

      In this manuscript, Davidsen and collaborators introduce jAspSnFR3, a new version of an aspartate biosensor derived from iGluSnFR3, that allows real-time monitoring of aspartate levels in cultured cells. A selective amino acid substitution was applied in a key region of the template to switch its specificity from glutamate to aspartate. The jAspSnFR3 sensor does not respond to other tested metabolites and performs well, is not toxic for cultured cells, and is not affected by temperature, ensuring the possibility of using this tool in more physiologically relevant tissues. The high affinity for aspartate (KD = 50 µM) allowed the authors to measure fluctuations of this amino acid in the physiological range. Different strategies were used to bring aspartate to the minimal level. Finally, the authors used jAspSnFR3 to estimate the intracellular aspartate concentration. One of the highlights of the manuscript was a treatment with asparagine during glutamine starvation. Although it did not corroborate the essentiality of asparagine in glutamine depletion, the measurement of aspartate during this supplementation is a glimpse of how useful this sensor can be.

      Reviewer #1 (Recommendations For The Authors):

      The authors should evaluate the effectiveness of the sensor in detecting small perturbations of aspartate levels and its behavior in response to acute aspartate elevations in the cytosol. In vivo aspartate determinations were performed exclusively in conditions that cause aspartate depletion. Using mitochondrial respiratory inhibitors or aspartate withdrawal, the reliability of the sensor was determined for readings over relatively long periods, until a steady state of aspartate depletion was reached 12-60 hours later. Although Hellweg and coworkers demonstrated that a related aspartate sensor could detect increases in aspartate in cells overexpressing the aspartate-glutamate GLAST transporter, the differences reported here between both sensors advise testing whether this aspect is also improved, or not, using jAspSnFR3.

      Similarly, Davidsen et al. did not test whether the sensor can detect transient variations in cytosolic aspartate levels. In proliferative cells, aspartate synthesis is linked to NAD+ regeneration by the ETC (Sullivan et al., 2015, Cell); indeed, the authors deplete aspartate using CI or CIII inhibitors but do not analyze whether aspartate levels recover, and increase, after inhibitor removal. Furthermore, the sequential addition of oligomycin and uncouplers could generate measurable fluctuations of aspartate in the cytosol.

      We agree with the reviewer that only including situations of aspartate depletion in our cell culture experiments provided an incomplete evaluation of the utility of this biosensor. In the revised manuscript we provide three additional experiments using secondary treatments that restore aspartate synthesis to conditions that initially caused aspartate depletion. First, we conducted experiments where cells expressing jAspSnFR3/NucRFP were changed into media without glutamine, inducing aspartate depletion, with glutamine being replenished at various time points to observe if GFP/RFP measurements recover. As expected, glutamine withdrawal caused a decay in the GFP/RFP signal and we found that restoring glutamine caused a subsequent restoration of the GFP/RFP signal at all time points, with each fully recovering the GFP/RFP signal over time (Revised Manuscript Figure 2E). Next, we conducted the experiment suggested by the reviewer, testing whether the published finding, that oligomycin-induced aspartate limitation can be remedied by co-treatment with electron transport chain uncouplers, could be visualized using jAspSnFR3 measurements of GFP/RFP. Indeed, after 24 hours of oligomycin-induced aspartate depletion, treatment with the ETC uncoupler BAM15 dose-dependently restored GFP/RFP signal (Revised Manuscript Figure 2G). Finally, we also measured whether the ability of pyruvate to mitigate the decrease in aspartate upon co-treatment with rotenone (Figure 2B) could also be detected in a sequential treatment protocol after aspartate depletion. Indeed, after 24 hours of aspartate depletion by rotenone treatment, the GFP/RFP signal was rapidly restored by additional treatment with pyruvate (Revised Manuscript Figure 2, figure supplement 1C). Collectively, these results provide support for the utility of jAspSnFR3 to measure transient changes in aspartate levels in diverse metabolic situations, including conditions that restore aspartate to cells that had been experiencing aspartate depletion.

      Reviewer #2 (Recommendations For The Authors):

      Weaknesses: Sensor basically identical to iGluSnFR3, but nevertheless useful and specific. The results support the conclusions, and the paper is very straightforward. I think the work will be useful to people working on the effects of free aspartate in biology and given it is basically iGluSnFR3, which is widely used, should be very reproducible and reliable.

      We appreciate the reviewer’s comment that the sensor is useful for specific detection of aspartate. We agree that the advance of the paper is primarily in demonstrating its utility to measure aspartate, rather than any fundamental innovation on the biosensor approach. We hope the fact that jAspSnFR3 derives from a well validated biosensor (iGluSnFR3) will support its adoption.

      Reviewer #3 (Recommendations For The Authors):

      Although this is a well-performed study, I have some comments for the authors to address:

      1) A red tag version of the sensor (jAspSnFR3-mRuby3) was generated for normalization purposes; with this the authors plan to correct the GFP signal for expression and movement artifacts. I naturally interpret "movement artifacts" as those generated by variations in cell volume and focal plane during time-lapse experiments. However, it was mentioned that jAspSnFR3-mRuby3 included a histidine tag that may induce a non-specific effect (responses to the treatment with some amino acids). This suggests that a version without the tag needs to be generated and that an alternative design needs to be set for normalization purposes. A nuclear-localized RFP was expressed in a second attempt to incorporate RFP as a normalization signal. Here the cell lines that express both signals (sensor and RFP) were generated by independent lentiviral transductions (insertions). Unless the number of insertions for each construct is known, this approach will not ensure an equimolar expression of both proteins (sensor and RFP). In this scenario it is not clear how the nuclear expression of RFP will help to correct for expression or to monitor changes in cell volume. The authors may be interested in attempting a bicistronic system to express both the sensor and RFP.

      The reviewer noted several potential issues concerning the use of RFP for normalization, which will be separated into sections below:

      Movement artifacts:

      We are glad the reviewer raised this issue since we see how it was confusingly worded. We have deleted the text “and movement artefacts” from the sentence.

      His-tag and non-specific responses to some amino acids:

      We also found it concerning that non-specific responses to amino acids could potentially contribute to our RFP normalization signal, and so we conducted additional experiments to address whether this was likely to be an issue in intracellular measurements. We first tested whether the non-specific signal was related to the histidine tag, or was intrinsic to the mRuby3 protein itself, by comparing the fluorescence response to a titration of histidine (which showed the largest effect on red fluorescence), aspartate, and GABA (structurally related to glutamate and aspartate, but lacking a carboxylate group) across a group of mRuby-containing variants, with or without histidine tags. We replicated the non-specific signal originally observed in jAspSnFR3-mRuby3-His and found that another biosensor with a histidine tag on the C terminus of mRuby3 had a similar response (iGlucoSnFR2.mRuby3-His), as did mRuby3-His alone, indicating that fusion with jAspSnFR3 or another binding protein was not required for this effect. Additionally, we compared the fluorescence response of lysates expressing mRuby2 and mRuby3 without histidine tags and found that the non-specific signal was essentially absent (Revised Manuscript Figure 1, figure supplement 4B-D). Collectively, these data support our original hypothesis that the histidine tag was responsible for the non-specific signal, alleviating concerns about more substantial protein design issues or about using nuc-RFP for normalization. Since we also found that measuring aspartate signal using GFP/RFP ratios from cells with the linked jAspSnFR3-mRuby3-His agreed with measurements from cells separately expressing jAspSnFR3 and nucRFP (without a His tag), and the amino acid concentrations needed to significantly alter His-tagged mRuby3 signal are above those typically found in cells, we conclude that this is unlikely to be a significant factor in cells. Nonetheless, we have added all the relevant data to the manuscript to allow readers to make their own decision about which construct would be best for their purposes.

      Original text:

      "Surprisingly, the mRuby3 component responds to some amino acids at high millimolar concentrations, indicating a non-specific effect, potentially interactions with the C-terminal histidine tag (Figure 1—figure Supplement 2, panel B). Notably, this increase in fluorescence is still an order of magnitude lower than the green fluorescence response and it occurs at amino acid concentrations that are unlikely to be achieved in most cell types."

      Revised text:

      "Surprisingly, the mRuby3 fluorescence of affinity-purified jAspSnFR3.mRuby3 responds to some amino acids at high millimolar concentrations, indicating a non-specific effect (Figure 1—figure Supplement 4, panel A). This was determined to be due to an unexpected interaction with the C-terminal histidine tag and could be reproduced with other proteins containing mRuby3 and purified via the same C-terminal histidine tag (Figure 1—figure Supplement 4, panel B and C). Interestingly, a structurally related, non-amino acid compound, GABA, does not elicit a change in red fluorescence; indicating, that only amino acids are interacting with the histidine tag (Figure 1—figure Supplement 4, panel D). Nevertheless, most of our cell culture experiments were performed with nuclear localized mRuby2, which lacks a C-terminal histidine tag, and these measurements correlated with those using the histidine tagged jAspSnFR3-mRuby3 construct (Figure 1—figure Supplement 1 panel D)."

      Lentiviral transductions

      We agree that splitting the two fluorescent proteins across two expression constructs and infections effectively guarantees that there will not be equimolar expression of jAspSnFR3 and RFP; however, we do not think equimolar expression is necessary in this context. The primary goal of RFP measurements in these experiments (and in experiments using the jAspSnFR3-mRuby3 fused construct) is to control for global alterations in protein expression that might confound the interpretation that a change in GFP fluorescence corresponds to a change in aspartate levels. While a bicistronic system is arguably a better approach to improve the similarity of expression of jAspSnFR3 and nuc-RFP in a cell, we only require that the cells have consistent expression of both proteins across all cells in the population, not that the expression of one necessarily be a similar molarity to the other. We accomplish consistent expression of proteins by single cell cloning after expression of jAspSnFR3 and nucRFP (or jAspSnFR3-mRuby3), and screening for clones that have high enough expression of both proteins such that they are well detected by standard Incucyte conditions. Given that our data do not identify an obvious downside to separate expression of jAspSnFR3 and nuc-RFP compared to the fused jAspSnFR3-mRuby3 construct (where the fluorescent proteins are truly equimolar) (Figure 2, Figure Supplement 1C), we elected to prioritize the separate jAspSnFR3 and nuc-RFP combination, which provides additional opportunities to measure cell number in the same experiment (see below).
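
      The normalization logic described in this response can be illustrated with a minimal sketch. It assumes that background-subtracted GFP and RFP intensities for one well are already available as lists over time; the function name `normalized_ratio`, the `baseline_points` parameter, and all numbers are hypothetical illustrations and are not taken from the manuscript or from the Incucyte software. The sketch shows that an expression change shared by both proteins cancels in the GFP/RFP ratio, and that a fixed, non-equimolar stoichiometry only rescales the ratio by a constant that drops out after normalizing to baseline.

```python
def normalized_ratio(gfp, rfp, baseline_points=1):
    """Return the GFP/RFP ratio over time, scaled so the baseline ratio is 1.

    gfp, rfp: equal-length sequences of background-subtracted intensities
              (hypothetical inputs for illustration).
    baseline_points: number of initial time points averaged as baseline.
    """
    if len(gfp) != len(rfp):
        raise ValueError("gfp and rfp must have the same length")
    if not 1 <= baseline_points <= len(gfp):
        raise ValueError("baseline_points must be between 1 and len(gfp)")
    if any(r <= 0 for r in rfp):
        raise ValueError("rfp values must be strictly positive")
    ratio = [g / r for g, r in zip(gfp, rfp)]
    baseline = sum(ratio[:baseline_points]) / baseline_points
    if baseline <= 0:
        raise ValueError("baseline ratio must be positive")
    return [x / baseline for x in ratio]


# Illustrative numbers only: GFP falls as aspartate is depleted, RFP is stable.
gfp = [100.0, 80.0, 50.0, 25.0]
rfp = [40.0, 40.0, 40.0, 40.0]

# A global expression drift that scales both proteins equally at each time point.
drift = [1.0, 1.1, 1.2, 1.3]
gfp_drift = [g * d for g, d in zip(gfp, drift)]
rfp_drift = [r * d for r, d in zip(rfp, drift)]

r_plain = normalized_ratio(gfp, rfp)
r_drift = normalized_ratio(gfp_drift, rfp_drift)          # shared drift cancels
r_scaled = normalized_ratio(gfp, [r * 5.0 for r in rfp])  # non-equimolar RFP

print(r_plain)  # [1.0, 0.8, 0.5, 0.25]
```

      In this toy case all three normalized traces agree to within floating-point rounding, which is the property the response relies on: consistent expression within a clone matters, whereas the molar ratio of sensor to RFP does not.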

      2) The authors were interested in establishing the temporal dynamics of aspartate depletion by genetic and pharmaceutical means. For the inhibition of mitochondrial complex I, rotenone and metformin were used. Although the assays clearly show aspartate depletion, the report of cell viability is missing. Considering that glutamine deprivation induces arrest in cell proliferation, I think it will be important to know the conditions of the cell cultures after 60 hours of treatment with such inhibitors.

      We agree that ensuring that cells are still viable in conditions where aspartate is depleted, as determined by GFP/RFP in jAspSnFR3 expressing cells, is an important goal. To this end, we added a new experiment investigating the restoration of glutamine on the GFP/RFP signal at different time points after glutamine depletion (Revised Manuscript Figure 2E, see response to reviewer 1). One advantage of using the nuclear RFP as a normalization marker is that it also enables measurements of nuclei counts, a surrogate measurement for cell number. In the same glutamine depletion experiment we therefore measured cell counts using nuclear RFP incidences and confluency as measurements of cell proliferation/growth. In both cases, the arrest in cell proliferation upon glutamine withdrawal was obvious, as was the restoration of cell proliferation following glutamine replenishment, with the amount of growth delay corresponding to the length of glutamine withdrawal (Revised Manuscript Figure 2, Figure Supplement 2A-B). Nonetheless, there were no obvious lasting defects in restarting cell proliferation even after 12 hours of glutamine withdrawal, indicating that cell viability is preserved. In the case of mitochondrial inhibitors, we also observe that even after 24 hours of treatment with oligomycin or rotenone, restoration of aspartate synthesis from BAM15 or pyruvate, respectively, can also restore GFP/RFP signal, supporting the conclusion that cellular metabolism is still active in these conditions (Revised Manuscript Figure 2G; Revised Manuscript Figure 2, figure supplement 1C).

      3) The pH sensitivity was checked in vitro with jAspSnFR3-mRuby3 and the sensor reported suitable for measurements at physiological pH. It would be an opportunity to revisit the analysis for pH sensitivity in cultured cells using an untagged version of jAspSnFR3 coupled, for example, to a sensor for pH.

      We thank the reviewer for the suggestion and agree that pH effects on sensor signal could be a confounding factor in some conditions. Unfortunately, measuring intracellular pH is not trivial and using multiple fluorescent sensors that change simultaneously would be complex to interpret, particularly in the absence of controls to unambiguously control intracellular pH and aspartate concentrations. Thus, we believe that proper investigation of the variable of pH is beyond the scope of this study. Nonetheless, we agree that measuring the contribution of pH to sensor signal is an important goal for future work, particularly if deploying it in conditions likely to cause substantial pH differences, such as comparing compartmentalized signal of jAspSnFR3 in the cytosol and mitochondria. We have added the following italicized text to the conclusions section to underscore this point:

      “Another potential use for this sensor would be to dissect compartmentalized metabolism, with mitochondria being a critical target, although incorporating the influence of pH on sensor fluorescence will be an important consideration in this context.”

      4) While the authors take an interesting approach to measuring intracellular aspartate concentration, it will be highly desirable if a calibration protocol can be designed for this sensor. Clearly, glutamine depletion grants a minimal ("zero") aspartate concentration. However, having a more dynamic way for calibration will facilitate the introduction of this tool for metabolism studies. This may be achieved by incorporating a cultured cell that already expresses the transporter or by ectopic expression in the cells that have already been used.

      We appreciate the suggestion and would similarly desire a calibration protocol to serve as a quantitative readout of aspartate levels from fluorescence signal, if possible. While we do calibrate jAspSnFR3 fluorescence in purified settings, conducting an analogous experiment intracellularly is currently difficult, if not impossible. While we have several methods to constrain the production rate of aspartate (glutamine withdrawal, mitochondrial inhibitors, and genetic knockouts of GOT1 and GOT2), we cannot prevent cells from decreasing aspartate consumption and so cannot get a true intracellular zero to aid in calibration. Additionally, the impermeability of aspartate to cell membranes makes it challenging to specifically control intracellular concentrations using environmental aspartate, and the best-known aspartate transporter (SLC1A3) is concentrative and so has the reciprocal problem. Considering these issues, we are wary of implying to readers that any specific fluorescence measurement can be used to directly interpret aspartate concentration given the many variables that can impact its signal, both related to the biosensor system itself (expression of jAspSnFR3, expression of Nuc-RFP, sensitivity and settings of the fluorescence detector) and based on cell intrinsic variability (differences in basal ASP levels, different sensitivity to treatments, influence of pH, etc.). We maintain that jAspSnFR3 has utility to measure relative changes in aspartate within a cell line across treatment conditions and over time, but absolute quantitation of aspartate still will require complementary approaches, like mass spectrometry, enzymatic assays, or NMR.

      5) jAspSnFR3 seems to have the potential to be incorporated easily for several research groups as a main tool. In general, a minor correction to replace F/F with ΔF/F in the text.

      Thank you for catching this error, the text has been edited accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors provide evidence to show that an increase in Kv7 channels in hilar mossy cells of Fmr1 knockout mice results in a marked decrease in their excitability. The reduction in excitatory drive onto local hilar interneurons produces an increased excitation/inhibition ratio in granule cells. Inhibiting Kv7 channels can help normalize the excitatory drive in this circuit, suggesting that they may represent a viable target for therapeutics for Fragile X syndrome.

      Strengths:

      The work is supported by a compelling and thorough set of electrophysiological studies. The authors do an excellent job of analysing their data and present a very complete data set.

      We thank the Reviewer for the positive comments.

      Weaknesses:

      There are no significant weaknesses in the experimental work; however, the complexity of the data presentation and the lack of a schematic showing the organizational framework of this circuit make the data less accessible to non-experts in the field. I highly encourage a graphical abstract and network diagram to help individuals understand the implications of this work.

      We thank the Reviewer for the suggestion, and added a schematic of the dentate network organization (Figure 1A).

      The work is important as it identifies a unique regional and cell-specific abnormality in Fmr1 KO mice, showing how the loss of one gene can result in region-specific changes in brain circuits.

      Reviewer #2 (Public Review):

      Summary:

      Deng et al. investigate, for the first time to my knowledge, the role that hippocampal dentate gyrus mossy cells play in Fragile X Syndrome. They provide strong evidence that, in slice preparations from Fmr1 knockout mice, mossy cells are hypoactive due to increased Kv7 function whereas granule cells are hyperactive compared to slices from wild-type mice. They provide indirect evidence that the weakness of mossy cell-interneuron connections contributes to granule cell hyperexcitability, despite converse adaptations to mossy cell inputs. The authors show that application of the Kv7 inhibitor XE991 is able to rescue granule cell hyperexcitability back to wild-type baseline, supporting the overall conclusion that inhibition of Kv7 in the dentate may be a potential therapeutic approach for Fragile X Syndrome. However, any claims regarding specific circuit-based intervention or analysis are limited by the exclusively pharmacological approach of the manipulations.

      Strengths:

      Thorough electrophysiological characterization of mossy cells in Fmr1 knockout mice, a novel finding.

      Their electrophysiological approach is quite rigorous: patched different neuron types (GC, MC, INs) one at a time within the dentate gyrus in FMR1 KO and WT, with and without 'circuit blockade' by pharmacologically inhibiting neurotransmission. This allows the most detailed characterization possible of passive membrane/intrinsic cell differences in the dentate gyrus of Fmr1 knockout mice.

      Provide several examples showing the use of Kv7 inhibitor XE991 is able to rescue excitability of granule cell circuit in Fmr1 knockout mice (AP firing in the intact circuit, postsynaptic current recordings, theta-gamma coupling stimulation).

      We thank the Reviewer for the positive comments.

      Weaknesses:

      The implications for these findings and the applicability of the potential treatment for the disorder in a whole animal are limited due to the fact that all experiments were done in slices.

      We appreciate the Reviewer’s point and agree. To address this concern, we have revised the Discussion to state that “the applicability of a circuit-wide approach as a potential treatment in vivo will require extensive future behavioral analyses, which are beyond the scope of the current study”. We also now emphasize in Discussion that “these findings provide a proof-of-principle demonstration that a circuit-based intervention can normalize dynamic E/I balance and restore dentate circuit output in vitro”.

      The authors' interpretation of the word 'circuit-based' is problematic - there are no truly circuit-specific manipulations in this study due to the reliance on pharmacology for their manipulations. While the application of the Kv7 inhibitor may have a predominant effect on the circuit through changes to mossy cell excitability, this manipulation would affect many other cells within the dentate and adjacent brain regions that connect to the dentate that express Kv7 as well.

      We appreciate the reviewer’s point but would like to clarify that by using the term “circuit-based” we did not intend to imply that it is a “circuit-specific” intervention. Our intended interpretation of the term ‘circuit-based’ stems from the following reasoning: the dentate circuit has two types of excitatory neurons which show opposite excitability defects in FXS mice, thus presenting an irreconcilable conflict to correct pharmacologically for each cell type individually. Instead, we sought an approach to correct the overall dentate circuit output, rather than to restore excitability defects of individual cell types. Notably, when we pharmacologically isolated granule cells from the circuit, inhibition of Kv7 failed to restore their excitability, suggesting that normalization of the dentate output depends on the circuit activity. Since we focused on correcting dentate output using such a circuit-dependent approach, we used the term ‘circuit-based intervention’ to emphasize this notion.

      Reviewer #3 (Public Review):

      The paper by Deng, Kumar, Cavalli, Klyachko describes that, unlike in other cell types, loss of Fmr1 decreases the excitability of hippocampal mossy cells due to up-regulation of Kv7 currents. They also show evidence that while muting mossy cells appears to be a compensatory mechanism, it contributes to the higher activity of the dentate gyrus, because the removal of mossy cell output alleviates the inhibition of dentate principal cells. This may be important for the patho-mechanism in Fragile X syndrome caused by the loss of Fmr1.

      These experiments were carefully designed, and the results are presented in a very logical, insightful, and self-explanatory way. Therefore, this paper represents strong evidence for the claims of the authors. In the current state of the manuscript, there are only a few points that need additional explanation.

      We thank the Reviewer for the positive comments.

      One of the results, which is shown in the supplementary dataset, does not fit the main conclusions. Changes in the mEPSC frequency suggest that in addition to the proposed network effects, there are additional changes in the synaptic machinery or synapse number that are independent of the actual activity of the neurons. Since the differences of the mEPSC and sEPSC frequencies are similar and because only the latter can signal network effects, while the former is typically interpreted as a presynaptic change, it cannot be claimed that sEPSC frequency changes are due to the hypo-excitability of mossy cells.

      We thank the Reviewer for this important point and agree. To address this concern, we now state in Results that “We note that changes in the excitatory drive onto interneurons include both mEPSC and sEPSC frequencies, which reflect not only potential deficits in excitability of their input cells, such as MCs, but also changes in synaptic connectivity/function, that may arise from homeostatic circuit reorganization/compensation (see Discussion)”.

      We also now emphasize this point in Discussion by stating that “alterations in excitatory drives, including both mEPSC and sEPSC frequencies onto interneurons, suggest changes in the excitatory synapse number and/or function. Together with alterations in inhibitory drives these changes may reflect compensatory circuit reorganization of both excitatory and inhibitory connections, including mossy cell synapses”.

      We also note in Discussion that “Such circuit reorganization can explain the balanced E/I drive onto granule cells in Fmr1 KO mice we observed in the basal state, which can result from reorganization of excitatory and inhibitory axonal terminals”.

      Notably, our findings that Kv7 blocker acting by increasing MC excitability is sufficient to correct dentate output, supports the notion that hypo-excitability of mossy cells is a major factor contributing to dentate circuit E/I imbalance. This does not exclude the presence of additional mechanisms contributing to E/I imbalance, such as changes of synaptic connectivity or release machinery. To reflect this point, we revised the Results to temper the initial claim that “this analysis supports the notion that the hypo-excitability of MCs in Fmr1 KO mice caused (now replaced with “is a major factor contributing to”) the reduction of excitatory drive onto hilar interneurons, which ultimately results in reduced local inhibition”.

      An apparent technical issue may imply a second weak point in the interpretation of the results. Because the IPSCs in the PP stimulation experiments (Fig 8) start within a few milliseconds, it is unlikely that its first components originate from the PP-GC-MC-IN feedforward inhibitory circuit. The involvement of this circuit and MCs in the Kv7-dependent excitability changes is the main implication of the results of this paper. But this feedforward inhibition requires three consecutive synaptic steps and EPSP-AP couplings, each of them lasting for at least 1 ms + 2-5 ms. Therefore, the inhibition via the PP-GC-MC-IN circuit can be only seen from 10-20 ms after PP stimulation. The earlier components of the cPSCs should originate from other circuit elements that are not related to the rest of the paper. Therefore, more isolated measurements on the cPSC recordings are needed which consider only the later phase of the IPSCs. This can be either a measurement of the decay phase or a pharmacological manipulation that selectively enhances/inhibits a specific component of the proposed circuit.
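
The reviewer's latency estimate can be reproduced with simple arithmetic. The sketch below is only an illustration of that reasoning: it takes the per-step figures quoted in the comment (about 1 ms synaptic delay plus 2-5 ms EPSP-AP coupling) as given, and the function name `pathway_latency_ms` is hypothetical.

```python
def pathway_latency_ms(n_steps, synaptic_delay_ms, coupling_range_ms):
    """Return (min, max) latency for n serial synaptic steps.

    Each step is assumed to cost one synaptic delay plus one EPSP-AP
    coupling time; these per-step values are the reviewer's estimates,
    not measurements.
    """
    coupling_min, coupling_max = coupling_range_ms
    earliest = n_steps * (synaptic_delay_ms + coupling_min)
    latest = n_steps * (synaptic_delay_ms + coupling_max)
    return (earliest, latest)


# Three steps, as counted by the reviewer for the PP-GC-MC-IN pathway.
tri_synaptic = pathway_latency_ms(3, 1.0, (2.0, 5.0))
# Two steps, for comparison with a di-synaptic PP-GC-IN pathway.
di_synaptic = pathway_latency_ms(2, 1.0, (2.0, 5.0))

print(tri_synaptic)  # (9.0, 18.0)
print(di_synaptic)   # (6.0, 12.0)
```

Under these assumptions the three-step pathway arrives roughly 9-18 ms after stimulation, close to the 10-20 ms quoted above, and the ranges for two and three steps overlap.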

      We appreciate the Reviewer’s point. As we mentioned in Results: “The EPSP measured in granule cells in response to the PP stimulation integrates both excitatory and inhibitory synaptic inputs onto granule cells, including the direct synaptic input from the PP and all the PP stimulation-associated feedforward and feedback synaptic inputs. In other words, the EPSP in granule cells integrates all dentate circuit ‘operations’.” As the Reviewer pointed out, this is also the case in the measurements of cPSCs, which comprise all of the PP stimulation-associated feedforward and feedback inhibition. We thank the Reviewer for the suggestion to isolate specific components of the IPSC. However, we did not attempt to do so in this study for three reasons. First, the activity of all of these circuit components likely overlaps extensively in time, and it is difficult to identify a specific time point that separates the contributions of the earlier canonical feed-forward and feed-back components from the contribution of the later MC-dependent PP-GC-MC-IN feed-forward component. Notably, the tri-synaptic PP-GC-MC-IN component differs temporally from the canonical di-synaptic (PP-GC-IN) feed-back inhibition only by a single synaptic activation step, resulting in a difference of only a few milliseconds. Moreover, the temporal differences in the contributions of these components vary widely among different recordings, making a uniform analysis very difficult. Second, we used three different metrics to assess E/I changes in cPSC measurements, which capture a wide range of temporal processes and their integration, including peak-to-peak measurements, the charge transfer, and the excitation window metrics. Third, the principal readout in our study was the overall dentate output (i.e., granule cell firing), which reflects the integration of all dentate circuit ‘operations’, thus making the overall cPSC measurements appropriate, in our view, for this readout.

      I suggest refraining from the conclusions saying "MCs provide at least ~51% of the excitatory drive onto interneurons in WT and ~41% in KO mice", because too many factors (e.g., IN cell types, slice condition, synaptic reliability) are not accounted for in these actual numbers, and these values are not necessary for the general observation of the paper.

      We thank the reviewer for this suggestion, and have revised the manuscript accordingly.

      There are additional minor issues about the presentation of the results.

      We have carefully checked and corrected the minor errors that the Reviewer pointed out.

      Recommendations for the authors:

      Revisions that are considered essential for improved assessment regarding the strengths of support of the claims:

      • Temper claims regarding circuit-based effects

      • Temper claims regarding very specific quantitative assessments of synaptic drives

      • Differentiate between monosynaptic inputs and inputs arriving through multiple synaptic contacts with proper analytical techniques.

      We appreciate these suggestions and have revised the manuscript to address the concerns raised by the reviewers.

      Reviewer #1 (Recommendations For The Authors):

      The authors do an outstanding job of reviewing and presenting all of their data. This is a paper I will recommend all of my trainees read, as it is an excellent example of a complete research project. While I am impressed with the effort involved, I also wondered if the complexity and thoroughness of their presentations could make the story less accessible to non-expert readers. My comments are simply intended to help them present a more coherent and succinct story to a wider audience, though I am not sure I really provide any meaningful changes. This is simply a very thorough and complete body of work that the authors should be commended for. After reading it I felt they had gone above and beyond what most authors would provide in terms of data to support their story, and thus I had no doubt that a change in Kv7 plays a role in changing the excitability of the network.

      We thank the Reviewer for the positive comments and great suggestions. We have made numerous changes to present our work in a more coherent and succinct way, in part by re-plotting some of the figures, as well as by adding a schematic of the dentate circuit in Figure 1.

      Figure 1. A visual of mossy cells and the local circuit they are studying would be a useful addition to Figure 1. I also feel this is important for conveying the story of how hypo-excitability can impact the E/I of the network. I think it has to be more of a cell structure/circuit-based figure than is presented in Supplementary Figure 8.

      We thank the reviewer for this suggestion. We have added a schematic of the dentate circuit with all major cell types involved in Figure 1A.

      Figure 1. A, B, and C tell a coherent story and are easy to understand. The interpretation of the phase plot in D is harder to access. Perhaps having this as a separate figure and providing a clearer presentation of the way the phase plot was created would help (see Figure 3 in Bove et al., 2019, Neuroscience 418; DOI: 10.1016/j.neuroscience.2019.08.048).

      We appreciate the Reviewer’s point and agree. In order to keep Figure 1 more concise and readable, we removed the phase plot in the revised version. This change did not negatively impact the result presentation because the primary aim of this plot was to visualize changes in voltage threshold in an alternative way, but it was already clearly shown by the ramp-evoked AP traces (revised Figure 1D, insert), and thus was not essential to show.

      Figure 1 E-N might be better situated in a supplementary graph as the characteristics of the AP aren't changing.

      We understand the Reviewer’s point, but we feel it would be better to keep all action potential metrics together in one figure, to show that only a specific subset of parameters was affected in Fmr1 KO mice.

      Figure 2: (A-D) I am not sure having so many figures is required given the focus is on having a small change in IR at one membrane potential. I do worry that the significance appears to be due to 2 cells with an IR of over 100 in the WT group and 2 with an IR of around 62 in the KO group. All other cells are between 75-100 in both groups. I also worry a bit because in the literature IRs between 55 and 125 seem to be commonly reported by groups that do this work normally (Buzsáki, Westbrook, etc.). I would be cautious about making too much out of this result.

      We thank the Reviewer for these comments. We have performed additional analyses of these data, as also suggested by Reviewer 3 (Point #1), and improved presentation of the data in Figure 2D-F by showing the effect of XE991 on increasing input resistance in WT vs KO. We also plotted other panels in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991, which makes the results easy to follow. For more details, please also see the response to Reviewer 3, Point 1.

      Figure 2D-E: As in the text, this result is really pointing towards there being a Kv7 issue. Worries about the data in D aside, I think these two figures alone tell a clearer story. Figure 3 on the other hand tells a story of the effects of blocking Kv7 on membrane potential. Is this central to the story the authors are trying to tell?

      We thank the reviewer for this point. We believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figure 3. This is an interesting finding that shows how detailed their analysis was. Showing that the change in holding current in KO animals is greater than in WT is the first solid piece of evidence that there is a change in Kv7 in these cells that affects their excitability.

      We appreciate the reviewer’s comment. As mentioned above, we believe that Figure 2, Figure 3 and Figure 4—figure supplement 1 together provide strong and multifaceted evidence to support changes in Kv7 function in Fmr1 KO mossy cells.

      Figures 4 and 5 provide additional detail to support the idea that Kv7 changes by showing how the E/I ratio and spontaneous minis are shifted in KO animals.

      We thank the Reviewer for the comments.

      Figures 6-8 build a compelling story for the reduction in excitatory drive in mossy cells affecting the network dynamics in excitatory/inhibitory interactions in DG cells.

      We appreciate the Reviewer’s comment.

      Reviewer #2 (Recommendations For The Authors):

      1) Other than location and characteristic morphology, the other parameters that were used to identify mossy cells and granule cells were also parameters used to find differences in cellular properties between wild-type and Fmr1 KO mice (RMP, sEPSC frequency, etc.), which would confound the results shown. The use of available transgenic mouse lines would provide for a more unbiased screen of these cells. Afterhyperpolarization was also used as a parameter while screening cells, yet none of the data on this measurement is shown.

      We thank the reviewer for this point and agree that transgenic mouse lines provide a more unbiased way to identify various types of neurons. However, since the present study involves analyses of at least three different types of neurons, establishing multiple transgenic lines labeling different types of dentate neurons in the Fmr1 KO mouse model would be very time consuming and beyond the current resources of the lab. We would also like to clarify that the three types of dentate neurons are easily distinguished according to the large differences in location, morphology and basal electrophysiological properties, none of which were essential in defining differences between genotypes. Specifically, granule cells are located in the granule cell layer, have a small cell body (<10 µm), RMP around -80 mV, capacitance ~20 pF, and infrequent sEPSCs (<20 events/min); mossy cells are located in the hilus, have a large cell body (>15 µm), RMP around -65 mV, capacitance >100 pF, and fast afterhyperpolarization less than -10 mV (WT -5.1 ± 0.7 mV, KO -5.8 ± 0.5 mV); interneurons are located in the hilus or border of granule cell layer, have a relatively smaller cell body (10-15 µm), RMP around -55 mV, capacitance <60 pF, and afterhyperpolarization larger than -15 mV (WT -20.4 ± 1.3 mV, KO -19.8 ± 1.4 mV). We note that the cells that could not be definitively classified into the three categories were not included in analyses, and we have now clarified this further in the Methods. To address the reviewer’s second concern regarding AHP, we have now provided the corresponding values in the Methods.

      2) A definitive way to test the cell-autonomous nature of the Kv7 changes would be to use female mice, who will have a mosaic of cells affected by the fragile X chromosome, and the Fmr1 KO cells could be engineered to express GFP to help identify them from wild-type cells.

      We agree and appreciate this suggestion. This could be an interesting follow up study to further verify the cell-autonomous nature of Kv7 changes.

      3) The authors heavily rely on XE991 as a selective Kv7 blocker. Is it blocking all Kv7 channels at the concentration used? If so, given the significant expression of Kv7 in the dentate as shown by Western blot, is it surprising that there is no effect of this inhibitor on wild-type slices in most cases?

      We thank the reviewer for this important point. We used a concentration of 10x the IC50 in the present study, suggesting that more than 80% of Kv7 should be blocked. Notably, we observed several effects of XE991 in WT mice: it significantly increased input resistance (new Figure 2D-F), and strongly enhanced AP firing evoked by step depolarization (Figure 7E-H), although we did not observe an effect of XE991 in WT in the analyses of spiking evoked by theta-gamma stimulation in Figure 8. However, this is not surprising. If a parameter we measured is predominantly cell-autonomous (for example, input resistance), the effects of XE991 are easy to observe. However, if a parameter reflects integration of all dentate circuit operations (for example, AP probability in response to theta-gamma stimulation), it is difficult to detect the effect of XE991 in WT mice because the dentate circuit of WT mice has a larger capability to maintain E/I balance in response to XE991.
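
The relation between blocker concentration and the expected fraction of channels blocked can be sketched with the standard Hill equation. This is a hedged illustration only: it assumes simple one-site block with a Hill coefficient of 1, which the response does not state, and the function name `fraction_blocked` is hypothetical.

```python
def fraction_blocked(conc_over_ic50, hill=1.0):
    """Fraction of channels blocked at a concentration given in IC50 units.

    Standard Hill equation; hill=1.0 (one-site block) is an assumption
    made for illustration, not a value reported in the source.
    """
    x = conc_over_ic50 ** hill
    return x / (1.0 + x)


print(round(fraction_blocked(1.0), 3))   # 0.5 (by definition of IC50)
print(round(fraction_blocked(10.0), 3))  # 0.909
```

Under that assumption, 10x the IC50 corresponds to roughly 91% block, consistent with the statement that more than 80% of Kv7 should be blocked.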

      4) E/I ratio is a helpful concept, and it is heavily relied upon in the results text, but statistically shaky, especially for sEPSC:sIPSCs since you are combining uncertainty in the sEPSC and sIPSC to make one very uncertain ratio that doesn't undergo any subsequent statistical confirmation (such as in Fig 4I).

      We appreciate the reviewer’s point and apologize for the confusion in presentation of Fig 4I (and 5I), due to lack of detailed explanation. The E/I ratio shown in Figs. 4I (and 5I) is a single data-point estimate calculated from the mean values of independent sEPSC and sIPSC measurements (Figs. 4G-H and 5G-H, respectively). This ratio was used only as an estimate/illustration of the changes, rather than a precise determination of the shift in E/I balance. Because there is only one data-point for this ratio, statistical analysis is not possible. For this reason we performed extensive additional analyses in Figures 7 and 8, in which the EPSC and IPSC were measured from the same cells and at the same time to define the actual E/I ratio with the corresponding statistical analyses (i.e., a real matched and dynamic E/I ratio).
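
The distinction drawn here, between a single ratio computed from group means and a matched per-cell ratio, can be sketched as follows. The numbers are made up for illustration and the function names are hypothetical; this is not the analysis code used in the study.

```python
from statistics import mean


def ratio_of_means(epsc_values, ipsc_values):
    """One E/I estimate from independent EPSC and IPSC samples.

    The two samples may come from different cells, so the result is a
    single number with no per-cell spread to test statistically.
    """
    return mean(epsc_values) / mean(ipsc_values)


def paired_ratios(epsc_values, ipsc_values):
    """Per-cell E/I ratios when EPSC and IPSC come from the same cells."""
    if len(epsc_values) != len(ipsc_values):
        raise ValueError("paired data need one EPSC and one IPSC per cell")
    return [e / i for e, i in zip(epsc_values, ipsc_values)]


# Hypothetical values, arbitrary units.
epsc = [2.0, 4.0]
ipsc = [1.0, 2.0]

print(ratio_of_means(epsc, ipsc))  # 2.0
print(paired_ratios(epsc, ipsc))   # [2.0, 2.0]
```

Only the paired version yields one value per cell, which is what makes a statistical comparison between genotypes possible.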

      5) Is this mGluR2/CB1 specificity to PP/granule and MC axons, respectively, true in the Fmr1 KO mice? It is possible that mGluR2 and CB1 expression patterns are altered in Fmr1 KO, thus the assumption used to isolate these distinct inputs may not hold true.

      This is a very good point. We do assume that the specificity of Group II mGluR and CB1 is similar between Fmr1 KO and WT mice, but this is an assumption that we have not directly verified. However, our results in Figures 7 and 8 strongly support this assumption, because if it were not true, then our intervention would be unlikely to correct the excessive dentate output.

      6) XE991 only normalized GC firing when other cells were not pharmacologically blocked. The authors suggest this means blockage of MC Kv7 reduces GC excitability back to normal...presumably by increasing MC --> IN --> GC firing. This is a conclusion from many indirect comparisons (comparing XE991 effect on GC with/without GABA and glutamate blockers; comparing MC firing rates with/without XE991, and using CB1 agonist versus mGluR2 agonist to say it is mossy cells that are mostly controlling INs) - a clincher experiment would be to acutely knockdown Kv7 in mossy cells specifically and measure GC and IN firing.

      Thank you, this is a great suggestion. Indeed, as an expansion of this project, in future studies we are planning to manipulate the excitability of mossy cells by manipulating Kv7, or by using chemogenetic or optogenetic approaches.

      7) The reasoning behind the FMRP-Kv7 connection is quite weak, citing the paper Darnell 2011 as "translational target", but FMRP has myriad translational targets.

      We agree, and attempted to define the mechanism of increased Kv7 function using a co-immunoprecipitation approach, as well as immunostaining to look at cell-type specific expression changes. However, the results of both approaches were difficult to interpret due to technical limitations of the available antibodies. We also note that “We did not further investigate the precise mechanisms underlying enhancement of Kv7 function in the absence of FMRP, since the present study primarily focuses on the functional consequences of abnormal cellular and circuit excitability”. To address this concern, we extensively discussed the potential mechanisms of the FMRP-Kv7 connection, acknowledged in Discussion that “further studies will be needed to elucidate the precise mechanism responsible for the increased Kv7 function in Fmr1 KO mice”, and will continue to investigate it in future studies.

      8) The authors attempt to look for changes in Kv7 expression with Western blot, but since they hypothesize that Kv7 changes are mainly in the mossy cells, it is perhaps not surprising that they would not be able to see any changes when they look at dentate as a whole. Staining for Kv7 subunits to look at expression on a cellular level would be beneficial.

      We appreciate the reviewer’s suggestion. We attempted to perform the suggested experiments using immunostaining for KCNQ2, KCNQ3 and KCNQ5 in different subtypes of dentate neurons. However, these experiments failed to produce interpretable results due to technical limitations of the available antibodies.

      9) Is Kv7 localization or splice/composition different in Fmr1 KO mice?

      This is a very good point. As we mentioned in Point 8 above, we were not able to perform these experiments and do not have the answer at this point.

      10) Regarding the 3 subtypes of interneurons in the dentate, the authors are pooling data based on similar intrinsic properties, but this conclusion may be affected by the low number of recorded neurons for the regular-spiking type. In addition, it is unclear whether these different interneuron types have differential circuit connectivity (most likely) which would make it imperative to keep circuit analysis for interneurons segregated into these cell types.

      We appreciate the reviewer’s point. Indeed, these different interneuron types may have distinct circuit connectivity and contributions to circuit activity. However, identification of these 3 types of interneurons and determination of their respective functions is in itself a very extensive set of experiments which is beyond the scope of the current manuscript. We also note that the functional readout of circuit activity in our measurements was the AP firing and EPSPs evoked in granule cells by PP stimulation, which integrate all dentate circuit operations, including all of the feedforward and feedback loops which are mediated by all of these different types of interneurons. For simplicity, we thus pooled all interneuron data for the purposes of this study. But we fully agree that extensive future work is required to elucidate interneuron-type specific changes in Fmr1 KO mice and their contributions to the dentate circuit dysfunction.

      11) To do statistics treating each cell individually, and therefore assuming each cell is independent of one another, is not correct. Two cells from the same mouse will be more similar than two cells from different mice, therefore they are not independent data points. Nested statistical methods (n cells from o slices from p mice) will be important in future work, as discussed by (Aarts et al., Nat. Neurosci. 2014).

      We agree with the Reviewer’s point and appreciate this suggestion. In the present study, the cells tested in electrophysiological experiments were from at least 3 different mice for each condition, which helps minimize this kind of error.
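
One simple way to respect the non-independence the reviewer describes is to summarize cells within each animal before comparing groups. The sketch below shows only that aggregation step with made-up values; it is a conservative stand-in, not the multilevel model discussed by Aarts et al. and not the analysis used in this study. The function name `per_mouse_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Collapse (mouse_id, value) cell records to one mean per mouse.

    Treating the mouse, not the cell, as the unit of analysis avoids
    counting correlated cells from one animal as independent samples.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical input-resistance values: six cells from three mice.
cells = [
    ("m1", 80.0), ("m1", 84.0),
    ("m2", 90.0),
    ("m3", 70.0), ("m3", 74.0), ("m3", 72.0),
]

means = per_mouse_means(cells)
print(means)  # {'m1': 82.0, 'm2': 90.0, 'm3': 72.0}
```

The resulting per-mouse values (n = 3 here rather than n = 6) can then be passed to an ordinary two-group test, at the cost of some statistical power.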

      Reviewer #3 (Recommendations For The Authors):

      Is there a difference in the Rin at -45 mV of the control cell after the application of XE991? This is important to appreciate whether the XE991-sensitive conductances contribute to the basal excitability of MCs. Furthermore, the statistical comparison of the Rin at -45 mV of the FXS animals in the control solution and in the presence of XE991 would be also important. Actually, the most accurate measurement would be to show a difference in the acute Kv7-blockade between control and FXS animals, if that is possible with this blocker. Additionally, it would be also informative if the bar graphs in Fig.2 D & E were merged for this purpose, similarly as in the later figures.

      We thank the Reviewer for this suggestion and agree. Following this suggestion, we have re-plotted the data in Figure 2 accordingly. Specifically, we now show that XE991 significantly increased input resistance in both WT and KO mossy cells, and the effect of XE991 on increasing input resistance was markedly larger in KO than WT mossy cells. For other figures, we have plotted data in a similar way to show the comparisons between WT and KO, as well as comparisons within genotype +/- XE991.

      Because of the cell-to-cell variability of the voltage responses, it would be more informative and representative if the average of traces from all cells were shown in Fig.2 D & E.

      We agree with the Reviewer’s point. For clarity of presentation, we presented the cell-to-cell variability of the data as scatter points of input resistance values in the bar graph (Figure 2E), together with the representative traces (Figure 2D). Plotting the average traces from all cells would result in a total of 30 traces for all the WT and KO mice, which is difficult to visually assess clearly.

      On page 7, please clarify the recorded cell type in this sentence: "In contrast, WIN markedly reduced the number of sEPSCs in both WT and KO mice...".

      We thank the Reviewer for pointing out this omission and have clarified it in the revised version.

      In Figures 6 C, F, and I, the title of the Y-axis should be normalized frequency. Please also correct the figure legend accordingly because the current sentence can be also interpreted as the absolute or total number of events that were compared, irrespective of the duration of the recordings.

      We thank the Reviewer for this point and have corrected the revised version accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I highly appreciate this study and found the paper to be very well-written and easy to follow. However, a more extensive discussion of what I summarized under "weakness" would strengthen the paper. This may include a broader discussion of the canopy effect itself and the most relevant literature on its extent in rainforest settings in general and primate foods in particular, as well as more details on the dietary behavior of modern orangutans (stratigraphy of orangutan foods) and how seasonal their diet is. The extreme seasonality in orangutan plant food availability should be discussed. Now there are only 2 sentences in the discussion (lines 304-312) and I find the word "plant" only twice overall, though variation in plant food d18O is what drives variation in orangutan dental d18O values.

      We very much appreciate the support of this reviewer, and their feedback about the clarity of the paper. As noted in the provisional reply to reviewers, we are happy to add additional context about the issue of isotopic enrichment within forest canopies, and have expanded the original paragraph in the discussion devoted to this subject. We made reference to the fact that orangutan diets vary by season and site in the original submission, and have now acknowledged that seasonal diet variation may also contribute to variation in enamel isotope values.

      Also, I'd like to note that there has been only one recent study so far that made some level of an attempt to find a breastfeeding effect in orangutans using fecal isotope data. Tsutaya et al. 2022 (AJBA) report some seasonality in adult orangutan fecal isotope values, which could be relevant here as well. But also they reported some data from 2 to 7-year-old orangutan offspring and did not see any breastfeeding pattern in isotope values here either. Probably not too surprising at this older age, but still worth noting in the context of this study.

      There is a 2019 study that sampled fecal isotopes in 43 mother-infant orangutan pairs and found a different pattern than Tsutaya et al. (2022), although these data have not been published in full (Knott et al. (2019) AJBA 168, S68, 128-129). Given these contradictions, the fact that neither study serially sampled the first two years of life, and caveats to fecal isotope sampling of wild primates reviewed in Bădescu et al. (2023: American Journal of Primatology 2023;e235), introducing these nitrogen isotope studies does not aid in the interpretation of oxygen isotope data during intensive nursing, and thus is beyond the scope of this paper. The seasonality Tsutaya et al. (2022) reported in adult fecal samples was for carbon isotopes rather than nitrogen isotopes, and its relevance to the current study is unclear given that the orangutan plant foods measured did not show seasonal variation in carbon isotopes. As requested above, we have noted that orangutans’ dietary seasonality might influence the variation of oxygen isotope values.

      Reviewer #2 (Recommendations For The Authors):

      First, the manuscript offers upfront flashy numbers with respect to the number of samples, but what the reader really needs to know upfront is the number of individuals and the number of teeth per individual. These facts are buried and make the reader work too hard to keep track. While the specimen ID numbers are valuable in the table, perhaps a different ID could be used in the text, such as individuals modern Borneo A and B, fossil Sumatra A and B, etc.? Similarly, it would be helpful to remind readers of each locality - Borneo or Sumatra, modern or fossil.

      Tables 1 and 2 and the first sentence of the results and the materials and methods stated that we measured 18 teeth in this study. It is likely that the placement of the tables at the very end of the manuscript in the submitted version made the sample sizes and specimen information less evident to the reviewer. In response to this critique we have now added the number of teeth to the abstract, and trust that when the tables are placed within the text as indicated it will be easier to follow textual references to particular individuals. Museum identification codes have been provided in two previous publications of these teeth, and we retain them here for consistency.

      Second, the manuscript mentions some climate change in Sumatra, but what about Borneo?

      The results on the Bornean fossil teeth stated: “The range of values from these two fossil molars (14.2–24.8 ‰) markedly exceeds the range of modern Bornean orangutans (12.7–20.0 ‰) (Figure 4), with the mean δ18O value at least 2‰ heavier, suggesting possibly drier conditions with greater seasonality during their formation.” In the final section of the discussion, we devoted two paragraphs to discussing evidence for climate change at Niah Cave in Borneo - more than we devote to discussing such data from Sumatra.

      The most valuable figure in the manuscript is Figure 3 showing the serial sampling of modern teeth. It would be incredibly useful to see a similar graph for the fossils and a graph of the modern and fossils together for each island. The violin plots demonstrate a range of values but fail to provide the important seasonality signals. The manuscript is promising but as written is difficult to follow, and the results and conclusions with regard to climate change need more demonstration. On a minor note, I found myself wanting to know about the dates of fossils before knowing the isotopic values. You might wish to move the dating section to precede the isotopes.

      As requested, we have added an additional Supplemental figure making the comparisons of seasonality between fossil and modern individuals more evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addressed an alternative hypothesis to temporal binding phenomena. In temporal binding, two events that are separated in time are "pulled" towards one another, such that they appear more coincidental. Previous research has shown evidence of temporal binding events in the context of actions and multisensory events. In this context, the author revisits the well-known Libet clock paradigm, in which subjects view a moving clock face, press a button at a time of their choosing to stop the clock, a tone is played (after some delay), and then subjects move the clock dial to the point where the tone occurred (or when the action occurred). Classically, the reported clock time is a combination of the action and sound times. The author here suggests that attention can explain this by a mechanism in which the clock dial leads to a roving window of spatiotemporal attention (that is, it extends in both space and time around the dial). To test this, the author conducted a number of experiments where subjects performed the Libet clock experiment, but with a variety of different stimulus combinations. Crucially, a visual detection task was introduced by flashing a disc at different positions along the clock face. The results showed that detection performance was also "pulled" towards the action event or sensory event, depending on the condition. A model of roving spatiotemporal attention replicated these effects, providing further evidence of the attentional window.

      Strengths:

      The study provides a novel explanation for temporal binding phenomena, with clear and cleverly designed experiments. The results provide a nice fit to the proposed model, and the model itself is able to recapitulate the observed effects.

      Weaknesses:

      Despite the above, the paper could be clearer on why these effects are occurring. In particular, the control experiment introduced in Experiment 3 is not well justified. Why should a tactile stimulus not lead to a similar effect? There are possibilities here, but the author could do well to lay them out. Further, from a perspective related to the attentional explanation, other alternatives are not explored. The author cites and considers work suggesting that temporal binding relies on a Bayesian cue combination mechanism, in which the estimate is pulled towards the stimulus with the lowest variance, but this is not discussed. None of this necessarily detracts from the findings, but otherwise makes the case for attention less clear.

      I would like to thank the reviewer for the helpful comments and recommendations. Regarding Experiment 3, the rationale is this. We showed in Experiments 1 and 2 that, for outcome binding, there were two types of difference between the Action Sound condition and the Sound Only condition: the reported time of sound onset (i.e. the reported clock hand location at the sound onset) and the attention distribution. To experimentally test the relevance of the attention difference to the difference of reported time, we created a situation where the attention difference could be minimised and then checked the difference of reported time. We found that when the attention difference was controlled for between the two conditions, the difference of reported time was also gone, thus providing further evidence for a close link between attention and time report in the current testing paradigm. Therefore, Experiment 3 was primarily targeting the experimental evidence for the claim of the current study. What we needed in Experiment 3 was a condition that could have a smaller attention difference with the Action Sound condition than the attention difference between Sound Only and Action Sound conditions in Experiments 1 and 2. We expected that a tactile stimulus before the sound onset could work, without a clear prediction of the strength of the tactile stimulus in shifting attention, which was also not necessary. This experimental manipulation was a nice fit for the purpose of Experiment 3, as we could empirically measure the effectiveness of the tactile stimulus on attention shift and then relate it to the changes in outcome binding.

      As the reviewer correctly suggested, the Bayesian framework has been applied in several studies to explain the distortion of time judgements in sensorimotor situations (e.g. the temporal binding effect studied here). However, the current study asked what temporal binding is really about when it is measured with the Libet clock method. Is it really about a distortion in time perception (which the Bayesian account tries to explain)? Or is it also about attention? The results showed that the spatiotemporal attention distribution is at least a confound in measuring the perceived time of an event using the Libet clock method. The Bayesian account raised in previous studies is therefore relevant for explaining the distortion in time perception, provided that the distortion really exists. Here we asked whether the distortion really exists, and to what extent.

      Reviewer #2 (Public Review):

      Summary:

      Temporal binding, generally considered a timing illusion, results from actions triggering outcomes after a brief delay, distorting perceived timing. The present study investigates the relationship between attention and the perception of timing by employing a series of tasks involving auditory and visual stimuli. The results highlight the role of attention in event timing and the functional relevance of attention in outcome binding.

      Strengths:

      • Experimental Design: The manuscript details a well-structured sequence of experiments investigating the attention effect in outcome binding. Thoughtful variations in manipulation conditions and stimuli contribute to a thorough and meaningful investigation of the phenomenon.

      • Statistical Analysis: The manuscript employs a diverse set of statistical tests, demonstrating careful selection and execution. This statistical approach enhances the reliability of the reported findings.

      • Narrative Clarity: Both in-text descriptions and figures provide clear insights into the experiments and their results, facilitating readers in following the logic of the study.

      Weaknesses:

      • Conceptual Clarity: The manuscript aims to integrate key concepts in human cognitive functions, including attention, timing perception, and sensorimotor processes. However, before introducing experiments, there's a need for clearer definitions and explanations of these concepts and their known and unknown interrelationships. Given the complexity of attention, a more detailed discussion, including specific types and properties, would enhance reader comprehension.

      • Computational Modeling: The manuscript lacks clarity in explaining the model architecture and setup, and it's unclear if control comparisons were conducted. These details are critical for readers to properly interpret attention-related findings in the modeling section. Providing a clearer overview of these aspects will improve the overall understanding of the computational models used.

      I would like to thank the reviewer for the helpful comments and recommendations. Attention in the current study refers specifically to visuospatial attention, which has been made clearer in the revised manuscript. It is presented as a key factor shaping the timing reports obtained with the clock method, thereby contributing to the explanation of temporal binding. Attention has indeed been mentioned previously in a similar context, but it was treated vaguely as a kind of general cognitive resource. The current study specifically tested and verified that the visuospatial attention paid to the clock face influenced the timing reports. This point is discussed in a dedicated paragraph in the discussion section of the revised manuscript.

      The modelling of the timing report using the attention data was based on a very simple idea: The clock hand location receiving more attention should be given more weight when participants made the timing report (i.e. reporting the clock hand position). The weight for each location was calculated using the detection rate at each location. The relevant methods section has been extensively revised to provide a step-by-step implementation of the modelling, with rationales and pitfalls in the interpretation of the modelling results given (also in the discussion section).
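
      The weighting idea described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function name, the clock-hand offsets, and the detection rates are hypothetical, and the sketch assumes that the weights are simply the detection rates normalised to sum to 1.

```python
def attention_weighted_report(offsets_deg, detection_rates):
    """Predict a timing report as the attention-weighted mean clock-hand offset.

    offsets_deg: clock-hand positions relative to the true position at
        sound onset, in degrees (hypothetical units).
    detection_rates: probe detection rate at each position, used here as
        a proxy for the attention paid to that position.
    """
    if len(offsets_deg) != len(detection_rates):
        raise ValueError("offsets and detection rates must have the same length")
    total = sum(detection_rates)
    if total <= 0:
        raise ValueError("detection rates must sum to a positive value")
    # Assumption: weight = detection rate normalised across positions.
    weights = [rate / total for rate in detection_rates]
    return sum(w * offset for w, offset in zip(weights, offsets_deg))


# Hypothetical values: five clock-hand positions around the true onset position.
offsets = [-30, -15, 0, 15, 30]
uniform_attention = [0.5, 0.5, 0.5, 0.5, 0.5]
early_skewed_attention = [0.8, 0.7, 0.5, 0.3, 0.2]

print(attention_weighted_report(offsets, uniform_attention))
print(attention_weighted_report(offsets, early_skewed_attention))
```

      With uniform detection rates the predicted report stays at the true position, whereas detection rates skewed towards earlier positions pull the predicted report earlier, which is the qualitative behaviour the response describes.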

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the immunophenotypes of cancer treatment-related pneumonitis. The evidence supporting the claims of the authors is solid, although the inclusion of controls, as suggested by one of the reviewers, strengthened the study. The work will be of interest to cancer immunologists.

      Response: We are thankful for the editor's recognition of the contribution our study makes to understanding the immunophenotypes associated with cancer treatment-related pneumonitis. We agree that the inclusion of control data is pivotal for benchmarking biomarkers. While our initial study design was constrained by the availability of BALF from healthy individuals within clinical settings, we addressed this limitation by incorporating scRNA-seq data from healthy control and COVID-19 BALF cells sourced from the GSE145926 dataset. This additional analysis has provided a baseline for comparison, revealing that CD16 is expressed in a minority of T cells in healthy BALF, specifically 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. The inclusion of this data as Figures 6H and 6I in our manuscript offers a robust context for the significant increase in CD16-expressing T cells observed in patients with PCP, thus enhancing the robustness of our study's conclusions.

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      Many thanks for giving me the opportunity to review your paper. I really enjoyed the way you carried out this work - for example, your use of a wide panel of markers and the use of two analytical methods - you have clearly given great thought to bias avoidance. I also greatly appreciated your paragraph on the limitations, as there are several, but you do not 'over-sell' your conclusions so there is no issue here for me.

      To improve the piece, there are a few typos (eg 318 - specific to alpha-myosin) and I was briefly confused about the highlighted clusters in Figure 4. Perhaps mention why they are highlighted when they first appear in 4D instead of E?

      Response: We have corrected the typos, and we have rearranged the sequence of Figures 3E and 3F, as well as 4D and 4E, to ensure a logical flow. Citrus-generated violin plots are now presented prior to the heatmap of the clusters, which better illustrates the progression of our analysis and the derivation of the clusters.

      In terms of improvements to the data, obviously it would have been ideal if you had had some sort of healthy control as a point of reference for all cohorts, but working in the field I understand the difficulties in getting healthy BAL. It would be worth your while however trying to find more supportive data in the literature in general. There are studies which assess various immune markers in healthy BAL eg https://journal-inflammation.biomedcentral.com/articles/10.1186/1476-9255-11-9. and so I think it is worth looking wrt the main findings. For example, are CD16+ T cells seen in healthy BAL or any other conditions (at present the COVID study is being over-relied on)? Could these cells be gamma deltas? (gamma deltas frequently express CD8 and CD16, and can switch to APC like phenotypes).

      Response: We are grateful for the reviewer's consideration of the practical challenges associated with collecting BALF from healthy individuals. As an alternative, we have supplemented our analysis with published single-cell RNA sequencing data from BALF cells of healthy controls (Nature Medicine 2020; 26: 842-844). We accessed GSE145926 and downloaded data for BALF cells from healthy controls (n=3) and patients with severe COVID-19 (n=6). The filtered gene-barcode matrix was first normalized using the ‘NormalizeData’ function in Seurat v.4 with default parameters. The top 2,000 variable genes were then identified using the ‘vst’ method in the Seurat ‘FindVariableFeatures’ function, after which PCA and UMAP were performed. T cells were identified as cells with CD2 > 1 and CD3E > 1, and FCGR3A expression was explored using an expression threshold of 0.5. Violin plots and bar plots were generated with the ggplot function.
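
      The gating step described above can be illustrated with a small sketch. The analysis itself was run in Seurat (R); the Python below only reproduces the threshold arithmetic on a toy table and calls no Seurat function. The cell values are invented, and treating the 0.5 threshold as a strict "greater than" comparison is an assumption.

```python
def fraction_fcgr3a_positive(cells, t_cell_cutoff=1.0, fcgr3a_cutoff=0.5):
    """Count T cells and the fraction of them expressing FCGR3A (CD16).

    cells: list of dicts holding normalised expression values for the
        genes CD2, CD3E and FCGR3A (toy data in this sketch).
    Returns (number of T cells, number of FCGR3A-positive T cells, fraction).
    """
    # T cells: CD2 > 1 and CD3E > 1, as stated in the response.
    t_cells = [c for c in cells
               if c["CD2"] > t_cell_cutoff and c["CD3E"] > t_cell_cutoff]
    if not t_cells:
        return 0, 0, float("nan")
    # Assumption: "threshold of 0.5" is read as expression strictly above 0.5.
    positive = [c for c in t_cells if c["FCGR3A"] > fcgr3a_cutoff]
    return len(t_cells), len(positive), len(positive) / len(t_cells)


# Invented expression values for five cells.
toy_cells = [
    {"CD2": 2.1, "CD3E": 1.8, "FCGR3A": 0.0},
    {"CD2": 1.5, "CD3E": 2.2, "FCGR3A": 0.9},
    {"CD2": 0.0, "CD3E": 0.0, "FCGR3A": 2.4},
    {"CD2": 1.2, "CD3E": 1.1, "FCGR3A": 0.3},
    {"CD2": 3.0, "CD3E": 0.4, "FCGR3A": 0.0},
]

n_t, n_pos, frac = fraction_fcgr3a_positive(toy_cells)
print(n_t, n_pos, frac)
```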

      Regarding the pivotal finding of increased CD16-expressing T cells in patients with PCP, the scRNA-seq data mining indicates that CD16 is expressed by only a minority of T cells in healthy BALF (1.0% of CD4+ T cells and 1.6% of CD8+ T cells). These figures, now incorporated into our revised manuscript as Figures 6H and 6I, substantiate our findings. These cells could be gamma delta T cells, but we could not confirm this with the limited data available. We will investigate this in a future study. The main text has been updated to reflect these findings.

      Author response image 2.

      I would agree with your approach of not going down the transcript route, so just focus on protein expression.

      I think you need to mention more about the impact of ICI on PD1 expression - in the methods you lose one approach owing to low T cell expression (132) but in the discussion you mention ICI induced high expression (311) as previously reported. This apparent contradiction needs an explanation.

      Response: We acknowledge the need for clarification regarding the impact of ICIs on PD-1 expression. In the methods section, we noted the low detection of PD-1 expression on T cells in patients treated with nivolumab; this was due to competition between the PD-1 detection antibody EH12.2 and nivolumab. As reported by Suzuki et al. (International Immunology 2020; 32: 547-557), T cells from patients with ICI-induced ILD, including those treated with nivolumab, exhibit upregulated PD-1 expression when detected with the PD-1 antibody clone MIH4. Conversely, as outlined by Yanagihara et al. (BBRC 2020; 527: 213-217), the PD-1 detection antibody clone EH12.2 conjugated with 155Gd (#3155009B) used in our study is unable to detect PD-1 when patients are under nivolumab treatment, owing to competitive inhibition. The absence of a metal-conjugated PD-1 antibody of the MIH4 clone was a limitation of our study. Ideally, we would have conjugated the MIH4 antibody with 155Gd for our analysis, a refinement we aim to incorporate in future research. We have now included this discussion in our manuscript to resolve the apparent contradiction between the methodological limitation and the high PD-1 expression induced by ICIs, as reported in the literature. This addition will guide readers through the nuances of antibody selection and its implications for detecting PD-1 expression in the context of ICI treatment.

      Finally, since you have the severity data, it would be good to assess all the significantly different clusters against this metric, as you have done for CD16+ T cells. Not only may this reveal more wrt the impact of other immune populations, but it'll also give a point of reference for the CD16+ T cell data.

      Response: Thank you for the suggestion to assess all significantly different clusters against the disease severity metric. We have expanded our analysis to include a thorough correlation study between the disease severity and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section.
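
      A correlation of this kind can be sketched as below. The response does not state which correlation statistic was used, so Spearman's rank correlation is assumed here because severity grades are ordinal; the function names and all values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical severity grades and marker intensities for seven patients.
severity_grade = [1, 2, 2, 3, 4, 5, 5]
marker_intensity = [0.8, 1.1, 1.0, 1.9, 2.4, 3.0, 3.3]
print(spearman_rho(severity_grade, marker_intensity))
```

      Repeating the same calculation for each marker, with an appropriate multiple-comparison correction, would give the per-marker point of reference that the reviewer asked for.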

      Author response image 3.

      Overall though I think this is a really nice study, with a potentially very significant finding in linking CD16+ T cells with severity. Congratulations.

      Response: We would like to thank the reviewer for the kind comments on our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      General:

      1) The fact that this is a retrospective study should be indicated earlier in the paper.

      Response: We have now stated the retrospective nature of the study in the methods section as follows: In this retrospective study, patients who were newly diagnosed with PCP, DI-ILD, and ICI-ILD and had undergone BALF collection at Kyushu University Hospital from January 2017 to April 2022 were included. The retrospective study was approved by the Ethics Committee of Kyushu University Hospital (reference number 22117-00).

      2) tSNE and UMAP are dimensionality reduction techniques that don't cluster the cells, the authors should specify what clustering algorithm was used subsequently (e.g FlowSOM)

      Response: The clusters were determined manually on the basis of their expression patterns.

      3) With regards to the role of CD16 in a potential exacerbated cytotoxicity in the fatal PCP case, the authors could measure the levels of C3a related proteins in patient serum to link to a common immunopathogenic pathway with COVID.

      Response: We did not collect serum from the patients in this study as our research protocol was approved by the Ethics committee for the use of BALF only. However, we agree with your assessment that the measurement of serum C3a levels would be informative. In future studies, we will incorporate the measurement of serum C3a levels to provide more comprehensive insights into the impact of C3a on immune function. Thank you for your valuable feedback and for helping us to improve the quality of our research.

      Line-specific:

      101 The authors should provide some information on how the cryopreservation of the BALF was carried out.

      Response: Upon collection, BALF samples were immediately centrifuged at 300 g for 5 minutes to pellet the cells. The resultant cell pellets were then resuspended in Cellbanker 1 cryopreservation solution (Takara, catalog #210409). This suspension was aliquoted into cryovials and gradually frozen to –80ºC using a controlled rate freezing method to ensure cell viability. The samples were stored at –80ºC until required for experimental analysis. We have added the information in the method section.

      Fig 3B: It would be very helpful if the authors could add a supplementary figure with marker expression on the UMAP projection.

      Response: We have added Supplementary Figure 4 with marker expression on the UMAP projection in Figure 3B.

      Fig 4A: Same as Fig 3B

      Response: We have added Supplementary Figure 5 with marker expression on the UMAP projection in Figure 4A.

      Fig 5B: Same as Fig 3B

      Response: We have added Supplementary Figure 6 with marker expression on the tSNE projection in Figure 5B.

      266 Authors should state if the data is not shown with regards to differences in myeloid cell fractions

      430 Marker intensity is not shown in panel D

      Response: Corrected as follows: “Citrus network tree visualizing the hierarchical relationship of each marker between identified T cell ~”

      446 The legend says patients have IPF, CTD-ILD, sarcoidosis but the figure shows PCP, DI-ILD, ICI-ILD.

      Response: Corrected.

      451 What do the authors mean in "Graphical plots represent individual samples"? Panel B is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      472 What do the authors mean in "Graphical plots represent individual samples"? Panel C is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      Reviewer #3 (Recommendations For The Authors):

      An important thing is to add comparisons against healthy donors, at least. A common baseline is needed to firmly establish any biomarkers.

      Response: We acknowledge the reviewer's concern regarding the comparison with healthy donors. Although our study did not initially include BALF collection from healthy controls due to the constraints of clinical practice, we recognize the importance of a control baseline to validate biomarkers. To address this, we have integrated scRNA-seq data from healthy control BALF cells available in public datasets (Nature Medicine 2020; 26: 842-844), accessed from GSE145926. This dataset includes BALF cells from healthy controls (n=3) alongside severe COVID-19 patients (n=6). Data mining confirmed that CD16 expression is in a minority of T cells in healthy BALF—1.0% of CD4+ T cells and 1.6% of CD8+ T cells. We have included this comparative data in our manuscript as Figures 6H and 6I to provide context for the observed increase in CD16-expressing T cells in PCP patients, which substantiates our findings.

      Author response image 4.

      Data analysis needs to go deeper. There are several other tools on Cytobank alone that would allow a more quantitative analysis of the data. Fold changes in marker expressions would be very important as measurements of phenotypic changes.

      Response: We thank the reviewer for their constructive feedback on the depth of our data analysis. We acknowledge the value of a more quantitative approach, including the use of fold change measurements to assess phenotypic alterations, and recognize the potential insights such tools on Cytobank could provide. Due to the scope and limited space of the current study, we have focused our analysis on the most pertinent findings relevant to our research questions. We believe the present analysis serves the immediate objectives of this study. However, we agree that further quantitative analysis would enhance the understanding of the data. We have expanded our analysis to include a thorough correlation study between the disease severity of PCP and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity of PCP. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section. We aim to consider these approaches in future work to build upon the foundation laid by this study. Your suggestions are invaluable and will be kept at the forefront as we plan subsequent research phases.

      Author response image 5.

      Reviewer #1 (Public Review):

      Cytotoxic agents and immune checkpoint inhibitors are the most commonly used and efficacious treatments for lung cancers. However their use brings two significant pulmonary side-effects; namely Pneumocystis jirovecii infection and resultant pneumonia (PCP), and interstitial lung disease (ILD). To observe the potential immunological drivers of these adverse events, Yanagihara et al. analysed and compared cells present in the bronchoalveolar lavage of three patient groups (PCP, cytotoxic drug-induced ILD [DI-ILD], and ICI-associated ILD [ICI-ILD]) using mass cytometry (64 markers). In PCP, they observed an expansion of the CD16+ T cell population, with the highest CD16+ T proportion (97.5%) in a fatal case, whilst in ICI-ILD, they found an increase in CD57+ CD8+ T cells expressing immune checkpoints (TIGIT+ LAG3+ TIM-3+ PD-1+), FCRL5+ B cells, and CCR2+ CCR5+ CD14+ monocytes. Given the fatal case, the authors also assessed for, and found, a correlation between CD16+ T cells and disease severity in PCP, postulating that this may be owing to endothelial destruction. Although n numbers are relatively small (n=7-9 in each cohort; common numbers for CyTOF papers), the authors use a wide panel (n=65) and two clustering methodologies giving greater strength to the conclusions. The differential populations discovered using one or two of the analytical methods are robust: whole population shifts with clear and significant clustering. These data are an excellent resource for clinical disease specialists and pan-disease immunologists, with a broad and engaging contextual discussion about what they could mean.

      Strengths:

      • The differences in immune cells in BAL in these specific patient subgroups are relatively unexplored.

      • This is an observational study, with no starting hypothesis being tested.

      • Two analytical methods are used to cluster the data.

      • A relatively wide panel was used (64 markers), with particular strength in the alpha beta T cells and B cells.

      • Relevant biomarkers, beta-D-glucan and KL-6 were also analysed

      • Appropriate statistics were used throughout.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD) but these are difficult samples to collect and so in relative terms, and considering the use of CyTOF, these are good numbers.

      • Beta-D-glucan shows potential as a biomarker for PCP (as previously reported) whilst KL-6 shows potential as a biomarker for ICI-ILD (not reported before). Interestingly, KL-6 was not seen to be increased in DI-ILD patients.

      • Despite the relatively low n numbers and lack of matching there are some clear differentials. The CD4/CD8+CD16+HLA-DR+CXCR3+CD14- T cell result is striking - up in PCP (with EM CD4s significantly down) - whilst the CD8 EMRA population is clear in ICI-ILD and 'non-exhausted' CD4s, with lower numbers of EMRA CD8s in DI-ILD.

      • The authors identify 17/31 significantly differentiated clusters of myeloid cells, eg CD11bhi CD11chi CD64+ CD206+ alveolar macrophages with HLA-DRhi in PCP.

      • With respect to B cells, the authors found that FCRL5+ B cells were more abundant in patients with ICI-ILD compared to those with PCP and DI-ILD, suggesting these FCRL5+ B cells may have a role in irAE.

      • One patient's extreme CD16+ T cell (97.5% positive) and death, led the authors to consider CD16+ T cells as an indicator of disease severity in PCP. This was then tested and found to be correct.

      • Authors discuss results in context of literature leading them to suggest that CD16+ T cells may target endothelial cells and wonder if anti-complement therapy may be efficacious in PCP.

      • Great discussion on auto-reactive T cell clones where the authors suggest that in ICI-ILD CD8s may react against healthy lung, driving ILD.

      • An observation of CXCR3 in different CD8 populations in ICI-ILD and PCP led the authors to hypothesise on the chemoattractants in the microenvironment.

      • Excellent point suggesting CD57 may not always be a marker of senescence on T cells - reflective of growing change within the community.

      • Well considered suggestion that FCRL5+ B cells may be involved in ICI-ILD driven autoimmunity.

      • The authors discuss the main weaknesses in the discussion and stress that the findings detailed in the paper "demonstrate a correlation rather than proof of causation".

      • Figures and legends are clear and pleasing to the eye.

      Weaknesses:

      • This is an observational study, with no starting hypothesis being tested.

      • Only patients who were able to have a lavage taken have been recruited.

      • One set of analysis wasn't carried out for one subgroup (ICI-ILD) as PD1 expression was negative owing to the use of nivolumab.

      • Some immune cell subsets wouldn't be picked up with the markers and gating strategies used; e.g. NK cells.

      • Some immune cells would be disproportionately damaged by the storage, thawing and preparation of the samples; e.g. granulocytes.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD), sex, age and adverse event matching wasn't performed, and treatment regimen are varied and 'suspected' (suggesting incomplete clinical data) - but these are difficult samples to collect. These numbers drop further for some analyses e.g. T cell clustering owing to factors such as low cell number.

      • The disease comparisons are with each other, there is no healthy control.

      • Samples are taken at one time point.

      • The discussion of what is probably the stand-out result, the CD16+ T cells in PCP, relies on two papers, leading to a slightly skewed emphasis on one paper on CD16+ cells in COVID. There are other papers that have observed CD16+ T cells in other conditions. It is also worth bearing in mind that, given the markers used, these CD16+ T cells may be gamma deltas.

      • The discussion of ICI patients consistently showing increased PD1 could have been more extensive: given that the ICI targets PD1, one would expect the opposite, as commented on, and observed, in the methods section.

      Reviewer #2 (Public Review):

      Yanagihara and colleagues investigated the immune cell composition of bronchoalveolar lavage fluid (BALF) samples in a cohort of patients with malignancy undergoing chemotherapy and with lung adverse reactions, including Pneumocystis jirovecii pneumonia (PCP) and interstitial lung diseases (ILDs) induced by immune-checkpoint inhibitors (ICIs) or cytotoxic drugs. Using mass cytometry, their aim was to characterize the cellular and molecular changes in BAL to improve our understanding of their pathogenesis and identify potential biomarkers and therapeutic targets. In this regard, the authors identify a correlation between CD16 expression in T cells and the severity of PCP, and an increased infiltration of CD57+ CD8+ T cells expressing immune checkpoints and FCRL5+ B cells in ICI-ILD patients.

      The conclusions of this paper are mostly well supported by data, but some aspects of the data analysis need to be clarified and extended.

      1) The authors should elaborate on why different sets of markers were selected for each analysis step. E.g., different sets of markers were used for UMAP, CITRUS and viSNE in the T cell and myeloid analysis.

      2) The authors should state if a normality test for the distribution of the data was performed. If not, non-parametric tests should be used.

      3) The authors should explore the correlation between CD16 intensity and the CTCAE grade in T cell subsets such as EMRA CD8 T cells, effector memory CD4, etc as identified in Figure 1B.

      4) The authors could use CITRUS to better assess the B cell compartment.

      Reviewer #3 (Public Review):

      The authors collected BALF samples from lung cancer patients newly diagnosed with PCP, DI-ILD or ICI-ILD. CyTOF was performed on these samples, using two different panels (T-cell and B-cell/myeloid cell panels). Results were collected, cleaned-up, manually gated and pre-processed prior to visualisation with manifold learning approaches t-SNE (in the form of viSNE) or UMAP, and analysed by CITRUS (hierarchical clustering followed by feature selection and regression) for population identification - all using Cytobank implementation - in an attempt to identify possible biomarkers for these disease states. By comparing cell abundances from CITRUS results and qualitative inspection of a small number of marker expressions, the authors claimed to have identified an expansion of CD16+ T-cell population in PCP cases and an increase in CD57+ CD8+ T-cells, FCRL5+ B-cells and CCR2+ CCR5+ CD14+ monocytes in ICI-ILD cases.

      By the authors' own admission, there is an absence of healthy donor samples and, perhaps as a result of retrospective experimental design, also an absence of pre-treatment samples. The entire analysis effectively compares three yet-established disease states with no common baseline - what really constitutes a "biomarker" in such cases? The introduction asserts that "By characterizing the cellular and molecular changes in BAL from patients with these complications, we aim to improve our understanding of their pathogenesis and identify potential therapeutic targets" (lines 82-84). Given these obvious omissions, no real "changes" have been studied in the paper. These are very limited comparisons among three, and only these three, states.

      Even assuming a more thorough experimental design, the data analysis is unfortunately too shallow and has not managed to explore the wealth of information that could potentially be extracted from the results. CITRUS is accessible and convenient, but it also makes a couple of big assumptions which could affect data analysis - 1) Is it justified to concatenate all FCS files to analyse the data in one batch / small batches? Could there be batch effects or other biological events that could confuse the algorithm? 2) With a relatively small number of samples, and after the internal feature selection of CITRUS, is the regression model suitable for population identification, or would it be too crude and miss rare populations? There are plenty of other established methods that could be used instead. Have those methods been considered?

      Colouring t-SNE or UMAP (e.g. Figure 6C) plots by marker expression is useful for quick identification of cell populations but it is not a quantitative analysis. In a CyTOF analysis like this, it is common to work out fold changes of marker expressions between conditions. It is inadequate to judge expression levels and infer differences simply by looking at colours.
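
      As an illustration of the kind of quantification meant here, a fold change between two conditions can be computed as in the sketch below. It is one possible formulation and not a method taken from the paper: the use of per-sample median intensities, the pseudocount of 1, and all values are assumptions.

```python
import math
from statistics import median


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of the median marker intensity in group_a over group_b.

    group_a, group_b: per-sample marker intensities for two conditions.
    pseudocount: added to both medians so that a zero median does not
        produce a division by zero or the logarithm of zero.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    ratio = (median(group_a) + pseudocount) / (median(group_b) + pseudocount)
    return math.log2(ratio)


# Hypothetical per-sample median intensities of one marker in two conditions.
condition_a = [12.4, 15.1, 9.8, 14.0]
condition_b = [3.2, 4.1, 2.7, 3.9]
print(log2_fold_change(condition_a, condition_b))
```

      A positive value means higher expression in the first condition and a negative value the reverse; reporting it alongside a significance test gives a quantitative counterpart to the coloured projections.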

      The relatively small number of samples also means that most results presented in the paper are not statistically significant. Whilst it is understandable that it is not always possible to collect a large number of patient samples for studies like this, having several entire major figures showing "n.s." (e.g. Figures 3A, 4B and 5C), together with limitations in the comparisons themselves and inadequate analysis, makes the observations unconvincing, and even less so for the single fatal PCP case where N = 1.

      It would also be good scientific practice to show evidence of sample data quality control. Were individual FCS files examined? Did the staining work? Some indication of QC would also be great.

      The dataset generated and studied by the authors has the potential to address the question they set out to answer and thus to be useful for the field. However, in its current state of presentation, more evidence and more thorough data analysis are needed to draw any conclusions, or correlations, as the authors would like to frame them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allele, bdf, narrowing the causal region down to a small interval containing a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. Together, these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument. The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward genetics (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) to link mamo to pigmentation, is extremely convincing.

      Response: Thank you very much for your affirmation of our work. The reviewer discussed the parts of our manuscript that involve evolution sentence by sentence. We have further refined the description in this regard and improved the logical flow. Thank you again for your help.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions.

      While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This makes a radical short-cut that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Positional Weight Matrices (PWM) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWM that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.
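
      The concern about short motifs matching by chance can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an idealised null model (random DNA with equal base frequencies and exact matching of one fixed motif) and a hypothetical 2,000-bp upstream region; neither assumption comes from the manuscript, and a palindromic motif would be counted twice when both strands are included.

```python
def expected_chance_matches(sequence_length, motif_length, both_strands=True):
    """Expected number of exact matches of one fixed motif in random DNA.

    Assumes independent positions and equal frequencies of A, C, G and T,
    so each window matches the motif with probability 1 / 4**motif_length.
    """
    if motif_length < 1 or sequence_length < motif_length:
        raise ValueError("sequence must be at least as long as the motif")
    windows = sequence_length - motif_length + 1
    strands = 2 if both_strands else 1
    return strands * windows / 4 ** motif_length


# Hypothetical 2,000-bp upstream region scanned on both strands.
for k in (5, 8, 12):
    print(k, expected_chance_matches(2000, k))
```

      Under this null model a 5-nt motif is expected to occur several times in a region of that size by chance alone, whereas longer motifs quickly become rare, which is why predictions from short signature motifs call for independent in vivo support.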

      Response: Thank you very much for your careful work. The analysis and identification of transcription factor-binding sites is an important issue in gene regulation research. Techniques such as ChIP-seq can be used to experimentally identify the binding sites of transcription factors (TFs). However, reports using these techniques often only detect specific cell types and developmental stages, resulting in a limited number of downstream target genes for some TFs. Interestingly, TFs may regulate different downstream target genes in different cell types and developmental stages.

      Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, predicted C2H2-ZF binding sites have good reference value. For the 5-nt PWM sequence, we referred to a study in D. melanogaster in which the motif was identified by EMSA (Shoichi Nakamura et al., 2019). In the new version, we have rewritten this section.
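
      The residue-to-base assignments of the canonical model described above can be written down as a small lookup table. The following is a minimal sketch under stated assumptions: the helix is given as a seven-residue string covering positions -1 and 1 through 6 (there is no position 0), the names CANONICAL_CONTACTS and recognition_contacts are hypothetical, and the example helix string is a placeholder chosen for illustration rather than a mamo sequence.

```python
# Canonical C2H2-ZF contacts as stated in the text:
# helix position -> (DNA strand, base number within the finger's subsite).
# The text does not give a base number for position 2, so it is left as None.
CANONICAL_CONTACTS = {
    -1: ("sense", 3),
    3: ("sense", 2),
    6: ("sense", 1),
    2: ("complementary", None),
}

# String index of each helix position when the string starts at position -1.
# Positions run -1, 1, 2, ..., 6, so position p > 0 sits at index p.
_INDEX = {-1: 0, 2: 2, 3: 3, 6: 6}


def recognition_contacts(helix: str) -> dict:
    """Map each key helix position to (residue, strand, base number)."""
    if len(helix) != 7:
        raise ValueError("expected 7 residues covering positions -1 and 1..6")
    return {
        pos: (helix[_INDEX[pos]],) + CANONICAL_CONTACTS[pos]
        for pos in CANONICAL_CONTACTS
    }


if __name__ == "__main__":
    # Placeholder helix used only to show the indexing.
    for pos, contact in sorted(recognition_contacts("RSDELTR").items()):
        print(pos, contact)
```

      The table encodes only the four contacts named in the text; it says nothing about binding affinity or about contacts between neighboring fingers.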

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Nakamura S, Hira S, Fujiwara M, et al. A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019;2:422. Published 2019 Nov 20.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") is flawed by important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone are made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations).

      Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather out of topic, I would suggest removing it.

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary story behind this Gene Regulatory Network.

      Response: Thank you very much for your careful work. In this part of the manuscript, we introduced some assumptions that make the statement slightly unconventional. The color pattern of insects is an adaptive trait. The bd and bdf mutants used in the study are formed spontaneously. As a frequent variation and readily observable phenotype, color patterns have been used as models for evolutionary research (Wittkopp PJ et al., 2011). Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      In summary, we have rewritten this section to reduce unnecessary assumptions.

      Wittkopp PJ, Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 2011;13(1):59-69.

      Minor Comment:

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms.

      An example of the Bm-mamo locus annotation can be found at: https://www.ncbi.nlm.nih.gov/gene/101738295

      RNAseq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

      Response: Thank you very much for your careful work. In this study, we referred to the predicted genes of SilkDB, NCBI and Silkbase. In different databases, there are varying degrees of differences in the number of predicted genes and the length of gene mRNA. Because the SilkDB database is based on the first silkworm genome, it has been used for the longest time and has a relatively large number of users. In the revised manuscript, we have added the predicted genes of NCBI and Silkbase in Figure S1.

      Author response image 1.

      The predicted genes and qPCR analysis of candidate genes in the genomic region responsible for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression level of candidate genes.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to identify new genes involved in melanin metabolism and its spatial distribution in the silkworm Bombyx mori. They identified the gene Bm-mamo as playing a role in caterpillar pigmentation. By functional genetic and in silico approaches, they identified putative target genes of the Bm-mamo protein. They showed that numerous cuticular proteins are regulated by Bm-mamo during larval development.

      Strengths:

      • preliminary data about the role of cuticular proteins to pattern the localization of pigments

      • timely question

      • challenging question because it requires the development of future genetic and cell biology tools at the nanoscale

      Response: Thank you very much for your affirmation of our work. The reviewer's familiarity with the color patterns of Lepidoptera is helpful, and the recommendation raised has provided us with very important assistance. This has allowed us to make significant progress with our manuscript.

      Weaknesses:

      • statistical sampling limited

      • the discussion would gain in being shorter and refocused on a few points, especially the link between cuticular proteins and pigmentation. The article would be better if the last evolutionary-themed section of the discussion is removed.

      A recent paper has been published on the same gene in Bombyx mori (https://www.sciencedirect.com/science/article/abs/pii/S0965174823000760) in August 2023. The authors must discuss and refer to this published paper through the present manuscript.

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we started to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). We believe that the authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis, with the aim of making a preemptive publication. To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19:6:26114. doi: 10.1038/srep26114.

      Reviewer #1 (Recommendations For The Authors):

      1) please consider using a more recent annotation model of the B. mori genome to revise your Result Section 1, Fig.1, and Fig. S2. https://www.ncbi.nlm.nih.gov/gene/101738295

      Specifically, you used BGIM_ gene models, while the current annotation, such as the one above featured in the NCBI database, provides more accurate intron-exon structures without splitting mamo into two genes. I believe this can be done with minor revisions of the figures, and you could keep the BGIM_ gene names for the text.

      Response: Thank you very much for your careful work. The GenBank of NCBI (National Center for Biotechnology Information) is a very good database that we often used and referred to during this research. Our research started in 2009, so we mainly referred to the SilkDB database (Jun Duan et al., 2010), although we also consulted other databases, such as NCBI and Silkbase (https://silkbase.ab.a.u-tokyo.ac.jp/cgi-bin/index.cgi). Because the SilkDB database was constructed based on the first published silkworm genome data, it has been used for the longest time and has a relatively large number of users. Researchers are still using these data (Kejie Li et al., 2023).

      The problem with predicting the mamo gene as two genes (BGIBMGA012517 and BGIBMGA012518) in SilkDB is mainly due to the presence of alternative splicing of the mamo gene. BGIBMGA012517 corresponds to the shorter transcript (mamo-s) of the mamo gene. Due to the differences in sequencing individuals, sequencing methods, and methods of gene prediction, there are differences in the number and sequence of predicted genes in different databases. We added the pattern diagram of predicted genes from NCBI and Silkbase, and the expression levels of new predicted genes are shown in Supplemental Figure S1.

      Jun Duan et al., SilkDB v2.0: a platform for silkworm (Bombyx mori) genome biology. Nucleic Acids Res. 2010 Jan;38(Database issue): D453-6. doi: 10.1093/nar/gkp801.

      Kejie Li et al., Transcriptome analysis reveals that knocking out BmNPV iap2 induces apoptosis by inhibiting the oxidative phosphorylation pathway. Int J Biol Macromol. 2023 Apr 1;233:123482. doi: 10.1016/j.ijbiomac.2023.123482. Epub 2023 Jan 31.

      Author response image 2.

      The predicted genes and qPCR analysis of candidate genes in the genomic region responsible for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression level of candidate genes.

      2) As I mentioned in my public review, I strongly believe the interpretation of the PWM binding analyses requires much more conservative statements, taking into account the idea that short 5-nt motifs are expected by chance. The work in this section is interesting, but the manuscript would benefit from a quite significant rewrite of the corresponding Discussion section, making it clear that the in silico approach is prone to identifying many sites in the genome, and that very few of those sites are probably relevant, for probabilistic reasons. I would recommend statements such as "Future experiments assessing the in vivo binding profile of Bm-mamo (e.g. ChIP-seq or Cut&Run) will be required to further understand the GRNs controlled by mamo in various tissues".

      Response: Thank you very much for your careful work. Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, the prediction of DNA recognition motifs of C2H2-type zinc finger proteins currently has good accuracy.

      The predicted DNA binding sequence (GTGCGTGGC) of the mamo protein in Drosophila melanogaster was highly consistent with that of silkworms. In addition, in D. melanogaster, positions 1 to 7 (GTGCGTG) of the predicted DNA binding sequence of mamo were highly similar to the DNA binding sequence obtained from EMSA experiments (Seiji Hira et al., 2013). Furthermore, in another study on the mamo protein of Drosophila melanogaster, five bases (TGCGT) were used as the DNA recognition core sequence of the mamo protein (Shoichi Nakamura et al., 2019). In the JASPAR database (https://jaspar.genereg.net), there are also some shorter (4-6 nt) DNA recognition sequences; for example, the DNA binding sequence of Ubx is TAAT (ID MA0094.1) in Drosophila melanogaster. However, we used longer DNA binding motifs (9 nt and 15 nt) of mamo to study the 2 kb genomic regions near the predicted genes. Over 70% of predicted genes were found to have these feature sequences near them. This analysis was carried out with commonly used software and procedures. Because of abundant target protein, accessible DNA, the absence of suppressors, suitable ionic conditions, etc., zinc finger transcription factors are more likely to bind to specific DNA sequences in vitro than in vivo. Using ChIP-seq or Cut&Run techniques to analyze various tissues and developmental stages in silkworms can yield a comprehensive DNA-binding map of mamo, and some false positives generated by predictions can be excluded. Thank you for your suggestion. We will conduct this work in the next research step. In addition, for brevity, we deleted the predicted data (Supplemental Tables S7 and S8) that used shorter motifs.
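
      Scanning a region near a gene for a motif on both strands can be sketched as follows. This is a hedged illustration, not the software used in the study, which is not named here: it substitutes exact string matching for PWM scoring, and the function names and the short test sequences are hypothetical. Only the 9-nt sequence GTGCGTGGC is taken from the text above.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(region: str, motif: str) -> list:
    """Return (start, strand) for every exact motif match in region.

    Starts are 0-based coordinates on the forward strand. A match to the
    reverse complement is reported with strand "-". Palindromic motifs are
    reported once, on the "+" strand.
    """
    region, motif = region.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(region) - k + 1):
        window = region[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    motif = "GTGCGTGGC"                        # 9-nt sequence quoted in the text
    print(find_motif("AAGTGCGTGGCTT", motif))  # toy forward-strand example
    print(find_motif("TTGCCACGCACAA", motif))  # toy reverse-strand example
```

      Exact matching is the strictest possible criterion. A PWM-based scan scores every window and reports those above a threshold, so it will generally return more sites than this sketch for the same region.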

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Anton V Persikov et al., De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.

      Seiji Hira et al., Binding of Drosophila maternal Mamo protein to chromatin and specific DNA sequences. Biochem Biophys Res Commun. 2013 Aug 16;438(1):156-60. doi: 10.1016/j.bbrc.2013.07.045. Epub 2013 Jul 20.

      Shoichi Nakamura et al., A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019 Nov 20;2: 422. doi: 10.1038/s42003-019-0663-4. eCollection 2019.

      3) In my opinion, the last section of the Discussion needs to be completely removed ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program"), as it is over-extending the data into evolutionary interpretations without any support. I would suggest instead writing a short paragraph asking whether the pigmentary role of mamo is a Lepidoptera novelty, or if it could have been lost in the fly lineage.

      Below, I tried to comment point-by-point on the main issues I had.

      Wu et al: Notably, the industrial melanism event, in a short period of several decades, resulted in significant changes in the body color of multiple Lepidoptera species(46). Industrial melanism events, such as changes in the body color of pepper moths, are heritable and caused by genomic mutations(47).

      Yes, but the selective episode was brief, and the relevant "carbonaria" mutations may have existed for a long time at low-frequency in the population.

      Response: Thank you very much for your careful work. Moth species often have melanic variants at low frequencies outside industrial regions. Recent molecular work on genetics has revealed that the melanic (carbonaria) allele of the peppered moth had a single origin in Britain. Further research indicated that the mutation event causing industrial melanism of peppered moth (Biston betularia) in the UK is the insertion of a transposon element into the first intron of the cortex gene. Interestingly, statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred in approximately 1819, a date highly consistent with a detectable frequency being achieved in the mid-1840s (Arjen E Van't Hof, et al., 2016). From molecular research, it is suggested that this single origin melanized mutant (carbonaria) was generated near the industrial development period, rather than the ancient genotype, in the UK. We have rewritten this part of the manuscript.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Wu et al: If relying solely on random mutations in the genome, which have a time unit of millions of years, to explain the evolution of the phenotype is not enough.

      What you imply here is problematic for several reasons.

      First, as you point out later, some large-effect mutations (e.g. transpositions) can happen quickly.

      Second, it's unclear what "the time units of millions of years" means here... mutations occur, segregate in populations, and are selected. The speed of this process depends on the context and genetic architectures.

      Third, I think I understand what you mean with "to explain the evolution of the phenotype is not enough", but this would probably need a reformulation and I don't think it's relevant to bring it here. After all, you used loss-of-function mutants to explain the evolution of artificially selected mutants. The evolutionary insights from these mutants are limited. Random mutations at the mamo locus are perfectly sufficient here to explain the bd and bdf phenotypes and larval traits.

      Response: Thank you very much for your careful work. Charles Darwin himself argued that “natural selection can act only by taking advantage of slight successive variations; she can never take a leap, but must advance by the shortest and slowest steps” (Darwin, C. R. 1859). This ‘micromutational’ view of adaptation proved extraordinarily influential. However, the accumulation of micromutations is a lengthy process, which requires a very long time to evolve a significant phenotype, and it may account for only a proportion of cases. Interestingly, recent molecular biology studies have shown that the evolution of some morphological traits involves a modest number of genetic changes (H Allen Orr. 2005).

      One example is the genetic basis analysis of armor-plate reduction and pelvic reduction of the three-spined stickleback (Gasterosteus aculeatus) in postglacial lakes. Although the marine form of this species has thick armor, the lake population (which was recently derived from the marine form) does not. The repeated independent evolution of lake morphology has resulted in reduced armor plate and pelvic structures, and there is no doubt that these morphological changes are adaptive. Research has shown that pelvic loss in different natural populations of three-spined stickleback fish occurs by regulatory mutations deleting a tissue-specific enhancer (Pel) of the pituitary homeobox transcription factor 1 (Pitx1) gene. The researchers genotyped 13 pelvic-reduced populations of three-spined stickleback from disparate geographic locations. Nine of the 13 pelvic-reduced stickleback populations had sequence deletions of varying lengths, all of which were located at the Pel enhancer. Relying solely on random mutations in the genome cannot lead to such similar mutation forms among different populations. The author suggested that the Pitx1 locus of the stickleback genome may be prone to double-stranded DNA breaks that are subsequently repaired by NHEJ (Yingguang Frank Chan et al., 2010).

      The bd and bdf mutants used in the study are formed spontaneously. Natural mutation is one of the driving forces of evolution. Nevertheless, we have rewritten the content of this section.

      Darwin, C. R. The Origin of Species (J. Murray, London, 1859).

      H Allen Orr. The genetic theory of adaptation: a brief history. Nat Rev Genet. 2005 Feb;6(2):119-27. doi: 10.1038/nrg1523.

      Yingguang Frank Chan et al., Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010 Jan 15;327(5963):302-5. doi: 10.1126/science.1182213. Epub 2009 Dec 10.

      Wu et al: Interestingly, the larva of peppered moths has multiple visual factors encoded by visual genes, which are conserved in multiple Lepidoptera, in the skin. Even when its compound eyes are covered, it can rely on the skin to feel the color of the environment to change its body color and adapt to the environment(48). Therefore, caterpillars/insects can distinguish the light wave frequency of the background. We suppose that perceptual signals can stimulate the GRN, the GRN guides the expression of some transcription factors and epigenetic factors, and the interaction of epigenetic factors and transcription factors can open or close the chromatin of corresponding downstream genes, which can guide downstream target gene expression.

      This is extremely confusing because you are bringing in a plastic trait here. It's possible there is a connection between the sensory stimulus and the regulation of mamo in peppered moths, but this is a mere hypothesis. Here, by mentioning a plastic trait, this paragraph sounds as if it was making a statement about directed evolution, especially after implying in the previous sentence that (paraphrasing) "random mutations are not enough". To be perfectly honest, the current writing could be misinterpreted and co-opted by defenders of the Intelligent Design doctrine. I believe and trust this is not your intention.

      Response: Thank you very much for your careful work. The plasticity of the body color of peppered moth larvae is very interesting, but we mainly wanted to emphasize that their skin shows the products of visual genes that can sense the color of the environment by perceiving light. Moreover, these genes are conserved in many insects. Human skin can also perceive light by opsins, suggesting that they might initiate light–induced signaling pathways (Haltaufderhyde K et al., 2015). This indicates that the perception of environmental light by the skin of animals and the induction of feedback through signaling pathways is a common phenomenon. For clarity, we have rewritten this section of the manuscript.

      Haltaufderhyde K, Ozdeslik RN, Wicks NL, Najera JA, Oancea E. Opsin expression in human epidermal skin. Photochem Photobiol. 2015;91(1):117-123.

      Wu et al: In addition, during the opening of chromatin, the probability of mutation of exposed genomic DNA sequences will increase (49).

      Here again, this is veering towards a strongly Lamarckian view with the environment guiding specific mutation. I simply cannot see how this would apply to mamo, nothing in the current article indicates this could be the case here. Among many issues with this, it's unclear how chromatin opening in the larval integument may result in heritable mutations in the germline.

      Response: Thank you very much for your careful work. A previous study has shown that there is a mutation bias in the genome; compared with the intergenic region, the mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. The same study compared the mutation rates of genes with different functions. The mutation rate in the coding regions of essential genes (such as those involved in translation) is the lowest, and the mutation rate in the coding regions of genes with specialized functions (such as environmental response) is the highest. These patterns are mainly affected by features of the epigenome (J Grey Monroe et al., 2022).

      In eukaryotes, chromatin is organized as repeating units of nucleosomes, each consisting of a histone octamer and the surrounding DNA. This structure can protect DNA. When a gene is activated, the chromatin region of this gene is locally opened, becoming an accessible region. Research has found that DNA accessibility can lead to a higher mutation rate in the region (Radhakrishnan Sabarinathan et al., 2016; Schuster-Böckler B et al., 2012; Lawrence MS et al., 2013; Polak P et al., 2015). In addition, mamo belongs to the BTB-ZF protein family, whose members can recruit factors such as DNA methyltransferase 1 (DNMT1), cullin3 (CUL3), histone deacetylase 1 (HDAC1), and histone acetyltransferase 1 (HAT1) to perform chromatin remodeling at specific genomic sites. Although the likelihood of mutation can be predicted from the epigenetic characteristics of chromatin, the forms of mutations are diverse and random. Therefore, this does not violate randomness. For clarity, we have rewritten this section of the manuscript.

      J Grey Monroe et al., Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105.

      Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016;532(7598):264-267.

      Schuster-Böckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488(7412):504-507.

      Lawrence MS, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-218.

      Polak P, Karlić R, Koren A, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518(7539):360-364.

      Mathew R, Seiler MP, Scanlon ST, et al. BTB-ZF factors recruit the E3 ligase cullin 3 to regulate lymphoid effector programs. Nature. 2012;491(7425):618-621.

      Wu et al: Transposon insertion occurs in a timely manner upstream of the cortex gene in melanic pepper moths (47), which may be caused by the similar binding of transcription factors and opening of chromatin.

      No, we do not think that the peppered moth mutation is Lamarckian at all, as seems to be inferred here (notice that by mentioning the peppered moth twice, you are juxtaposing a larval plastic trait and then a purely genetic wing trait, making it even more confusing). Also, the "in a timely manner" is superfluous, because all the data are consistent with a chance mutation being eventually picked up by strong directional selection. The mutation and selection did NOT occur at the same time.

      Response: Thank you very much for your careful work. The insertion of a transposon into the first intron of the cortex gene, which underlies industrial melanism in the peppered moth, occurred in approximately 1819, close to the time of industrial development in the UK (Arjen E Van't Hof, et al., 2016). In multiple species of Heliconius, the cortex gene is the shared genetic basis for the regulation of wing coloring patterns. Interestingly, the cortex SNPs associated with the wing color pattern do not overlap among different Heliconius taxa, such as H. erato demophoon and H. erato favorinus, which suggests that the mutations of this cortex gene have different origins (Nadeau NJ et al., 2016). In addition, in Junonia coenia (van der Burg KRL et al., 2020) and Bombyx mori (Ito K et al., 2016), the cortex gene is a candidate for regulating changes in wing coloring patterns. Overall, the cortex gene is an evolutionary hotspot for the variation of multiple butterfly and moth wing coloring patterns. In addition, it was observed that the variations in the cortex are diverse in these species, including SNPs, indels, transposon insertions, inversions, etc. This indicates that although there are evolutionary hotspots in the insect genome, the variation at these hotspots is random. Therefore, this is not completely detached from randomness.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Nadeau NJ, Pardo-Diaz C, Whibley A, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature. 2016;534(7605):106-110.

      van der Burg KRL, Lewis JJ, Brack BJ, Fandino RA, Mazo-Vargas A, Reed RD. Genomic architecture of a genetically assimilated seasonal color pattern. Science. 2020;370(6517):721-725.

      Ito K, Katsuma S, Kuwazaki S, et al. Mapping and recombination analysis of two moth colour mutations, Black moth and Wild wing spot, in the silkworm Bombyx mori. Heredity (Edinb). 2016;116(1):52-59.

      Wu et al: Therefore, we proposed that the genetic basis of color pattern evolution may mainly be system-guided programmed events that induce mutations in specific genomic regions of key genes rather than just random mutations of the genome.

      While the mutational target of pigment evolution may involve a handful of developmental regulator genes, you do not have the data to infer such a strong conclusion at the moment.

      The current formulation is also quite strong and teleological: "system-guided programmed events" imply intentionality or agency, an idea generally assigned to the anti-scientific Intelligent Design movement. There are a few examples of guided mutations, such as the adaptation phase of gRNA motifs in bacterial CRISPR assays, where I could see the term "system-guided programmed events" being applicable. But it is irrelevant here.

      Response: Thank you very much for your careful work. The CRISPR-Cas9 system is indeed very well known. In addition, recent studies have found a Cas9-like gene editing system in eukaryotes, namely Fanzor. Fanzor (Fz) was reported in 2013 as a eukaryotic TnpB-IS200/IS605 protein of transposon origin, and it was initially thought that the Fz protein (and prokaryotic TnpBs) might regulate transposon activity through methyltransferase activity. Fz has recently been found to be a eukaryotic programmable RNA-guided endonuclease, functionally analogous to CRISPR‒Cas systems (Saito M et al., 2023). Although this system is found in fungi and mollusks, it raises hopes of finding similar systems in other higher animals. However, before these gene-editing systems became popular, zinc finger nucleases (ZFNs) were already being studied as a gene-editing system in many species. The mechanism by which a ZFN recognizes DNA depends on its zinc finger motif (Urnov FD et al., 2005). This is consistent with the mechanism by which transcription factors recognize DNA-binding sites.

      Furthermore, a very important evolutionary event in sexual reproduction is chromosome recombination during meiosis, which helps to produce more abundant alleles. Current research has found that this recombination event is not random. In mice and humans, the PRDM9 transcription factors are able to plan the sites of double-stranded breaks (DSBs) in meiosis recombination. PRDM9 is a histone methyltransferase consisting of three main regions: an amino-terminal region resembling the family of synovial sarcoma X (SSX) breakpoint proteins, which contains a Krüppel-associated box (KRAB) domain and an SSX repression domain (SSXRD); a PR/SET domain (a subclass of SET domains), surrounded by a pre-SET zinc knuckle and a post-SET zinc finger; and a long carboxy-terminal C2H2 zinc finger array. In most mammalian species, during early meiotic prophase, PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site. Subsequently, meiotic DNA DSBs are formed at hotspots through the combined action of SPO11 and TOPOVIBL. In addition, some proteins (such as RAD51) are involved in repairing the break point. In summary, programmed events of induced and repaired DSBs are widely present in organisms (Bhattacharyya T et al., 2019).

      These studies indicate that on the basis of randomness, the genome also exhibits programmability.

      Saito M, Xu P, Faure G, et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature. 2023;620(7974):660-668.

      Urnov FD, Miller JC, Lee YL, et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature. 2005;435(7042):646-651.

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Wu et al: Based on this assumption, animals can undergo phenotypic changes more quickly and more accurately to cope with environmental changes. Thus, seemingly complex phenotypes such as cryptic coloring and mimicry that are highly similar to the background may have formed in a short period. However, the binding sites of some transcription factors widely distributed in the genome may be reserved regulatory interfaces to cope with potential environmental changes. In summary, the regulation of genes is smarter than imagined, and they resemble a more advanced self-regulation program.

      Here again, I can agree with the idea that certain genetic architectures can evolve quickly, but I cannot support the concept that the genetic changes are guided or accelerated by the environment. And again, none of this is relevant to the current findings about Bm-mamo.

      Response: Thank you very much for your careful work. Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      Epigenetics has developed rapidly since 1987. Epigenetics has been widely accepted, defined as stable inheritance caused by chromosomal conformational changes without altering the DNA sequence, which differs from genetic research on variations in gene sequences. However, an increasing number of studies have found that histone modifications can affect gene sequence variation. In addition, both histones and epigenetic factors are essentially encoded by genes in the genome. Therefore, genetics and epigenetics should be interactive rather than parallel. However, some transcription factors play an important role in epigenetic modifications. Meiotic recombination is a key process that ensures the correct separation of homologous chromosomes through DNA double-stranded break repair mechanisms. The transcription factor PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site (Bhattacharyya T et al., 2019). Interestingly, mamo has been identified as an important candidate factor for meiosis hotspot setting in Drosophila (Winbush A et al., 2021).

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Winbush A, Singh ND. Genomics of Recombination Rate Variation in Temperature-Evolved Drosophila melanogaster Populations. Genome Biol Evol. 2021;13(1): evaa252.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we began to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). The authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis, with the aim of publishing preemptively.

      To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19:6:26114. doi: 10.1038/srep26114.

      • line 52-54. The numerous biological functions of insect coloration have been thoroughly investigated. It is reasonable to expect more references for each function.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Sword GA, Simpson SJ, El Hadi OT, Wilps H. Density-dependent aposematism in the desert locust. Proc Biol Sci. 2000;267(1438):63-68. … Behavior.

      Barnes AI, Siva-Jothy MT. Density-dependent prophylaxis in the mealworm beetle Tenebrio molitor L. (Coleoptera: Tenebrionidae): cuticular melanization is an indicator of investment in immunity. Proc Biol Sci. 2000;267(1439):177-182. … Immunity.

      N. F. Hadley, A. Savill, T. D. Schultz, Coloration and Its Thermal Consequences in the New-Zealand Tiger Beetle Neocicindela-Perhispida. J Therm Biol. 1992;17, 55-61…. Thermoregulation.

      Y. G. Hu, Y. H. Shen, Z. Zhang, G. Q. Shi, Melanin and urate act to prevent ultraviolet damage in the integument of the silkworm, Bombyx mori. Arch Insect Biochem. 2013; 83, 41-55…. UV protection.

      M. Stevens, G. D. Ruxton, Linking the evolution and form of warning coloration in nature. P Roy Soc B-Biol Sci. 2012; 279, 417-426…. Aposematism.

      K. K. Dasmahapatra et al., Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature.2012; 487, 94-98…. Mimicry.

      Gaitonde N, Joshi J, Kunte K. Evolution of ontogenic change in color defenses of swallowtail butterflies. Ecol Evol. 2018;8(19):9751-9763. Published 2018 Sep 3. …Crypsis.

      B. S. Tullberg, S. Merilaita, C. Wiklund, Aposematism and crypsis combined as a result of distance dependence: functional versatility of the colour pattern in the swallowtail butterfly larva. P Roy Soc B-Biol Sci.2005; 272, 1315-1321…. Aposematism and crypsis combined.

      • line 59-60. This general statement needs to be rephrased. I suggest remaining simple by indicating that insect coloration can be pigmentary, structural, or bioluminescent. About the structural coloration and associated nanostructures, the authors could cite recent reviews, such as: Seago et al., Interface 2009 + Lloyd and Nadeau, Current Opinion in Genetics & Development 2021 + "Light as matter: natural structural colour in art" by Finet C. 2023. I suggest doing the same for recent reviews that cover pigmentary and bioluminescent coloration in insects. The very recent paper by Nishida et al. in Cell Reports 2023 on butterfly wing color made of pigmented liquid is also unique and worth considering.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Insect coloration can be pigmentary, structural, or bioluminescent. Pigments are mainly synthesized by the insects themselves and form solid particles that are deposited in the cuticle of the body surface and the scales of the wings (10, 11). Interestingly, recent studies have found that bile pigments and carotenoid pigments synthesized through biological synthesis are incorporated into body fluids and passed through the wing membranes of two butterflies (Siproeta stelenes and Philaethria diatonica) via hemolymph circulation, providing color in the form of liquid pigments (12). The pigments form colors by selective absorption and/or scattering of light depending on their physical properties (13). However, structural color refers to colors, such as metallic colors and iridescence, generated by optical interference and grating diffraction of the microstructure/nanostructure of the body surface or appendages (such as scales) (14, 15). Pigment color and structural color are widely distributed in insects and can only be observed by the naked eye in illuminated environments. However, some insects, such as fireflies, exhibit colors (green to orange) in the dark due to bioluminescence (16). Bioluminescence occurs when luciferase catalyzes the oxidation of small molecules of luciferin (17). In conclusion, the color patterns of insects have evolved to be highly sophisticated and are closely related to their living environments. For example, cryptic color can deceive animals via high similarity to the surrounding environment. However, the molecular mechanism by which insects form precise color patterns to match their living environment is still unknown.

      • RNAi approach. I have no doubt that obtaining phenocopies by electroporation might be difficult. However, I find the final sampling a bit limited to draw conclusions from the RT-PCR (n=5 and n=3 for phenocopies and controls). Three control individuals is a very low number. Moreover, it would be nice to see the variability on the plot, using for example violin plots.

      Response: Thank you very much for your careful work. In the RNAi experiment, we injected more than 20 individuals in the experimental group and control group. We have added the RNAi data in Figure 4.

      Author response table 1.

      • Figure 6. Higher magnification images of Dazao and Bm-mamo knockout are needed, as shown in Figure 5 on RNAi.

      Response: Thank you very much for your careful work. We have added enlarged images.

      Author response image 3.

      • Phylogenetic analysis/Figure S6. I am not sure to what extent the sampling is biased or not, but if not, it is noteworthy that mamo does not show duplicated copies (negative selection?). It might be interesting to discuss this point in the manuscript.

      Response: Thank you very much for your careful work. mamo belongs to the BTB/POZ zinc finger family. The members of this family exhibit significant expansion in vertebrates. For example, there are 3 members in C. elegans, 13 in D. melanogaster, 16 in Bombyx mori, 58 in M. musculus and 63 in H. sapiens (Wu et al, 2019). These members contain conserved BTB/POZ domains but vary in number and amino acid residue compositions of the zinc finger motifs. Due to the zinc finger motifs that bind to different DNA recognition sequences, there may be differences in their downstream target genes. Therefore, when searching for orthologous genes from different species, we required high conservation of their zinc finger motif sequences. Due to these strict conditions, only one orthologous gene was found in these species.

      • Differentially-expressed genes and CP candidate genes (line 189-191). The manuscript would gain in clarity if the authors explain more in details their procedure. For instance, they moved from a list of 191 genes to CP genes only. Can they say a little bit more about the non-CP genes that are differentially expressed? Maybe quantify the number of CPs among the total number of differentially-expressed genes to show that CPs are the main class?

      Response: Thank you very much for your careful work. The nr (Nonredundant Protein Sequence Database) annotations for 191 differentially expressed genes in Supplemental Table S3 were added. Among them, there were 19 cuticular proteins, 17 antibacterial peptide genes, 6 transporter genes, 5 transcription factor genes, 5 cytochrome genes, 53 enzyme-encoding genes and others. CP genes were significantly enriched among the differentially expressed genes (DEGs), and previous studies have found that BmorCPH24 can affect pigmentation. Therefore, we first conducted an investigation into CP genes.
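
      The category counts quoted above can be turned into shares of the 191 DEGs with a few lines of arithmetic. The sketch below is illustrative only: it assumes the listed categories are mutually exclusive (the response does not state this), and it treats everything not itemized as "other". The variable names are hypothetical. It reports proportions only and is not an enrichment test, which would also need the genome-wide background counts.

```python
# Counts taken from the response text; category labels are shortened here.
TOTAL_DEGS = 191
annotated = {
    "cuticular protein": 19,
    "antibacterial peptide": 17,
    "transporter": 6,
    "transcription factor": 5,
    "cytochrome": 5,
    "enzyme": 53,
}

# Assumption: categories do not overlap, so the remainder is "other".
other = TOTAL_DEGS - sum(annotated.values())

# Share of each category among all DEGs, as a fraction of 191.
shares = {name: count / TOTAL_DEGS for name, count in annotated.items()}
shares["other"] = other / TOTAL_DEGS

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>22}: {share * 100:5.1f}%")
```

      Under these assumptions, cuticular proteins make up roughly a tenth of the DEG list, and the itemized categories together cover a little over half of it.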

      • Interaction between Bm-mamo. It is not clear why the authors chose to investigate the physical interaction of Bm-mamo protein with the putative binding site of yellow, and not with the sites upstream of tan and DDC. Do the authors test one interaction and assume the conclusion stands for the y, tan and DDC?

      Response: Thank you very much for your careful work. In D. melanogaster, the yellow gene is the most studied pigment gene. The upstream and intron sequences of the yellow gene have been identified as containing multiple cis-regulatory elements. Due to the important pigmentation role of the yellow gene and its variable cis-regulatory sequence among different species, it has been considered a research model for cis-regulatory elements (Laurent Arnoult et al. 2013, Gizem Kalay et al. 2019, Yaqun Xin et al. 2020, Yann Le Poul et al. 2020). We use yellow as an example to illustrate the regulation of the mamo gene. We added this description to the discussion.

      Laurent Arnoult et al. Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. Science. 2013 Mar 22;339(6126):1423-6. doi: 10.1126/science.1233749.

      Gizem Kalay et al. Redundant and Cryptic Enhancer Activities of the Drosophila yellow Gene. Genetics. 2019 May;212(1):343-360. doi: 10.1534/genetics.119.301985. Epub 2019 Mar 6.

      Yaqun Xin et al. Enhancer evolutionary co-option through shared chromatin accessibility input. Proc Natl Acad Sci U S A. 2020 Aug 25;117(34):20636-20644. doi: 10.1073/pnas.2004003117. Epub 2020 Aug 10.

      Yann Le Poul et al. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci Adv. 2020 Dec 2;6(49):eabe2955. doi: 10.1126/sciadv.abe2955. Print 2020 Dec.

      • Please note that some controls are missing for the EMSA experiments. For instance, the putative binding-sites should be mutated and it should be shown that the interaction is lost.

      Response: Thank you very much for your careful work. In this study, we found that the DNA recognition sequence of mamo is highly conserved across multiple species. In D. melanogaster, studies have found that mamo can directly bind to the intron of the vasa gene to activate its expression. The DNA recognition sequence they use is TGCGT (Shoichi Nakamura et al. 2019). We chose a longer sequence, GTGCGTGGC, to detect the binding of mamo. This binding mechanism is consistent across species.
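
      As an illustration of how a recognition sequence such as TGCGT, or the longer GTGCGTGGC probe, can be located in a candidate regulatory region, the following minimal sketch scans both strands for exact matches. It is not the authors' pipeline: the function names and the toy sequence are hypothetical, and real binding-site prediction typically uses position weight matrices rather than exact string matching.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of motif in seq.

    A "-" hit means the reverse complement of the motif occurs at that
    position on the given (forward) strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits


CORE = "TGCGT"       # core recognition sequence quoted in the response
PROBE = "GTGCGTGGC"  # longer sequence quoted in the response

# Hypothetical toy sequence, invented for this example only.
toy_region = "AAGTGCGTGGCTTACGCATT"

print(find_motif(toy_region, CORE))
print(find_motif(toy_region, PROBE))
```

      The longer probe contains the core sequence, so every forward-strand probe hit implies a core hit one base downstream, while the core alone also matches additional sites that the probe does not.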

      • Figure 7 and supplementary data. How were the names of the CPs attributed? According to automatic genome annotation of Bm genes and proteins? Based on Drosophila genome and associated gene names? Did the authors perform phylogenetic analyses to name the different CP genes?

      Response: Thank you very much for your careful work. The naming of CPs is based on their conserved motif and their arrangement order on the chromosome. In previous reports, sequence identification and phylogenetic analysis of CPs have been carried out in silkworms (Zhengwen Yan et al. 2022, Ryo Futahashi et al. 2008). The members of the same family have sequence similarity between different species, and their functions may be similar. We have completed the names of these genes in the text, for example, changing CPR2 to BmorCPR2.

      Zhengwen Yan et al. A Blueprint of Microstructures and Stage-Specific Transcriptome Dynamics of Cuticle Formation in Bombyx mori. Int J Mol Sci. 2022 May 5;23(9):5155.

      Ningjia He et al. Proteomic analysis of cast cuticles from Anopheles gambiae by tandem mass spectrometry. Insect Biochem Mol Biol. 2007 Feb;37(2):135-46.

      Maria V Karouzou et al. Drosophila cuticular proteins with the R&R Consensus: annotation and classification with a new tool for discriminating RR-1 and RR-2 sequences. Insect Biochem Mol Biol. 2007 Aug;37(8):754-60.

      Ryo Futahashi et al. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem Mol Biol. 2008 Dec;38(12):1138-46.

      • Discussion. I think the discussion would gain in being shorter and refocused on the understudied role of CPs. Another non-canonical aspect of the discussion is the reference to additional experiments (e.g., parthenogenesis line 290-302, figure S14). This is not the place to introduce more results, and it breaks the flow of the discussion. I encourage the authors to reshuffle the discussion: 1) summary of their findings on mamo and CPs, 2) link between pigmentation mutant phenotypes, pigmentation pattern and CPs, 3) general discussion about the (evo-)devo importance of CPs and link between pigment deposition and coloration. Three important papers should be mentioned here:

      1) Matsuoka Y and A Monteiro (2018) Melanin pathway genes regulate color and morphology of butterfly wing scales. Cell Reports 24: 56-65... Yellow has a pleiotropic role in cuticle deposition and pigmentation.

      2) https://arxiv.org/abs/2305.16628... Link between nanoscale cuticle density and pigmentation

      3) https://www.cell.com/cell-reports/pdf/S2211-1247(23)00831-8.pdf... Variation in pigmentation and implication of endosomal maturation (gene red).

      Response: Thank you very much for your careful work. We have rewritten the discussion section.

      1) We have summarized our findings.

      Bm-mamo may affect the synthesis of melanin in epidermis cells by regulating yellow, DDC, and tan; regulate the maturation of melanin granules in epidermis cells through BmMFS; and affect the deposition of melanin granules in the cuticle by regulating CP genes, thereby comprehensively regulating the color pattern in caterpillars.

      2) We describe the relationship among the pigmentation mutation phenotype, pigmentation pattern, and CP.

      Previous studies have shown that the lack of expression of BmorCPH24, which encodes important components of the endocuticle, can lead to dramatic changes in body shape and a significant reduction in the pigmentation of caterpillars (53). We crossed Bo (BmorCPH24 null mutation) and bd to obtain F1(Bo/+Bo, bd/+), then self-crossed F1 and observed the phenotype of F2. The lunar spots and star spots decreased, and light-colored stripes appeared on the body segments, but the other areas still had significant melanin pigmentation in double mutation (Bo, bd) individuals (Fig. S13). However, in previous studies, introduction of Bo into L (ectopic expression of wnt1 results in lunar stripes generated on each body segment) (24) and U (overexpression of SoxD results in excessive melanin pigmentation of the epidermis) (58) strains by genetic crosses can remarkably reduce the pigmentation of L and U (53). Interestingly, there was a more significant decrease in pigmentation in the double mutants (Bo, L) and (Bo, U) than in (Bo, bd). This suggests that Bm-mamo has a stronger ability than wnt1 and SoxD to regulate pigmentation. On the one hand, mamo may be a stronger regulator of the melanin metabolic pathway, and on the other hand, mamo may regulate other CP genes to reduce the impact of BmorCPH24 deficiency.

      3) We discussed the importance of (evo-) devo in CPs and the relationship between pigment deposition and coloring.

      CP genes usually account for over 1% of the total genes in an insect genome and can be categorized into several families, including CPR, CPG, CPH, CPAP1, CPAP3, CPT, CPF and CPFL (68). The CPR family is the largest group of CPs, containing a chitin-binding domain called the Rebers and Riddiford motif (R&R) (69). The variation in the R&R consensus sequence allows subdivision into three subfamilies (RR-1, RR-2, and RR-3) (70). Among the 28 CPs, 11 RR-1 genes, 6 RR-2 genes, 4 hypothetical cuticular protein (CPH) genes, 3 glycine-rich cuticular protein (CPG) genes, 3 cuticular protein Tweedle motif (CPT) genes, and 1 CPFL (like the CPFs in a conserved C-terminal region) gene were identified. The RR-1 consensus among species is usually more variable than RR-2, which suggests that RR-1 may have a species-specific function. RR-2 often clustered into several branches, which may be due to gene duplication events in co-orthologous groups and may result in conserved functions between species (71). The classification of CPH is due to their lack of known motifs. In the epidermis of Lepidoptera, the CPH genes often have high expression levels. For example, BmorCPH24 had the highest expression level in the silkworm larval epidermis (72). The CPG protein is rich in glycine. The CPH and CPG genes are less commonly found in insects outside the order Lepidoptera (73). This suggests that they may provide species-specific functions for the Lepidoptera. CPT contains a Tweedle motif, and the TweedleD1 mutation has a dramatic effect on body shape in D. melanogaster (74). The CPFL members are relatively conserved in species and may be involved in the synthesis of larval cuticles (75). CPT and CPFL may have relatively conserved functions among insects. The CP genes are a group of rapidly evolving genes, and their copy numbers may undergo significant changes in different species. In addition, RNAi experiments on 135 CP genes in brown planthopper (Nilaparvata lugens) showed that deficiency of 32 CP genes leads to significant defective phenotypes, such as lethality and developmental retardation. It is suggested that the 32 CP genes are indispensable, and other CP genes may have redundant and complementary functions (76). In previous studies, it was found that the construction of the larval cuticle of silkworms requires the precise expression of over two hundred CP genes (22). The production, interaction, and deposition of CPs and pigments are complex and precise processes, and our research shows that Bm-mamo plays an important regulatory role in this process in silkworm caterpillars. For further understanding of the role of CPs, future work should aim to identify the function of important cuticular protein genes and the deposition mechanism in the cuticle.

      Minor comments - Title. At this stage, there is no evidence that Bm-mamo regulates caterpillar pigmentation outside of Bombyx mori. I suggest specifying 'silkworm caterpillars' in the title.

      Response: Thank you very much for your careful work. We have modified the title.

      • Abstract, line 29. Because the knowledge on pigmentation pathway(s) is advanced, I would suggest writing 'color pattern is not fully understood' instead of 'color pattern is not clear'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 29. I suggest 'the transcription factor' rather than 'a transcription factor'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. If you want to mention the protein, the name 'Bm-mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. 'in the silkworm'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'in Drosophila' rather 'of Drosophila'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 32. Bring detail if the gamete function is conserved in insects? In all animals?

      Response: Thank you very much for your careful work. The sentence was changed to “This gene has a conserved function in gamete production in Drosophila and silkworms and evolved a pleiotropic function in the regulation of color patterns in caterpillars.”

      • Introduction, line 51. I am not sure what the authors mean by 'under natural light'. Please rephrase.

      Response: Thank you very much for your careful work. We have deleted “under natural light”.

      • line 43. I find that the sentence 'In some studies, it has been proven that epidermal proteins can affect the body shape and appendage development of insects' is not necessary here. Furthermore, this sentence breaks the flow of the teaser.

      Response: Thank you very much for your careful work. We have deleted this sentence.

      • line 51-52. 'Greatly benefit them' should be rephrased in a more neutral way. For example, 'colours pattern have been shown to be involved in...'.

      Response: Thank you very much for your careful work. We have modified to “and the color patterns have been shown to be involved in…”

      • line 62. CPs are secreted by the epidermis, but I would say that CPs play their structural role in the cuticle, not directly in the epidermis. I suggest rephrasing this sentence and adding references.

      Response: Thank you very much for your careful work. We have modified “epidermis” to “cuticle”.

      • line 67. Please indicate that pathways have been identified/reported in Lepidoptera (11). Otherwise, the reader does not understand if you refer to previous biochemical in Drosophila for example.

      Response: Thank you very much for your careful work. We have modified this sentence. “Moreover, the biochemical metabolic pathways of pigments used for color patterning in Lepidoptera…have been reported.”

      • line 69. Missing examples of pleiotropic factors and associated references. For example, I suggest adding: engrailed (Dufour, Koshikawa and Finet, PNAS 2020) + antennapedia (Prakash et al., Cell Reports 2022) + optix (Reed et al., Science 2011), etc. Need to add references for clawless, abdominal-A.

      Response: Thank you very much for your careful work. We have made modifications.

      • line 76. The simpler term moth might be enough (instead of Lepidoptera).

      Response: Thank you very much for your careful work. We have modified this to “insect”.

      • line 96. I would simplify the text by writing "Then, quantitative RT-PCR was performed..."

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 112. 'Predict' instead of 'estimate'?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 113. I would rather indicate the full name first, then indicate mamo between brackets.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 144. The Perl script needs to be made accessible on public repository.

      Response: Thank you very much for your careful work.

      • line 147-150. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have modified this section.

      • line 152. Needs to make the link with the observed phenotypes in Figure 1. Just needs to state that RNAi phenocopies mimic the mutant alleles.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 153-157. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have simplified this paragraph.

      • line 170. Please rephrase 'conserved in 30 species' because it might be understood as conserved in 30 species only, and not in other species.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. Maybe explain the rationale behind restricting the analysis to +/- 2kb. Can you cite a paper that shows that most of binding sites are within 2kb from the start codon?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. '14,623 predicted genes'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. '10,622 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. Redundancy. Please remove 'silkworm' or 'B. mori'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 187. '10,072 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 188. '9,853 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 200. "Therefore, the differential...in caterpillars" is a strong statement.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 204. Remove "The" in front of eight key genes. Also, needs a reference... maybe a recent review on the biochemical pathway of melanin in insects.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 220. This sentence is too general and vague. Please explicit what you mean by "in terms of evolution". Number of insect species? Diversity of niche occupancy? Morphological, physiological diversity?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 285. The verb "believe" should be replaced by a more neutral one.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 354-355. This sentence needs to be rephrased in a more objective way.

      Response: Thank you very much for your careful work. We have rewritten this sentence.

      • line 378. Missing reference for MUSCLE.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 379. Pearson model?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 408. "The CRISPRdirect online software was used...".

      Response: Thank you very much for your careful work. We have modified this sentence.

      • Figure 1. In the title, I suggest indicating Dazao, bd, bdf as it appears in the figure. The title should also specify 'silkworm larval development'.

      Response: Thank you very much for your careful work. We have modified this figure title.

      • Figure 3. In the title, is the word 'pattern' really necessary? In the legend, please indicate the meaning of the acronyms AMSG and PSG.

      Response: Thank you very much for your careful work. We have modified this figure legend.

      • Figure S7A. Typo 'Znic finger 1', 'Znic finger 2', 'Znic finger 3',

      Response: Thank you very much for your careful work. We have fixed these typos.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In 2019, Wilkinson and colleagues (PMID: 31142833) managed to lift the veil on a 20-year open question on how to properly culture and expand Hematopoietic Stem Cells (HSCs). Although this study is revolutionizing the HSC biology field, several questions regarding the mechanisms of expansion remain open. Leveraging this gap, Zhang et al. embarked on a much-needed investigation regarding HSC self-renewal in this particular culturing setting.

      The authors first tackled the known caveat that some HSC membrane markers are altered during in vitro cultures by functionally establishing EPCR (CD201) as a reliable and stable HSC marker (Figure 1), demonstrating that this compartment is also responsible for long-term hematopoietic reconstitution (Figure 3). Next in Figure 2, the authors performed single-cell omics to shed light on the potential mechanisms involved in HSC maintenance, and interestingly it was shown that several hematopoietic populations like monocytes and neutrophils are also present in these culture conditions, which has not been reported. The study goes on to functionally characterize these cultured HSCs (cHSC). The authors elegantly demonstrate using state-of-the-art barcoding strategies that these culturing conditions provoke heterogeneity in the expanding HSC pool (Figure 4). In the last experiment (Figure 5), it was demonstrated that cHSC not only retain their high EPCR expression levels but upon transplantation, these cells remain more quiescent than freshly-isolated controls.

      Taken together, this study independently validates that the proposed culturing system works and provides new insights into the mechanisms whereby HSC expansion takes place.

      Most of the conclusions of this study are well supported by the present manuscript, but some aspects regarding experimental design and especially the data analysis should be clarified and possibly extended.

      1) The first major point regards the single-cell (sc) omics performed on whole cultured cells (Figure 2):

      a. The authors claim that both RNA and ATAC were performed and indeed some ATAC-seq data is shown in Figure 2B, but this collected data seems to be highly underused.

      We appreciate the opportunity to clarify our analytical approach and the rationale behind it. In our study, we employed a novel deep learning framework, SAILERX, for our analysis. This framework is specifically designed to integrate multimodal data, such as RNAseq and ATACseq. The advantage of SAILERX lies in its ability to correct for technical noise inherent in sequencing processes and to align information from different modalities. Unlike methods that force a hard alignment of modalities into a shared latent space, SAILERX allows for a more refined integration. It achieves this by encouraging the local structures of the two modalities, as measured by pairwise similarities, to be similar.

      To put it more simply, SAILERX combines RNAseq and ATACseq data, ensuring that the unique characteristics of each data type are respected and used to enhance the overall biological picture, rather than forcing them into a uniform framework.

      While it is indeed possible to analyze the ATAC-seq and RNA-seq modalities separately, and we acknowledge the potential value in such an approach, our primary objective in this study was to highlight the relatively low content of HSCs in cultures. This finding is a key point of our work, and the multiome data support this from a molecular point of view.

      The Seurat object we provide was created to facilitate further analysis by interested researchers. This object simplifies the exploration of both the ATAC-seq and RNA-seq data, allowing for additional investigations that may be of interest to the scientific community. We hope this explanation clarifies our methodology and its implications.

      b. It's not entirely clear to this reviewer the nature of the so-called "HSC signatures"(SF2C) and why exactly these genes were selected. There are genes such as Mpl and Angpt1 which are used for Mk-biased HSCs. Maybe relying on other HSC molecular signatures (PMID: 12228721, for example) would not only bring this study more into the current field context but would also have a more favorable analysis outcome. Moreover reclustering based on a different signature can also clarify the emergence of relevant HSC clusters.

      The selection of the HSC signature in our study was based on well-referenced datasets of well-defined HSPCs, as detailed in the "v. HSC signature" section of our methods. This signature was also projected onto another single-cell RNA sequencing dataset generated from ex vivo expanded HSC culture (PMID: 35971894, see Author response image 1 below), demonstrating again an association primarily with the most primitive cells (at least based on gene expression).

      Author response image 1.

      Projection of "our" HSC signature on scRNAseq data from independent work.

      In further response to the suggestion here, we have also examined the molecular signature of HSCs referenced in PMID: 12228721, as well as another HSC signature from PMID: 26004780, in our data (Author response image 2). While these signatures do indeed enrich for cells that fall in the cluster of molecularly defined HSCs, our analysis indicates that neither of them significantly improves the identification of HSCs in our dataset compared to the signature we originally used. This finding reinforces our confidence in the appropriateness of our chosen HSC signature for this study.

      Author response image 2.

      Projection of alternative HSC signatures onto the SAILERX UMAP.

      Regarding the specific genes Mpl and Angpt1, we respectfully oppose the view that these genes are exclusively associated with MK-biased HSCs. There is substantial evidence supporting the broader role of Mpl in regulating HSCs, regardless of any particular "lineage bias". Similarly, while Angpt1 has been less extensively studied, its role in HSCs, as examined in PMID: 25821987, suggests a more general association with HSCs rather than a specific impact on MKs. Therefore, we maintain that it is more accurate to consider these genes as HSC-associated rather than restricted to MK-biased HSCs.

      Finally, addressing the comment on reclustering based on different signatures, we would like to clarify that the clustering process is independent of the projection of signatures. The clustering aims to identify cell populations based on their overall molecular profiles, and while signatures can aid in characterizing these populations, they do not influence the clustering process itself.

      c. The authors took the hard road to perform experiments with the elegant HSC-specific Fgd5-reporter, and they claim in lines 170-171 that it "failed to clearly demarcate in our single-cell multimodal data". This seems like a rather vague statement and leads to the idea that the scRNA-seq experiment is not reliable. It would be interesting to show a UMAP with this gene expression regardless and also potentially some other HSC markers.

      We understand the concerns raised about our statement on the performance of the Fgd5-reporter in our multimodal data analysis. Our aim was not to suggest that single-cell molecular data are unreliable. Instead, we intended to point out specific challenges associated with scRNA sequencing, notably the high rates of dropout. Regarding the specific example of Fgd5, it appears this transcript is not efficiently captured by 10x technology. Our previous 10x scRNA-seq experiments on cells from the Fgd5 reporter strain (Säwén et al., eLife 2018; Konturek-Ciesla et al., Cell Rep. 2023) support this observation. Despite cells being sorted as Fgd5-reporter positive, many showed no detectable transcripts.

      We consider it pertinent to note that our study integrates ATAC-seq data in conjunction with single-cell molecular data. We believe that this integration, coupled with the analytical methods we have employed, potentially offers a way to address some of the limitations typically associated with scRNA sequencing. However, in assessing frequencies, we observe that the number of candidate HSCs identified via single-cell molecular data is substantially higher than the number identified through flow cytometry, the latter of which we demonstrate correlates functionally with genuine long-term repopulating activity.

      With respect to Fgd5, as depicted in our analysis below, there appears to be an enrichment of cells in the cluster identified as HSCs, as well as a significant representation in the cycling cell cluster (Author response image 3). Regarding the projection of other individual genes, the Seurat object we have provided allows for such projections to be readily performed. This offers an opportunity for further exploration and validation of our findings by interested researchers.

      Author response image 3.

      Feature plot depicting Fgd5 expression in the SAILERX UMAP.

      2) During the discussion and in Figure 4, the authors ponder and demonstrate that this culturing system can provoke divergent HSC clone expansion, which also has functional consequences. This is a known caveat of the original system, but in more recent publications from the original group (PMID: 36809781 and PMID: 37385251) small alterations to the protocol seem to alleviate clone selection. It's intriguing why the authors have not included these parameters at least in some experiments to show reproducibility or why these studies are not mentioned during the discussion section.

      Thank you for pointing out the recent publications (PMID: 36809781 and PMID: 37385251) that discuss modifications to the HSC culturing system. We appreciate the opportunity to address why these were not included in our discussion or experiments.

      Firstly, it is important to note that these papers were published after the submission of our manuscript. In fact, one of the studies (PMID: 36809781) references the preprint version of our work on bioRxiv. This timing meant that we were unable to consider these studies in our initial manuscript or incorporate any of their findings into our experimental designs.

      Furthermore, as strong advocates for the peer-review system, we prioritize references that have undergone this rigorous process. Preprints, while valuable for early dissemination of research findings, do not offer the same level of scrutiny and validation as peer-reviewed publications. Our approach was to rely on the most relevant and rigorously reviewed literature available to us at the time of submission. This included, most notably, the original and ground-breaking work by Wilkinson et al., which provided a foundational basis for our research.

      We acknowledge that the field of HSC research is rapidly evolving, and new findings, such as those mentioned, are continually emerging. These new studies undoubtedly contribute valuable insights into HSC culturing systems and their optimization. However, given the timing of their publication relative to our study, we were not able to include them in our analysis or discussion.

      3) In this reviewer's opinion, the finding that transplanted cHSC are more quiescent than freshly isolated controls is the most remarkable aspect of this manuscript. There is a point of concern and an intriguing thought that sprouts from this experiment. It is imperative that for this experiment the same HSC dose is transplanted between both groups. This however is technically difficult since the membrane markers from both groups are different. Although after 8 weeks chimerism levels seem to be the same (SF5D) for both groups, it would strengthen the evidence if the authors could demonstrate that the same number of HSCs were transplanted in both groups, likely by limiting dose experiments. Finally, it's interesting that even though EE100 cells underwent multiple replication rounds (adding to their replicative aging), these cells remained more quiescent once they were in an in vivo setting. Since the last author of this manuscript has also expertise in HSC aging, it would be interesting to explore whether these cells have "aged" during the expansion process by assessing whether they display an aged phenotype (myeloid-skewed output in serial transplantations and/or assessing their transcriptional age).

      We thank the reviewer for the insightful observations regarding the quiescence of transplanted cultured HSCs. We appreciate the opportunity to clarify the experimental design and its implications, particularly in the context of HSC aging.

      The primary aim of comparing cKit-enriched BM cells with cultured cells was to investigate whether ex vivo activated HSCs exhibit a similar proliferation pattern to in vivo quiescent HSCs post-transplantation. This comparison was crucial for evaluating the similarity between in vitro cultured and "unmanipulated" HSC behavior. While we acknowledge the technical challenge of transplanting equivalent HSC doses between groups due to differing membrane markers, our study design focused on assessing stem cell activity post-culture. This was quantitatively evaluated by calculating the repopulating units (detailed in Table 1 and Fig S4G), rather than through a limiting dilution assay. There exists a plethora of literature demonstrating the correlation between these assays, although of course the limiting dilution assay is designed to provide a more exact output.
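
      For readers less familiar with repopulating units (RU), a minimal sketch of how such a value is commonly derived from competitive transplantation data follows. It assumes the widely used convention in which one RU corresponds to the repopulating activity of 100,000 competitor BM cells and RU = (% donor x C) / (100 - % donor), with C the number of competitor RUs. The function name and the example numbers are hypothetical and are not taken from Table 1 or Fig S4G.

```python
def repopulating_units(donor_chimerism_pct: float, competitor_cells: int) -> float:
    """Estimate donor repopulating units (RU) from competitive chimerism.

    Assumption (not stated in the response itself): 1 RU equals the
    repopulating activity of 100,000 competitor BM cells, and
    RU = (%donor * C) / (100 - %donor), where C is the competitor RU count.
    """
    if not 0 <= donor_chimerism_pct < 100:
        raise ValueError("donor chimerism must be in [0, 100)")
    competitor_ru = competitor_cells / 100_000
    return donor_chimerism_pct * competitor_ru / (100 - donor_chimerism_pct)


# Hypothetical example: 50% donor chimerism against 200,000 competitor cells.
example_ru = repopulating_units(50.0, 200_000)
print(example_ru)
```

      Under this convention, equal donor and competitor contributions imply that the donor graft carried the same number of RUs as the competitor graft, which is why the measure can be used to compare stem cell activity without knowing the exact number of HSCs transplanted.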

      Regarding the intriguing aspect of HSC aging in the context of ex vivo expansion, our observations indicate that both the subfraction of ex vivo expanded cells (Fig 3 and Fig S3) and the entire cultured population (Fig 4B, Fig 5B, Fig S4A, and Fig S5B) maintain long-term multilineage reconstitution capacity post-transplantation. This suggests that the PVA-culture system does not lead to apparent signs of "HSC aging," despite the cells undergoing active self-renewal in vitro. This is further supported by our serial transplantation experiments, where cultured cells continued to demonstrate multilineage capacity rather than any evident myeloid-biased reconstitution 16 weeks post-second transplantation (see Author response image 4 below).

      Author response image 4.

      Serial transplantation behavior of ex vivo expanded HSCs. 5 million whole BM cells from primary transplantation were transplanted together with 5 million competitor whole BM cells. The control group was transplanted with 100 cHSCs freshly isolated from BM for the primary transplantation. Mann-Whitney test was applied and the asterisks indicate significant differences. *, p < 0.05; **, p < 0.01; ***, p < 0.0001. Error bars denote SEM.

      However, we recognize the complexity of defining HSC aging and the potential for the culture system to influence certain aspects of this process. The association of aging signature genes with HSC primitiveness and young signature genes with differentiation presents an interesting dichotomy. Our analysis of a native dataset on young mice and the projection of aged signatures onto our multiome data (as shown below for a set of genes known to be induced at higher levels in aged HSCs, e.g. Wahlestedt et al., Nature Comm 2017; aging scRNAseq data from PMID: 36581635) does not directly indicate that the culture system promotes HSC aging compared to aged Lin-Sca+Kit+ cells. Yet, we do not rule out the possibility that culturing may influence other facets of the HSC aging process.

      In conclusion, while our current data do not provide direct evidence of induced HSC aging through the culture system, this remains a compelling area for future research. The potential impact of ex vivo culture on aspects of the HSC aging process warrants further exploration, and we appreciate your suggestion in this regard.

      Author response image 5.

      No evident signs of "molecular aging" following ex vivo expansion of HSCs. Young and aged scRNAseq data from PMID: 36581635 were integrated and explored from the perspective of known genes associated with HSC aging. The top row depicts contribution to UMAPs from young and aged cells (two left plots), cell cycle scores of the cells, and the expression of EPCR and CD48 as example markers for primitive and more differentiated cells, respectively. The expression of the HSC aging-associated genes Wwtr1, Cavin2, Ghr, Clu and Aldh1a1 was then assessed in the data as well as in the SAILERX UMAP of cultured HSCs (bottom row).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Zhang and colleagues characterise the behaviour of mouse hematopoietic stem cells when cultured in PVA conditions, a recently published method for HSC expansion (Wilkinson et al., Nature, 2019), using multiome analysis (scRNA-seq and scATACseq in the same single cell) and extensive transplantation experiments. The latter are performed in several settings including barcoding and avoiding recipient conditioning. Collectively the authors identify several interesting properties of these cultures namely: 1) only very few cells within these cultures have long-term repopulation capacity, many others, however, have progenitor properties that can rescue mice from lethal myeloablation; 2) single-cell characterisation by combined scRNAseq and scATACseq is not sufficient to identify cells with repopulation capacity; 3) expanded HSCs can be engrafted in unconditioned host and return to quiescence.

      The authors also confirm previous studies that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs when transplanted.

      Strengths:

      The major strength of this manuscript is that it describes how functional HSCs are expanded in PVA cultures to a deeper extent than what has been done in the original publication. The authors are also mindful of considering the complexities of interpreting transplantation data. As these PVA cultures become more widely used by the HSC community, this manuscript is valuable as it provides a better understanding of the model and its limitations.

      Novelty aspects include:

      • The authors determined that small numbers of expanded HSCs enable transplantation into non-conditioned syngeneic recipients.

      • This is to my knowledge the first report characterising the output of PVA cultures by multiome. This could be a very useful resource for the field.

      • They are also the first to my knowledge to use barcoding to quantify HSC repopulation capacity at the clonal level after PVA culture.

      • It is also useful to report that HSCs isolated from fetal livers do expand less than their adult counterparts in these PVA cultures.

      Weaknesses:

      • The analysis of the multiome experiment is limited. The authors do not discuss what cell types, other than functional or phenotypic HSCs are present in these cultures (are they mostly progenitors or bona fide mature cells?) and no quantifications are provided.

      The primary objective of our manuscript was to characterize the features of HSCs expanded from ex vivo culture. In this context, our analysis of the single cell multiome sequencing data was predominantly centered on elucidating the heterogeneity of cultures, along with subsequent in vivo functional analysis. This focus is reflected in our comparisons between the molecular features of ex vivo cultured candidate HSCs (cHSCs) and "fresh/unmanipulated" HSCs, as illustrated in Figures 2D-E of our manuscript.

      Our findings provide substantial evidence that ex vivo expanded cells share significant similarities with HSCs isolated from the BM in terms of molecular features, differentiation potential, heterogeneity, and in vivo stem cell activity/function. This suggests that the ex vivo culture system closely mimics several aspects of the in vivo environment, thereby broadening the potential applications of this system for HSC research.

      Regarding the presence of other cell types in the cultures, it is important to note that most cells did not express mature lineage markers, suggesting their immature status. However, we acknowledge the presence of some mature lineage marker-positive cells within the cultures. These cells are represented by the endpoints in our SAILERX UMAP, indicating a progression from immature to more differentiated states within the culture system.

      While the main emphasis of our study was on HSCs, we understand the importance of acknowledging and briefly discussing the presence and characteristics of other cell types in the cultures. This aspect provides a more comprehensive understanding of the culture system and its impact on cellular heterogeneity, although it was for the most part beyond the scope of our studies.

      • Barcoding experiments are technically elegant but do not bring particularly novel insights.

      We respectfully disagree with the view that our barcoding experiments do not offer novel insights. We believe that the application of barcoding technology in our study represents a significant advancement over previous methods, both in terms of quantitative rigor and ethical considerations.

      In the foundational work by Wilkinson et al., clonal assessments were indeed performed, but these were limited in scope and largely served as proof of concept. Our use of barcoding technology, on the other hand, allowed for a comprehensive quantitative assessment of the expansion potential of HSC clones. This technology enabled us to rigorously quantify the number of HSC clones capable of undergoing at least three self-renewing divisions (e.g. those clones present in 5 separate animals), while also revealing the heterogeneity in their expansion potential.
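
      The arithmetic behind "at least three self-renewing divisions" for a clone recovered in 5 animals can be sketched as follows. The sketch assumes that each recipient must have received at least one functional HSC from the barcoded clone and that a self-renewing division can at most double the number of HSCs in a clone; the function name is hypothetical.

```python
def min_self_renewing_divisions(n_recipients: int) -> int:
    """Smallest number of doubling rounds d such that 2**d >= n_recipients.

    Assumptions (illustrative only): one founder HSC per barcode, at least
    one functional HSC of the clone per positive recipient, and each round
    of self-renewal at most doubles the clone.
    """
    if n_recipients < 1:
        raise ValueError("need at least one recipient")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without
    # floating-point rounding issues.
    return (n_recipients - 1).bit_length()


# A clone detected in 5 separate animals needs at least 5 HSC copies;
# two rounds give at most 4, so three rounds are the minimum.
print(min_self_renewing_divisions(5))
```

      This is a lower bound: asymmetric divisions, cell loss, or uneven distribution of cells between recipients would all require more divisions than the bound suggests.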

      One alternative approach could have been to culture single HSCs and distribute the progeny among multiple mice for analysis. However, when considering the sheer number of mice that would be required for such an experiment for quantitative assessments, it becomes evident that viral barcoding is a far superior method. Not only does it provide a more efficient and scalable approach to assessing clonal expansion, but it also significantly reduces the number of animals required for the study, aligning with the principles of ethical research and animal welfare.

      In conclusion, we assert that the barcoding experiments conducted in our study are not only technically robust but also yield novel quantitative insights into the dynamics of HSC clones within expansion cultures. These insights have value not only for current research but also hold potential implications for future applications.

      • The number of mice analysed in certain experiments is fairly low (Figures 1 and 5).

      We would like to clarify our approach in the context of the 3R (replacement, refinement, and reduction) policy, which guides ethical considerations in animal research.

      In alignment with the 3R principles, our study was designed to minimize the use of experimental animals wherever possible. For most experiments, including those presented in Figures 1 and 5, we adopted a standard of using five mice per group. Based on the effect sizes we observed, we concluded that this sample size was appropriate for most parts of our study.

      Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted. Despite the seemingly small sample size, the results we obtained were remarkably consistent across groups. This consistency provided strong evidence that ex vivo activated HSCs return to a more quiescent state after being transplanted into unconditioned recipients. Given the clear and consistent nature of these results, we determined that including more animals for the purpose of additional statistical analysis was not necessary.

      Our approach reflects a balance between adhering to ethical standards in animal research and ensuring the scientific validity and reliability of our findings. We believe that the sample sizes chosen for our experiments are justified by the consistent and significant results we obtained, which contribute meaningfully to our understanding of HSC behavior post-transplantation.

      • The manuscript remains largely descriptive. While the data can be used to make useful recommendations to future users working with PVA cultures and in general with HSCs, those recommendations could be more clearly spelled out in the discussion.

      We fully agree that many aspects of our study are indeed descriptive, which is reflective of the exploratory and foundational nature of this type of research.

      We have strived to provide clear and direct recommendations for researchers interested in utilizing the PVA culture system, which we believe are evident throughout our manuscript:

      1) Utility of Viral Delivery in HSC Research: Our research, particularly through the use of barcoding experiments, underscores the effectiveness of viral delivery methods in HSC studies. While barcoding itself is a significant tool, it is the underlying process of viral delivery that truly exemplifies the potential of this approach. Our work shows that the culture system is highly conducive to maintaining HSC activity, which is critical for genetic manipulation. This is evident not only in our current study but also in our previous work that included transient delivery methods (Eldeeb et al., Cell Reports 2023).

      2) Non-conditioned transplantation: Our findings suggest that non-conditioned transplantation can be a valuable method in studying both normal and malignant hematopoiesis. This approach can complement genetic lineage tracing models, providing a more native and physiological context for hematopoietic research. We state this explicitly in our discussion.

      3) Integration with recent technical advances: The combination of the PVA culture system with recent developments in transplantation biology, genome engineering, and single-cell technologies holds significant promise. This integration is likely to yield exciting discoveries with relevance to both basic and clinically oriented hematopoietic research. This is the end statement of our discussion.

      While our manuscript is in a way tailored to those with experience in HSC research, we have made a concerted effort to ensure that the content is accessible and informative to a broader audience, including those less familiar with this area of study. Our intention is to provide a resource that is both informative for experts in the field and approachable for newcomers.

      • The authors should also provide a discussion of the other publications that have used these methods to date.

      We would like to clarify that the scope of literature on the specific methods we employed, particularly in the context of our research objectives, is not extensive. Most of the existing references on these methods come from a relatively narrow range of research groups. In preparing our manuscript, we tried to be comprehensive yet selective in our citations to maintain focus and relevance. Our referencing strategy was guided by the aim to include literature that was most directly pertinent to our study's methodologies and findings.

      Overall, the authors succeeded in providing a useful set of experiments to better interpret what type of HSCs are expanded in PVA cultures. More in-depth mining of their bioinformatic data (by the authors or other groups) is likely to highlight other interesting/relevant aspects of HSC biology in relation to this expansion methodology.

      We are grateful for the overall positive assessment of our work and the recognition of its contributions to understanding HSC expansion in PVA cultures.

      We agree that every study, including ours, has its limitations, particularly regarding the scope and depth of exploration. It is challenging to cover every aspect comprehensively in a single study. Our research aimed to provide a foundational understanding of HSCs in PVA cultures, and we are pleased that this goal appears to have been met.

      We also concur with your point on the potential for further in-depth mining of our bioinformatic data. Our hope is that this data can serve as a resource (or at least a starting point) for other investigators.

      In conclusion, we hope that our responses have adequately addressed your queries and clarified any concerns. We are committed to contributing to the growth of knowledge in HSC research and look forward to the advancements that our study might enable, both within our team and the wider scientific community.

      Reviewer #1 (Recommendations For The Authors):

      1) In Line 150, the R packages can/should be mentioned just in the method section;

      We have moved this text to the methods section.

      2) In Figure F3C adding a legend next to the plot would assist the reader in identifying which populations are referred to, as the same color palette is used for other panels;

      We have now adjusted the figure legend position to make it more clear for the reader.

      3) In Figure 4D, for the pre-culture experiments 1000 cHSCs were used and then in the post-culture 1200 cHSCs were used. Can the authors justify the different numbers?

      The decision to use 1000 cHSCs in the pre-culture experiments and 1200 cHSCs in the post-culture experiments was not based on a specific rationale favoring one cell number over the other. In our Method section, we have detailed our experimental design, which was structured to provide robust and reliable readouts of HSC behavior and characteristics in different conditions.

      We consider the two cell numbers – 1000 and 1200 – to be quite similar in the context of our experimental aims. Since the readouts here are based on clonal assessments, this slight difference in cell numbers is unlikely to significantly impact the overall conclusions drawn from these experiments. The primary focus of our study was on qualitative aspects of HSC behavior and function, rather than on quantitative differences that might arise from small variations in initial cell numbers.

      4) In SF5F it would help readers if a line plot (per group) was also shown together with the dot plots. Moreover, applying statistics to the trend lines (Wilcoxon, for example) would strengthen the argument that cHSCs divide less than control cells.

      We would like to clarify that the data presented in SF5F were derived from different animals at each respective time point. As such, the data points at each time point represent independent measurements from separate animals, rather than a continuous measurement from the same set of animals over time. Therefore, creating a line plot that connects each time point within a group would inadvertently convey a misleading impression of a longitudinal study on the same animals, which is not reflective of the actual experimental design. Instead, the dot plot format was chosen as it more accurately depicts the independent and discrete nature of the measurements at each time point. Our current data presentation method was selected to provide the most accurate and transparent representation of our findings.

      Reviewer #2 (Recommendations For The Authors):

      Listed below are recommendations to further improve this manuscript:

      Major Comments

      1) Fig 1: the authors showed that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs via bone marrow transplantation. Additionally, mice receiving cultured EPCRhigh SLAM LSK cells were more efficiently radioprotected than those receiving PVA expanded EPCRlow SLAM LSK.

      a. In addition to Fig.1F, authors should show the lineage distributions and chimerism of mice receiving cultured EPCRhigh and EPCRlow SLAM LSK respectively.

      We have indeed analyzed the lineage distribution in these experiments, and our findings indicate no statistically significant differences between the groups (see graph in Author response image 6). This suggests that the cultured EPCRhigh and EPCRlow SLAM LSK cells do not preferentially differentiate into specific lineages in a way that would impact the overall interpretation of our results.

      Author response image 6.

      Regarding the chimerism in peripheral blood (PB) lineages, Fig. 1F in our manuscript currently shows the PB myeloid chimerism. We chose to focus on this parameter as it most directly relates to our study's objectives. Here, we did not transplant competitor cells, and in most cases, the chimerism levels reached 100% for lineages other than T cells (T cells being more radioresistant). Based on our analysis, including data on chimerism in other PB lineages would not significantly enhance the understanding of the functional capacity of the transplanted cells, as the myeloid chimerism data already provide a robust indicator of their engraftment and functional potential.

      We believe that our current presentation of data in Fig. 1F, along with the additional analyses provided in the results section, offers a comprehensive understanding of the behavior and potential of the cultured EPCRhigh and EPCRlow SLAM LSK cells.

      b. Fig1F: only 5 mice were used in each group. Could this result occur by chance? Testing with Fisher's exact test with the data provided results in p=0.16. The authors should consider adding more animals or adding the p-value above (or from another relevant test) for readers' consideration.

      We acknowledge the point that only five mice were used in each group and understand the concern regarding the robustness of our findings.

      As correctly noted, applying Fisher's exact test to the data in Fig. 1F results in a p-value that does not reach the conventional threshold for statistical significance. However, one might also consider the analysis of the KM survival curve, which is associated with a p-value of 0.0528 (Fig. 1F, left graph below; Gehan-Breslow-Wilcoxon test). A similar test on the single-cell culture transplantation experiment (Fig. 1E, right graph below) also demonstrated statistical significance (p-value = 0.0485).

      While these p-values meet (or are very close to) the conventional criteria for statistical significance (p<0.05), we have chosen to place greater emphasis on effect sizes rather than strictly on p-values. This decision is based on our belief that effect sizes provide a more direct and meaningful measure of the biological impact observed in our experiments. We find that the effect sizes observed are compelling and consistent with the overall narrative of our study.
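      For readers who wish to reproduce this kind of small-sample comparison, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 survival table. The function name and the counts used (5 of 5 versus 2 of 5 survivors) are illustrative assumptions, not data taken from the manuscript.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (assumption, not from the manuscript):
# group A: 5 survivors, 0 deaths; group B: 2 survivors, 3 deaths
p_value = fisher_exact_two_sided(5, 0, 2, 3)
print(round(p_value, 3))
```

      With five animals per group, even a 5-versus-2 split in outcomes stays above p = 0.05 under this test, which illustrates why effect sizes and time-to-event analyses are considered alongside it.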

      Author response image 5.

      2) The characterisation of the multiome experiment is highly underdeveloped.

      a. From an experimental point of view, it is not clear how the PVA culture for this experiment was started. Are there technical/biological replicates? Have several PVA cultures been pooled together?

      We have included these details in the revised text to ensure a comprehensive understanding of our experimental setup.

      b. Fig2B: The authors should present more data as to how each of the clusters was annotated (bubble plot of marker genes used for annotation?) and importantly the percentage of cells in each of the clusters. It is particularly relevant to note what % is the cluster annotated as HSCs and compare that to the % of phenotypic HSCs and the % repopulating HSCs calculated in the transplantation experiments.

      In our study, the annotation of clusters was primarily based on reference genes for cell types from prior works in the field, such as from our recent work (Konturek-Ciesla et al., Cell Reports 2023). Additionally, we employed transcription factor (TF) motifs to assign identities to these clusters. This approach is relatively standard in the field, and we believe it provides a robust framework for our analysis. We included information on some of the key TF motifs used to guide our annotations.

      Regarding the assignment of a percentage to cells within the HSC cluster, we initially had reservations about the utility of this measure. This is because the transcriptional identity of HSCs might not align precisely with their identity based on candidate HSC protein markers. There are complexities related to transcriptional continuums that could influence the interpretation of such data. However, acknowledging your request for this information, we have now included the percentage of cells in the HSC cluster in Fig. 2B for reference.

      We also wish to highlight that when isolating EPCR+ cells, which encompasses a range of CD48 expression, clustering becomes much less distinct, as shown in Fig. 2E. Most of these cells do not demonstrate long-term functional HSC activity in a transplantation setting (as presented in Figure 3). This observation underscores the challenges in deducing HSC identity based solely on molecular data and reinforces the importance of functional validation.

      c. Are there any mature cells in these PVA cultures? The annotations presented in the table under the UMAP are vague: Are cluster 4 monocytes or monocytes progenitors? Same for clusters 0,1 and 7 - are these progenitors or more mature cells? How were HPCs (cluster 3) distinguished from cHSCs (cluster 5)?

      We agree with your observation that the annotations for certain clusters, such as clusters 4, 0, 1, and 7, as well as the distinction between HPCs (cluster 3) and cHSCs (cluster 5), appear vague. This vagueness to some extent stems from the challenges inherent in comparing cultured cells to their counterparts isolated directly from animals. Most reference data defining cell types are derived from cells in their native state, and less is known about how these definitions translate to the progeny of HSPCs cultured in vitro.

      In our study, we used the expression of reference genes and enriched transcription factor motifs to annotate clusters. This method, while useful, has its limitations in precisely defining the maturation stage of cells in culture. The enrichment of lineage-defining factors at the ends of the UMAP suggests the presence of more mature cells, whereas the lack of lineage marker expression in the majority of cells implies a general lack of terminal differentiation.

      This issue is not necessarily unique to the culture situation, as similar challenges in cell type annotation are encountered in other contexts, such as the analysis of granulocyte-macrophage progenitors in bone marrow, where a vast range of cell types and clusters are identified (e.g., PMID: 26627738). To try to address these challenges, we employed an approach detailed in the methods section under the header "iv. ATAC processing and cluster annotation." We assessed marker genes for clusters using Enrichr for cell types, relying on databases designed to provide gene expression identities to defined cell types. This methodology informed our references to the clusters.

      In summary, while our annotations provide a general overview of the cell types present in the cultures, we acknowledge the complexities and limitations in precisely defining these types, particularly in distinguishing between progenitors and more mature cells. We hope this explanation clarifies our approach and the considerations behind our cluster annotations, but at the same time feel that the alternative approaches have their own drawbacks.

      d. What is the meaning of the trajectories presented in Figure 2C? In the absence of a comparison to i) what is observed either when HSCs are cultured in control/non-expanding conditions ii) an in vivo landscape of differentiation in mouse bone marrow; this analysis does not bring any relevant piece of information.

      We understand the perspective on comparisons to control conditions and in vivo differentiation landscapes. However, we respectfully disagree with the viewpoint that the analysis that we have performed does not bring relevant information.

      The trajectory analysis in Figure 2C is intended to provide insights into the cell types generated in our PVA cultures and the potential differentiation pathways they may follow. This kind of analysis is particularly valuable in the context of understanding how in vitro cultures can support HSC maintenance and differentiation, which is a topic of significant interest in the field. For instance, studies like PMID: 31974159 have highlighted the importance of combining in vitro HSC cultures with molecular investigations.

      While we acknowledge that our analysis would benefit from a direct comparison to control or non-expanding conditions, as well as to an in vivo differentiation landscape, we believe that the information provided by our current analysis still holds substantial value. It offers a glimpse into the possible cellular dynamics and differentiation routes within our culture system, which can be a valuable reference point for other investigators working with similar systems.

      Regarding the confidence in computed differentiation trajectories, we recognize that this is an area where caution is warranted. Computational approaches to define cell differentiation pathways have inherent limitations and should be interpreted within the context of their assumptions and the data available. This challenge is not unique to our work but is a broader issue in the field of computational biology.

      In conclusion, while we agree that additional comparative analyses could further enrich our findings, we maintain that the trajectory analysis presented in Figure 2C contributes meaningful insights into cell differentiation in our PVA culture system. We believe these insights are of interest and value to researchers exploring the complex interplay of HSC maintenance and differentiation in vitro.

      3) The addition of barcoding experiments is appreciated. However, it is already known that upon transplantation clonal output is highly heterogeneous, with a small number of clones predominating over others. This is particularly the case after myeloablation conditioning.

      a. The "pre-culture" experimental design makes sense. The "post-culture" one is however ambiguous in terms of result interpretation. The authors observe fewer clones contributing to a large proportion of the graft (>5%) than in the "pre-culture" setting. Their interpretation is that expanded HSCs are functionally more homogeneous than the input HSCs. However, in the pre-culture experiment, there are 19 days of expansion during which there will be selection pressures over culture plus ongoing differentiation. In the post-culture experiment, there is no time for such pressures to be exerted. Therefore the conclusion drawn by the authors is not the only conclusion. I would encourage the authors to compare the "pre-culture" experiment to an experiment in which cHSCs are in culture for 48h, then barcoded, and then transplanted. This would be much more informative and would allow a proper comparison of expanded HSCs vs input HSCs.

      We understand the perspective that a shorter culture period would reduce the influence of selection pressures and differentiation, potentially allowing for a more direct comparison between expanded HSCs and input HSCs. However, we would like to point out that similar experiments have been conducted in the past, as referenced in our work (PMID: 28224997) and others (PMID: 21964413). These studies have demonstrated a significant heterogeneity in the reconstituting clones when barcoding is done early and cells are transplanted directly.

      In light of previous research, we are confident that our methodology — tracking the fates of candidate HSC clones throughout the culture period and assessing the outcomes of individual cells from these expanding clones — yields significant and pertinent insights. We want to highlight the significance of barcoding cells late in the culture, a strategy that allows us to barcode cells that have already been subjected to potential selection pressures within the culture environment. Our primary objective is to investigate the effects of these selection pressures on the subsequent in vivo behavior of the cells that emerge from this process. By focusing on this aspect, we aim to deepen the understanding of how in vitro culture conditions influence the functional characteristics and heterogeneity of HSCs after expansion. We believe this approach provides a unique perspective on the adaptive changes HSCs undergo during culture and their implications for transplantation efficacy and HSC biology. Our study thus addresses a critical question in the field: how do the conditions and selection pressures inherent to in vitro culture impact the quality and behavior of HSCs upon their return to an in vivo environment?

      b. Another experiment the authors may consider is barcoding in unconditioned recipients as there the bottleneck of selecting specific clones should be lower. In addition, this could nicely complement the return to quiescence observed in Figure 5 (see point below)

      We agree that this experiment could provide valuable insights, particularly in understanding how different selection pressures might affect HSC clones in various transplantation contexts. It would indeed be a worthwhile complement to our observations in Figure 5 regarding the return to quiescence of HSCs post-transplantation.

      However, we would like to point out that our study already includes a substantial amount of data and analyses aimed at addressing specific research questions within this defined scope. The addition of an experiment with barcoding in unconditioned recipients, while undoubtedly relevant and interesting, would extend beyond the boundaries we set for this particular study.

      4) Figure 5D-F, only 2 animals per condition were tested, so the experiment is underpowered for any statistics. How about cell viability of cHSC after in vitro culture? The authors have also not tested whether there is a difference in cell viability post-transplant between EE100 and control. In addition, comparing cell cycle profiles of donor EPCR+ HSCs in these transplanted mice would provide additional evidence to support the conclusion.

      Regarding the sample size, we acknowledge that only two animals per condition were used in these experiments, which limits the statistical power for robust quantitative analysis. This decision was guided by ethical considerations to minimize animal use, in line with the 3Rs principle (Replacement, Reduction, Refinement). Despite the small sample size, we believe that the strong trends observed in these experiments are indicative and consistent with our broader findings, although we recognize the limitations in terms of statistical generalization. At the same time, as we have written in the public response: "Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted."

      In the context of post-transplant analysis, conducting separate viability assessments on transplanted cells is not typically informative. This is because non-viable cells would naturally be eliminated through biological processes such as phagocytosis soon after transplantation. Therefore, any post-transplant viability analysis would not provide meaningful insights into the engraftment potential or behavior of the transplanted cells.

      However, it is important to note that in all our cell isolation and analysis protocols, we routinely include viability markers. This practice ensures that the cell populations we study and report on are indeed viable. Including these markers is a standard part of our methodology and contributes to the accuracy and reliability of our data.

      Regarding the comparison of cell cycle profiles, we chose to focus on the cell trace assay as a means to monitor and track cell division history, which directly addresses the central theme here - informing on the proliferation and quiescence dynamics of transplanted HSCs. While comparing cell cycle profiles could perhaps offer an additional layer of information, we did not deem it essential for our core objectives.

      5) Several publications have used these PVA cultures and made comments on their strengths and limitations. They do not overlap with this study but should be discussed here for completeness (for example Che et al, Cell Reports, 2022; Becker et al., Cell Stem Cell, 2023; Igarashi, Blood Advances, 2023).

      See comments to reviewer 1.

      Minor Comments

      Figure 1C: should add in the legend that this is in peripheral blood.

      Figure 2C: typo in the title.

      Figure 3A: typo in "equivalent".

      We thank the reviewer for catching these errors, which we have now corrected.

      Figure 3B and 3C: symbol colours of EPCRhighCD48+ and EPCR- are too similar to distinguish the 2 groups easily. We highly recommend using contrasting colours.

      For easier visualization, we have changed the symbol types and colors in our revised version.

      Fig3B and S3A-B: authors should show statistical significance in comparing the 4 fractions.

      We have now added this information.

      In the discussion, the authors rightly point out a paper that described EPCR+ HSCs. There are other papers that also looked at EPCR intensity (high vs low), for example, Umemoto et al., EMBO J, 2022.

      While we acknowledge the relevance of the paper you mentioned, we faced constraints in the number of references we could include. Therefore, we prioritized citing the original demonstration of EPCR as an HSC marker, particularly focusing on the work by the Mulligan laboratory, which established that cells expressing the highest levels of EPCR exhibit the most potent HSC activity. We believe this reference most directly supports the core focus of our study and provides the necessary context for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Xia et al. investigated the mechanisms underlying Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). The authors observed that abnormal osteogenesis and adipogenesis are associated with decreased β-catenin in the necrotic femoral head of GONFH patients, and that the inhibition of β-catenin signalling leads to abnormal osteogenesis and adipogenesis in GONFH rats. Of interest, the deletion of β-catenin in Col2-expressing cells rather than in Osx-expressing cells leads to a GONFH-like phenotype in the femoral head of mice.

      Strengths:

      A strength of the study is that it sets up a Col2-expressing cell-specific β-catenin knockout mouse model that mimics the full spectrum of osteonecrosis phenotype of GONFH. This is interesting and provides new insights into the understanding of GONFH. Overall, the data are solid and support their conclusions.

      Reviewer #1 (Recommendations For The Authors):

      1) Fig. 1I should be quantified and presented as bar graphs to make it consistent with other data, and the significance should be shown.

      Reply: Thanks for your comments. We have provided the quantitative bar graph in the new version.

      2) Fig. 2H, beta-catenin, ALP and FABP4 should be labeled below the X axis. Moreover, the pattern of Fig. 2H is different from other bar graphs and the dots for individual samples are missing, so I could not judge the N values for the experiments. N values should also be provided for Fig. 3.

      Reply: Thanks for your comments. We have added the labels for beta-catenin, ALP and FABP4 below the X axis in Fig. 2H. The format of the quantitative bar graphs was changed to show the N values in each experiment.

      3) Fig. 4 shows the fate mapping of Col2+ cells and Osx+ cells in the femoral head. In this regard, the authors presented images for Col2-expressing cells at all the indicated time points, i.e. 1, 3, 6, and 9 months, but only presented images for Osx-expressing cells for 1 month while those for 3, 6, and 9 months are missing.

      Reply: Thanks for your comments. Here, we show that the expression of Osx+ cells in the femoral head was completely different from that of Col2+ cells at 3 and 6 months of age, further indicating that they are two different progenitor lineages.

      Author response image 1.

      4) Some experiments may need to be described in more detail, e.g., ABH/Orange G staining, biomechanical testing, μCT analysis, etc.

      Reply: Thanks for your comments. We have provided more information on the experimental procedures.

      5) This study proposed that Col2-expressing cells play a key role in the progression of GONFH, did the authors use Col2+ cells for the in vitro experiments?

      Reply: Because in vitro experiments cannot reflect the location of Col2-expressing cells in the femoral head, we instead performed an in vivo lineage tracing study. After lineage tracing for as long as 9 months, we showed the self-renewal ability and osteogenic commitment of Col2+ cells, as well as the change in their spatial distribution in the femoral head with age. Conditional knockout of β-catenin caused Col2+ cells to trans-differentiate into adipogenic cells instead of osteogenic cells, which directly clarified the mechanism by which Col2+ cells lead to a GONFH-like phenotype in mice.

      6) A few typo errors, such as Line 13, "contribute" should be "contributes"; Line 118, "reveled" should be "revealed".

      Reply: We have corrected the grammatical errors in the new manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported a study to uncover that β-catenin inhibition disrupting the homeostasis of osteogenic/adipogenic differentiation contributes to the development of Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). In this study, they first observed abnormal osteogenesis and adipogenesis associated with decreased β-catenin in the necrotic femoral head of GONFH patients, but the exact pathological mechanisms of GONFH remain unknown. They then performed in vivo and in vitro studies to further reveal that glucocorticoid exposure disrupted osteogenic/adipogenic differentiation of bone marrow stromal cells (BMSCs) by inhibiting β-catenin signaling in glucocorticoid-induced GONFH rats, and specific deletion of β-catenin in Col2+ cells shifted BMSCs commitment from osteoblasts to adipocytes, leading to a full spectrum of disease phenotype of GONFH in adult mice.

      Strengths:

      This innovative study provides strong evidence supporting that β-catenin inhibition disrupts the homeostasis of osteogenic/adipogenic differentiation that contributes to the development of GONFH. This study also identifies an ideal genetically modified mouse model of GONFH. Overall, the experiment is logically designed, the figures are clear, and the data generated from humans and animals is abundant supporting their conclusions.

      Weaknesses:

      There is a lack of discussion to explain how the Wnt agonist 1 works. There are several types of Wnt ligands. It is not clear if this agonist only targets Wnt1 or other Wnts as well. Also, why Wnt agonist 1 couldn't rescue the GONFH-like phenotype in β-cateninCol2ER mice needs to be discussed.

      Reply: Thanks for your constructive comments. Wnt agonist 1 is a cell-permeating activator of the Wnt signaling pathway that induces transcriptional activity dependent on β-catenin (PMID: 25514428, 18624906). In the present study, we aimed to demonstrate that activation of β-catenin signaling could alleviate the GONFH phenotype in rats, so only the expression of β-catenin and its downstream targets (RUNX2, ALP, PPAR-γ, FABP4) was examined after Wnt agonist 1 intervention. Conditional knockout of β-catenin in Col2+ cells led to a GONFH-like phenotype in mice. Wnt agonist 1 could not rescue this GONFH-like phenotype, as it did not activate β-catenin signaling. We have discussed these points in the new version.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors are trying to delineate the mechanism underlying the osteonecrosis of the femoral head.

      Strengths:

      The authors provided compelling in vivo and in vitro data to demonstrate Col2+ cells and Osx+ cells were differentially expressed in the femoral head. Moreover, inducible knockout of β-catenin in Col2+ cells but not Osx+ cells led to a GONFH-like phenotype including fat accumulation, subchondral bone destruction, and femoral head collapse, indicating that imbalance of osteogenic/adipogenic differentiation of Col2+ cells plays an important role in GONFH pathogenesis. Therefore, this manuscript provided mechanistic insights into osteonecrosis as well as potential therapeutic targets for disease treatment.

      Weaknesses:

      However, additional in-depth discussion regarding the phenotype observed in mice is highly encouraged.

      Reply: Thanks for your comments. Inducible knockout of β-catenin in Col2+ cells but not Osx+ cells led to a GONFH-like phenotype. Lineage tracing data showed that Col2+ cells and Osx+ cells were different cell populations, and we have discussed the potential mechanism underlying the different phenotypes of β-cateninCol2ER mice and β-cateninOsxER mice.

      1) Why did the authors use dexamethasone in the cellular experiments but methylprednisolone to induce the GONFH rat model?

      Reply: Thanks for the comments. Here, we applied a dexamethasone (DEX)-treated BMSC model in vitro and a methylprednisolone (MPS)-induced rat model in vivo for the GONFH study based on the published literature (PMID: 37317020, 29662787, 29512684, 35126710, 32835568).

      2) Both bone damage and fat accumulation were observed in 3-month-old and 6-month-old β-cateninCol2ER mice, but the femoral head collapse (the feature of GONFH at the late stage) only occurred in the older β-cateninCol2ER mice. This interesting observation needs to be discussed.

      Reply: Thanks for the comments. Poor mechanical support caused by bone damage is the key to femoral head collapse. Despite similar trabecular bone loss and fat accumulation in the 3-month-old and 6-month-old β-cateninCol2ER mice, the older mice also presented extensive subchondral bone destruction. Intact subchondral bone provides good mechanical support for femoral head morphology; therefore, femoral head collapse occurred only in the older β-cateninCol2ER mice.

      3) In the Materials and Methods, detailed information on the reagents should be provided.

      Reply: We have provided detailed information on the important reagents.

      4) As shown in Figure 4, β-cateninOsxER mice at 3 months of age did not show differences in lipid droplet area and empty lacunae rate, but there was a decrease in bone area. The authors should at least provide some necessary discussion of this phenomenon.

      Reply: Thanks for your comments. In the present study, we found few lipid droplets and empty lacunae but a significant decrease in bone mass in the femoral heads of β-cateninOsxER mice. Previous studies showed that specific knockout of β-catenin in Osx-expressing cells promoted osteoclast formation and activity, leading to bone mass loss (PMID: 29124436, 34973494). We have discussed this phenomenon in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their constructive and critical comments/suggestions regarding our paper. We have since extensively revised the manuscript accordingly, including the addition of new experimental data. We hope the readers, reviewers, and editors are now satisfied with the quality and significance of the revised paper.

      Our responses to the eLife assessment and the reviewers’ comments, as well as the details of the revisions, are described below.

      Wang et al present a useful manuscript that builds modestly on the group's previous publication on KLF1 (EKLF) K74R mice focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo (Shyu et al., Adv Sci (Weinh). 2022). The data demonstrates that Eklf (K74R) imparts these advantages in a background, age, and gender independent manner, not the consequence of the specific amino acid substitution, and transferable by BMT. However, the authors overstate the meaning of these results and the strength of evidence is incomplete, since only a melanoma model of cancer is used, it is unclear why only homozygous mutation is needed when only a small fraction of cells during BMT confer benefit, they do not show EKLF expression in any cells analyzed, and the PD-1 and PDL-1 experiments are not conclusive. The definitive mechanism relative to the prior publication from this group on this topic remains unclear.

      The issues raised in the editor's assessment of our paper were also brought up by the reviewers. We have addressed them by carrying out new experiments as well as rewriting the paper to highlight the rationales and novel aspects of the current study, as described below in our responses to the three reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors Wang et al. present a study of a mouse model K74R that they claim can extend the life span of mice, and also has some anti-cancer properties. Importantly, this mechanism seems to be mediated by the hematopoietic system, and protective effects can be transferred with bone marrow transplantation.

      The authors need to be more specific in the title and abstract as to what is actually novel in this manuscript (a single tumor model), and what relies on previously published data (lifespan). Because many of these claims derive from previously published data, and the current manuscript is an extension of previously published work. The authors need to be more specific as to the actual data they present (they only use the B16 melanoma model) and the actual novelty of this manuscript.

      Especially experiments on life span are published and not sufficiently addressed in this actual paper, as the title would suggest.

      It is indeed important to point out the novelty of this paper in comparison to the previous paper. First, we have modified the title, the abstract, and the text to emphasize that the extended lifespan as well as tumor resistance could be transferred from Eklf(K74R) mice to WT mice by a single transplantation (BMT) of Eklf(K74R) bone marrow mononuclear cells to the WT mice at a young age (2 months).

      We now also provide several new experimental data, including data demonstrating that Eklf(K74R) mice are resistant to tumorigenesis of hepatocellular carcinoma as well (new Fig. 1E). These points are elaborated in more detail below in our responses to the reviewers’ comments/suggestions.

      Reviewer #2 (Public Review):

      The manuscript by Wang et al. follows up on the group's previous publication on KLF1 (EKLF) K74R mice and reduced susceptibility to tumorigenesis and increased life span (Shyu et al., Adv Sci (Weinh). Sep 2022;9(25):e2201409. doi:10.1002/advs.202201409). In the current manuscript, the authors have described the dependence of these phenotypes on age, gender, genetic background, and hematopoietic translation of bone marrow mononuclear cells. Considering the current study is centered on the phenotypes described in the previous study, the novelty is diminished. Further, there are significant conceptual concerns in the study that make the inferences in the manuscript far less convincing. Major concerns are listed below:

      1) The authors mention more than once in the manuscript that KLF1 is expressed in a range of blood cells including hematopoietic stem cells, megakaryocytes, T cells and NK cells. In the case of megakaryocytes, studies from multiple labs have shown that while EKLF is expressed in megakaryocyte-erythroid progenitors, EKLF is important for the bipotential lineage decision of these progenitors, and its high expression promotes erythropoiesis, while its expression is antagonized during megakaryopoiesis. In the case of HSCs, the authors refer to their previous publication for KLF1's expression in these cells; however, neither in that study nor in the current study is there a western blot documented to convincingly show that KLF1 protein is expressed at detectable levels in these cells. For T cells, the authors have referenced a study which is based on ectopic expression of KLF1. For NK cells, the authors reference bioGPS; however, upon inspection, this is also questionable.

      2) The current study rests on the premise that KLF1 is expressed in HSCs, NK cells and leukocytes, and the references cited are not sufficient to make this assumption, for the reasons mentioned in the first point. Therefore, the authors will have to show both KLF1 mRNA and protein levels in these cells, and also compare them to the expression levels seen in KLF1 wild type erythroid cells along with knockout erythroid cells as controls, for context and specificity.

      Regarding the novelty of the current study: besides demonstrating that the healthy longevity characteristics are independent of age, gender, and genetic background, as exemplified by the tumor resistance, another novelty of the current study is that the healthy longevity characteristics, in particular the tumor resistance and extended lifespan, could be transferred by one-time long-term transplantation of the Eklf(K74R) bone marrow mononuclear cells from young Eklf(K74R) mice to young WT mice. Also, since submission of the last version of the paper, we have carried out new experiments, including the characterization of the anti-cancer capability of NK cells (new Fig. 6) as well as an assay of the tumor resistance of Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E), etc.

      We have also modified the title, Abstract, and different parts of the text to highlight the novelties of the current study.

      As to the expression of EKLF in different hematopoietic blood cell types, we have now added a paragraph in the Results (p.6 and p.7) describing what is known in the literature in relation to the data presented in the paper. Importantly, following the reviewer’s comments, we have since carried out Western blot analysis of EKLF expression in NK, T, and B cells (p.6, p.7 and new Fig. S4B). Note also that the level of EKLF in B cells is very low and could only be detected by RT-qPCR (Fig. S4C) and RNA-Seq (Bio-GPS database).

      3) To get to the mechanism driving the reduced susceptibility to tumorigenesis and increased life span phenotypes in EKLF K74R mice, the authors report some observations. However, how these observations are connected to the phenotypes is unclear.

      a. For example, in Figure S3, they report that the frequency of NK1.1+ cells is higher in the mutant mice. The significance of this in relation to EKLF expression in these cells and the tumorigenesis and life span related phenotypes are not described. Again, as mentioned in the second point, KLF1 protein levels are not shown in these cells.

      b. In Figure 4, the authors show mRNA levels of immune check point genes, PD-1 and PD-l1 are lower in EKLF K74R mice in PB, CD3+ T cells and B220+ B cells. Again, the questions remain on how these genes are regulated by EKLF, and whether and at what levels EKLF protein is expressed in T cells and B cells relative to erythroid cells. Further, while the study they reference for EKLF's role in T cells is based on ectopic expression of EKLF in CD4+ T cells, in the current study, CD3+ T cells are used. Also, there are no references for the status of EKLF in B cells. These details are not discussed in the manuscript.

      Regarding this part of the reviewer's questions and comments:

      First, we have since assayed the effect of the K74R substitution of EKLF on the in vitro cancer cell-killing ability of NK cells (termed NK1.1 cells in the previous version). The data showed that NK(K74R) cells have a higher cancer cell-killing ability than WT NK cells (new Fig. 6). This property, together with the higher level of NK(K74R) cells in 24-month-old Eklf(K74R) mice than of NK cells in 24-month-old WT mice, would contribute to the higher tumor resistance of the Eklf(K74R) mice. This point is also addressed on p.8 and p.9.

      Second, as stated in previous sections, we have since carried out comparative Western blot analysis of the expression of EKLF protein in NK, CD3+ T, and B cells of the WT and Eklf(K74R) mice, respectively (please see the new Fig. S4B). Also, a description of what is known in the literature in relation to our data on the expression of EKLF protein/Eklf mRNA in different types of hematopoietic blood cells is now included in the Results (please see p.6 and p.7). Notably, though, the level of EKLF protein in B cells was too low to be detected by WB (Fig. S4B).

      4) The authors perform comparative proteomics in the leukocytes of EKLF K74R and WT mice as shown in Figure S5. What is the status of EKLF levels in the mutant lysate vs wild type lysates based on this analysis? More clarity needs to be provided on what cells were used for this analysis and how they were isolated since leukocytes is a very broad term.

      The leukocytes we used were isolated from the peripheral blood after removal of red blood cells, as described in the Materials and Methods.

      Also, the Western blot analysis of EKLF expression in the lysates of leukocytes/white blood cells (WBC) was shown previously and is now presented in the new Figure S4A.

      5) In the discussion the authors make broad inferences that go beyond the data shown in the manuscript. They mention that the tumorigenesis resistance and long lifespan is most likely due to changes in transcription regulatory properties and changes in global gene expression profile of the mutant protein relative to WT leukocytes. And based on reduced mRNA levels of Pd-1 and Pd-l1 genes in the CD3+ T cells and B220+ B cells from mutant mice, they "assert" that EKLF is an upstream regulator of these genes and regulates the transcriptomes of a diverse range of hematopoietic cells. The lack of a ChIP assay to show binding of WT EKLF on genes in these cells, and whether this binding is reduced or abolished in the mutant cells, makes the above statements unsubstantiated.

      We have since carried out ChIP-PCR analysis of EKLF binding in the Pd-1 promoter (new Fig. S5). The data showed that EKLF was bound to the CACCC box at -103 of the promoter in WT CD3+ T cells as well as in CD3+ T(K74R) cells. This result is discussed on p.7.

      6) Where westerns are shown, the authors need to show the molecular weight ladder, and where qPCR data are shown for EKLF, it will be helpful to show the absolute levels and compare these levels to those in erythroid cells, along with the corresponding EKLF knockout cells as controls.

      We have since included the molecular weight markers by the side of the Western blots in Fig. S4. Also, we have added a new figure (Fig. S4C) comparing the expression levels of Eklf mRNA in B cells and CD3+ T cells with those in mouse erythroleukemia (MEL) cells, as analyzed by RT-qPCR.

      Also, as now indicated in the Materials and Methods section, the specificity of the primers used for RT-qPCR quantitation of mouse Eklf mRNA has been validated before by comparative analysis of wild type and EKLF-knockout mouse erythroid cells (Hung et al., IJMS, 2020).

      7) Figure S1D does not have a figure legend. Therefore, it is unclear what the blot in this figure is showing. In the text of the manuscript where they reference this figure, they mention that the levels of the mutant EKLF vs WT EKLF do not change in peripheral blood, while in the figure they have labeled WBCs for the blot, and the mRNA levels shown do seem to decrease in the mutant compared to WT peripheral blood.

      We apologize for this oversight on our part. The data shown in the original Fig. S1D (new Fig. S4A) are from Western blot analysis of EKLF protein and RT-qPCR analysis of Eklf mRNA in leukocytes/white blood cells (WBC) isolated from the peripheral blood samples. We have now added back the figure legend and also rewritten the corresponding description in the text on p.6.

      Reviewer #3 (Public Review):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated. The major strengths of the manuscript include the clarity of the effect and the appropriate controls. For instance, the authors query whether Eklf (K74R) imparts these advantages in a background, age, and gender dependent manner, demonstrating that the findings are independent. In addition, the authors demonstrate that the effect is not the consequence of the specific amino acid substitution, with a similar effect on anticancer activity. Furthermore, the authors provide some evidence that PD-1 and PDL-1 are altered in Eklf (K74R) mice.

      We thank the reviewer for these encouraging comments.

      Finally, they demonstrate that the effects are transferrable with BMT. Several weaknesses are also evident. For instance, only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach.

      We have now included new data showing that the Eklf(K74R) mice also carry a higher anti-cancer ability against hepatocellular carcinoma than the WT mice (new Fig. 1E).

      It is also unclear why a homozygous mutation is needed when only a small fraction of cells during BMT can confer benefit. It is also difficult to explain how transplanted donor Eklf (K74R) HSCs confer anti-melanoma effect 7 and 14 days after BMT.

      First, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). These points are now clarified in the Discussion (p.9 and p.10).

      Second, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in the literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors/cells that also contribute to the tumor resistance of the recipient mice at 7 days following BMT. This point is now touched upon on p.8 and p.9.

      Furthermore, it would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      We will analyze this in the future, together with other types of tumors, in a separate study.

      Finally, the data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation without changing EKLF expression impacts immune cells. The work is impactful as it provides evidence that healthspan and lifespan may be modulated by specific hematological mutation but the mechanism by which this occurs is not completely elucidated by this work.

      As described in a previous section, we have since also carried out ChIP-qPCR analysis of the binding of WT EKLF and EKLF (K74R) on the Pd-1 promoter (new Fig. S5).

      Reviewer #1 (Recommendations For The Authors):

      The authors present interesting melanoma model data but need to tone down their claim of multiple effects of their model system. It needs to be clear what is new and what is previously known.

      As stated in our response to the Public Reviews, we have since added new data on the tumor resistance of the Eklf(K74R) mice to hepatocellular carcinoma (new Fig. 1E). We have also modified the title as well as highlighted the novel points in the Abstract and text of the revised draft.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the major concerns listed in the public review, the minor concerns that the authors could address are listed below:

      1) It would be helpful to describe why the pulmonary melanoma focus assay was chosen as the metastasis assay.

      We now describe on p.4 the rationale behind the initial choice of this assay for analysis of the anti-cancer capability of the Eklf(K74R) mice. Also, we have since included data from an experiment using the subcutaneous cancer cell inoculation assay for comparative analysis of the anti-hepatocellular carcinoma capability of Eklf(K74R) and WT mice (Fig. 1E and p.5).

      2) Reference #61 for B16-F10-luc cells cited in the methods does not have details on the generation of these cells. What these cells are and why this model was chosen needs to be described.

      We apologize for not providing this information before. We now describe the generation of B16-F10-luc cells in the Materials and Methods section (p.13). The rationale for choosing the B16-F10 cells for the pulmonary foci assay is also added on p.4.

      3) The DNA binding consensus site for EKLF needs to be expanded in the introduction.

      This part has been taken care of now on p.13.

      Reviewer #3 (Recommendations For The Authors):

      Hung et al provide a well-written manuscript focused on understanding how Eklf mutation confers anticancer and longevity advantages in vivo. The work is fundamental and the data is convincing although several details remain incompletely elucidated.

      1) Only melanoma is tested as a model of cancer such that a broad claim of "anti-cancer activity" may be somewhat of an overreach. The authors, therefore, need to provide evidence of a second type of malignancy to which Eklf mutation confers anticancer and longevity advantages or temper the claims in the discussion that the effect still needs to be tested in non-melanoma cancer models to determine the broad anti-cancer effect.

      As stated in our response to the Public Reviews, we have since shown that Eklf(K74R) mice also exhibit a higher resistance to the carcinogenesis of hepatocellular carcinoma (new Fig. 1E).

      2) Why is a homozygous mutation needed when only a small fraction of cells during BMT can confer benefit of Eklf mutation? Is there evidence that the cellular effect is binary but only a few such cells are needed? This is confusing and requires further clarification.

      As stated in our response to the Public Reviews, these two observations do not necessarily conflict with each other. It is likely that homozygosity, but not heterozygosity, of the K74R substitution in EKLF allows one or more types of hematopoietic blood cells to gain new functions, e.g. the higher cancer cell-killing capability of NK(K74R) cells (new Fig. 6), that help the mice to live long and healthy. Also, the data in Fig. 2D indicated that as few as 20% of the blood cells carrying homozygous Eklf(K74R) alleles in the recipient mice upon BMT could be sufficient to confer on the mice a higher anti-cancer capability, likely in part due to cells such as NK(K74R). This point is now clarified in the Discussion (p.9).

      3) BMT typically requires at least 3-4 weeks to reconstitute the marrow compartment but the authors are able to see effects of Eklf mutation as early as 7 days following BMT. This is surprising and brings into question the mechanism of effect.

      As stated in our response to the Public Reviews, we think the NK(K74R) cells contributed a significant part to the anti-cancer capability of the transplanted Eklf(K74R) blood in the recipient WT mice. As documented in the literature, e.g. Ferreira et al., Journal of Molecular Medicine (2019), the hematopoietic lineage of the NK cells would be fully reconstituted as early as 2 weeks after BMT. Of course, there could be other still unknown factors/cells that also contribute to the tumor resistance of the recipient mice at 7 days following BMT (please see the discussion of this point on p.9).

      4) It would be useful to see whether there are virulence marker alterations in the melanoma loci in WT vs Eklf (K74R) mice.

      As stated in our response to the Public Reviews, we will analyze this in the future, together with other types of tumors, in a separate study.

      5) The data in Fig 4c is difficult to interpret as decreased PD-1 and PDL-1 after knockdown of EKLF in vitro is not a useful experiment to corroborate how mutation WITHOUT changing EKLF expression impacts immune cells.

      Indeed, the RNAi knockdown experiment only demonstrated a positive regulatory role of EKLF in Pd-1/Pd-l1 gene expression. We have followed the reviewer’s suggestion and carried out ChIP-qPCR analysis, which showed that the factor is bound to the Pd-1 promoter in both WT CD3+ T cells and CD3+ T(K74R) cells (new Fig. S5). We briefly discuss these data on p.7 in relation to the possible effect of the K74R substitution of EKLF on Pd-1 expression.

      We have now further clarified this point on p. 7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on the very nice structure! In my opinion, which you can feel free to take or leave, this would work better as a short report focused on the improvement of the structure relative to the current published model. To my mind, while the functional and dimerization studies are supportive of the cryo-EM studies (specifically, the purified protein is functional, and does tend to dimerize in various membrane mimetics), these experiments don't provide a lot of new mechanistic insight on their own. The dimerization, in particular, could be developed further.

      Response: Thank you for the comments. We have chosen to stick with the current article format. That the protein is dimeric is exciting in our view and we are working to further define the functional significance of this formation.

      Reviewer #2 (Recommendations For The Authors):

      Ln 48. Abstract. "highlighting feature of the complex interface" sounds a bit vague. I was wondering if the authors considered including more specific findings here.

      Response: This sentence has been removed.

      Ln 149 and elsewhere. The authors refer to the previously published structure of HiSiaQM as "low resolution". It may just be me and likely not the intention of the authors, but this comes across as an attempt to diminish the validity of this previous work from another group, which is not necessary. I would recommend rewording these parts slightly, even if it is just to say "lower resolution" instead of "low resolution".

      Response: It was not our intention to diminish the excellent work published by another group; we have changed “low resolution” to “lower resolution” throughout.

      Ln 160. The authors state that the inward-open conformation is likely "the resting state of the transporter". I think this statement should be modified slightly to acknowledge that this is only true under these conditions, i.e. in the absence of the bilayer, membrane potential and chemical gradients.

      Response: We have edited this as follows “That we observe the inward-open conformation without either a bound P-subunit or fiducial marker, suggests that this is the resting state of the transporter under experimental conditions (in the absence of a membrane bilayer, membrane potential and chemical gradients).”

      Ln 202. I'm not convinced that the use of the word "probable" is appropriate here; "possible" would likely fit better in the absence of compelling evidence that this dimer forms in a bacterial cell membrane with physiological levels of HiSiaQM expression.

      Response: We have changed “probable” to “possible”.

      The authors show an SEC trace for DDM solubilised protein, which is a single peak, whereas the LMNG extracted protein has 2 distinctly different elution profiles depending on the LMNG concentration. Was the same phenomenon observed when varying the DDM concentration?

      Response: We observed significantly more aggregation with DDM than L-MNG, so it was infrequently used and considerably less well characterised. In one purification, moderately higher DDM shifted the elution peak to be slightly later but retained a similar profile. Overall, we did not observe the same phenomenon of distinctly different elution profiles with DDM, but we have limited data.

      Ln 245. The two positions cited as important for the elevator-type mechanism are the fusion helix and the dimer interface. However, there is no evidence that the dimer interface observed in this work has any relevance to the transport mechanism. To make this statement, the interface would need to be disrupted and the effects on transport evaluated.

      Response: This has been edited as follows. “Evident in our cryo-EM maps are well-defined phospholipid densities associated with areas of HiSiaQM that may be important for the function of an elevator-type mechanism (Figure 4), but require further testing.”

      Ln 257. The authors state that the lipids form "specific and strong interactions" with the protein, but without knowing the identity of the lipids present, it is difficult to say anything about the specificity of this interaction. I think the authors could consider rewording this.

      Response: We have edited this by removing the term “specific” and describing the lipid interactions only as strong interactions.

      Ln 270. The authors identify a lipid-binding site and residues that likely interact with the headgroup. It would be interesting if the authors could speculate on the purpose of this lipid binding site and how it could affect transport. The residues are not conserved, which the authors suggest reflects the variety of lipid compositions in different bacteria. Are the authors suggesting that this lipid binding site is a general feature for all fused TRAP transporters and that the identity of the lipid changes depending on the species?

      Response: Yes, we speculate that the lipid binding site may be a general feature for fused TRAP transporters. We have added speculation about this binding site, specifically that “the fusion helix and concomitant lipid molecule may provide a more structurally rigid scaffold than a Q-M heterodimer, i.e., PpSiaQM, although how this impacts the elevator transition requires further testing” at Line 283.

      Though we believe that a binding pocket is likely found in a number of fused TRAPs (based on sequence and Alphafold predictions, e.g., FnSiaQM and AaSiaQM), we have now acknowledged that some fusions may not necessarily bind a lipid molecule here, by stating “While this binding pocket is likely found in a number of fused TRAPs (based on sequence predictions, e.g., FnSiaQM and AaSiaQM in Supplementary Figure 8), it is not clear whether they also bind lipids here without experimental data” at Line 290.

      Ln 306. The authors state that the HiSiaPQM has a 10-fold higher transport activity than PpSiaPQM. Unless the transport assays were performed in parallel (to mitigate small changes in experimental set-up) and the reconstitution efficiency for each proteoliposome preparation was carefully analysed, it is very difficult for this to be a meaningful comparison. Even if the amount of protein incorporated into the proteoliposomes is quantified (e.g. by evaluating protein band intensity when the proteoliposomes are analysed using SDS-PAGE), this does not account for an inactive protein that was incorporated, nor the proportion of the protein that was incorporated in the inside-out orientation, which would be functionally silent in these assays. I'm not suggesting these assays actually need to be performed, but I think the text should be modified to reflect what can actually be compared.

      Response: We agree with the reviewer that a meaningful comparison is difficult to make without a careful analysis of the reconstitution efficiency and have modified the text to reflect this. We have altered the paragraph beginning at Line 319 to the following: “The fused HiSiaPQM system appears to have a higher transport activity than the non-fused PpSiaPQM system. With the same experimental setup used for PpSiaPQM (5 µM Neu5Ac, 50 µM SiaP) (33), the accumulation of [3H]-Neu5Ac by the fused HiSiaPQM is ~10-fold greater. Although this difference may reflect the reconstitution efficiency of each proteoliposome preparation, it is possible that it has evolved as a result of the origins of each transporter system: P. profundum is a deep-sea bacterium and as such the transporter is required to be functional at low temperatures and high pressures… ”

      Ln 335. "S298A did not show an effect on growth when mutated to alanine previously." Suggest changing "S298A" here to "S298".

      Response: This has been changed.

      Ln 340. In addition to PpSiaQM, the large cavity was also presumably observed in the lower resolution structure of HiSiaQM?

      Response: The cavity is detectable in the lower resolution structure (7qe5), though very poorly defined by the density. Furthermore, the AlphaFold model fitted to this density has positioned sidechains inside the cavity, which we consider very likely to be an error (in comparison to our structures, VcINDY and our estimates of the volume required to house sialic acid). The cavity is generally much better defined by the structures we have referenced.

      Ln 345. Reference missing after "previously reported"?

      Response: This has been added.

      Measuring the affinity for the P-to-QM interaction is very useful, but it would have enhanced the study if some of the residues identified as important for this interaction (detailed on p.13) had been tested for their contributions to binding using this approach.

      Response: We do aim to perform this assay with these mutants in the future, but are also developing parallel assays to further test this interaction in different membrane mimetics.

      Ln 436. As stated previously, it is more accurate to say that "this is the most stable conformation" under these conditions.

      Response: We have edited this to say “The ‘elevator down’ (inward-facing) conformation is preferred in experimental conditions”. We have also changed the last sentence of this paragraph to say “However, the dimeric structures we have presented have no other proteins bound, yet exist stably in the elevator down state, suggesting this is the most stable conformation in experimental conditions, where there is no membrane bilayer, membrane potential, or chemical gradient present.”

      Ln 438. "Lipids associated with HiSiaQM are structurally and mechanistically important." This conclusion is not supported by the data presented; there is no evidence that the bound lipids influence the mechanism at all. The lipids observed are certainly interestingly placed and one could speculate about their relevance, but this statement of fact is not supported. Therefore, their importance to the mechanism needs to be tested or this conclusion needs to be substantially softened.

      Response: We have softened this statement by changing it to “Lipids have strong interactions with HiSiaQM and are likely to be important for the transport mechanism.”

      Reviewer #3 (Recommendations For The Authors):

      The fact that HiSiaQM samples consist of a mixture of compact monomer and dimer is clear, from Fig. S5 and S6. However, the analysis displayed in Fig 3 and Fig S4 would require more explanation. To my understanding, it requires the values of the sedimentation and diffusion coefficients. It could be good to provide the experimental values of D, and explain a little more about the method in the material and method section.

      Response: Yes, the analysis requires the experimental diffusion coefficients. These have been added to the Figure 3 and S4 legends and more detail has been added to the method section.

      In addition, I am puzzled when reading, in the legend of Fig 3, considerations that peak 2 could not correspond to a monomer or trimer: do these sentences correspond to other mathematical solutions, or is a given frictional ratio considered, or do they refer to Fig. S5 analysis?

      Response: We can see where this confusion could arise. These sentences do not correspond to a given frictional ratio or to the Fig. S5 analysis (this is a separate, complementary analysis). Ruling out a monomer for peak 2 is strictly a physical justification: with pure protein and an observed peak smaller than peak 2, peak 2 cannot be a monomer. Ruling out a trimer for peak 2 rests on a mathematical solution using the s and D coefficients. The solutions indicate that an unreasonably low amount of detergent would have to be bound to a trimer (32 molecules for L-MNG or 0 for DDM) for it to exist at those s and D values, so we have ruled the trimer out. Reassuringly, the complementary analysis in Fig. S5/S6 agrees with the monomer-dimer outputs from the s and D analysis. We have adjusted the text in the legends of Fig. 3 and S4 to better convey these points.
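
      The reasoning behind the trimer exclusion can be sketched numerically. The following is a minimal illustration, not the authors' analysis code. It assumes the standard Svedberg relation, M(1 - v̄ρ) = sRT/D, and treats the protein-detergent complex as a simple sum of buoyant contributions from n protomers and N detergent molecules. The function names and every numerical input (s, D, protomer mass, detergent mass, partial specific volumes) are hypothetical placeholders, not values from the manuscript.

```python
# Illustrative sketch only: not the authors' actual analysis code.
R = 8.314462618  # gas constant, J/(mol*K)


def buoyant_mass_kda(s_svedberg, d_m2_per_s, temp_k=293.15):
    """Svedberg relation: M * (1 - vbar * rho) = s * R * T / D.

    s is given in Svedberg units (1 S = 1e-13 s) and D in m^2/s.
    The result is in kg/mol, which is numerically equal to kDa.
    """
    return (s_svedberg * 1e-13) * R * temp_k / d_m2_per_s


def bound_detergent(s_svedberg, d_m2_per_s, n_protomers,
                    protomer_kda, vbar_protein,
                    detergent_kda, vbar_detergent,
                    rho=1.0, temp_k=293.15):
    """Detergent molecules an n-mer would need to match the measured s and D.

    vbar values are partial specific volumes in mL/g and rho is the solvent
    density in g/mL, so vbar * rho is dimensionless.
    """
    total = buoyant_mass_kda(s_svedberg, d_m2_per_s, temp_k)
    protein = n_protomers * protomer_kda * (1.0 - vbar_protein * rho)
    per_detergent = detergent_kda * (1.0 - vbar_detergent * rho)
    return (total - protein) / per_detergent


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the argument.
    for n in (1, 2, 3):
        count = bound_detergent(10.0, 4.0e-11, n, 70.0, 0.73, 1.0, 0.80)
        print(f"{n}-mer: {count:.0f} detergent molecules")
```

      For fixed s and D, each additional protomer uses up more of the measured buoyant mass, so the implied detergent count falls as the assumed oligomeric state rises. An implausibly small count for the trimer is the kind of result that would exclude it.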

    1. Author Response

      eLife assessment

      This useful study uses a mouse model of pancreatic cancer to examine mitochondrial mass and structure in atrophying muscle along with aspects of mitochondrial metabolism in the same tissue. Most relevant are the solid transcriptomics and proteomics approaches to map out related changes in gene expression networks in muscle during cancer cachexia.

      Response: We very much appreciate the positive feedback from the editors on our article and are delighted to have it published in eLife. Our sincere thanks to the Reviewers for their positive feedback on our work, and for their insightful and constructive comments.

      Reviewer #1 (Public Review):

      Summary:

      This important study provides a comprehensive evaluation of skeletal muscle mitochondrial function and remodeling in a genetically engineered mouse model of pancreatic cancer cachexia. The study builds upon and extends previous findings that implicate mitochondrial defects in the pathophysiology of cancer cachexia. The authors demonstrate that while the total quantity of mitochondria from skeletal muscles of mice with pancreatic cancer cachexia is similar to controls, mitochondria were elongated with disorganized cristae, and had reduced oxidative capacity. The mitochondrial dysfunction was not associated with exercise-induced metabolic stress (insufficient ATP production), suggesting compensation by glycolysis or other metabolic pathways. However, mitochondrial dysfunction can lead to increased production of ROS/oxidative stress and would be expected to interfere with carbohydrate and lipid metabolism, events that are linked to cancer-induced muscle loss. The data are convincing and were collected and analyzed using state-of-the-art techniques, with unbiased proteomics and transcriptomics analyses supporting most of their conclusions.

      Additional Strengths:

      The authors utilize a genetically engineered mouse model of pancreatic cancer which recapitulates key aspects of human PDAC including the development of cachexia, making the model highly appropriate and translational.

      The authors perform transcriptomic and proteomics analyses on the same tissue, providing a comprehensive analysis of the transcriptional networks and protein networks changed in the context of PDAC cachexia.

      Weaknesses:

      The authors refer to skeletal muscle wasting induced by PDAC as sarcopenia. However, the term sarcopenia is typically reserved for the loss of skeletal muscle mass associated with aging.

      Response: We agree that the term sarcopenia originally referred to aged muscle, but its use has spread to other fields, including oncology (for example, in this article, which we cite: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). In fact, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see, for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15:5856).

      In Figure 2, the MuRF1 IHC staining appears localized to the extracellular space surrounding blood vessels and myofibers, which causes concern as to the specificity of the antibody staining. MuRF1, as a muscle-specific E3 ubiquitin ligase that degrades myofibrillar proteins, would be expected to be expressed in the cytosol of muscle fibers.

      Response: We agree that MuRF1 IHC staining was also observed in the extracellular space. This was a surprise, and we have no explanation for it to date.

      Disruptions to skeletal muscle metabolism in PDAC mice are predicted based on mitochondrial dysfunction and the transcriptomic and proteomics data. The manuscript could therefore be strengthened by additional measures looking at skeletal muscle metabolites, or linking the findings to previous work that has looked at the skeletal muscle metabolome in related models of PDAC cachexia (Neyroud et al., 2023).

      Response: We agree that our omics data could be strengthened by additional measures looking at skeletal muscle metabolites. It is an excellent suggestion to compare the transcriptomic and proteomic data we obtained on the gastrocnemius muscle with the metabolomic data obtained by Neyroud et al. on the same muscle. These authors used a mouse model of PDAC different from our KIC GEMM model, namely the allograft model in which KPC cells (derived from the pancreatic tumor of KPC mice, another PDAC GEMM model) are implanted into syngeneic recipient mice. They carried out a proteomic study on the tibialis anterior muscle and a metabolomic study on the gastrocnemius muscle. The proteomics data identified, in particular, a KPC-induced reduction in the relative abundance of proteins annotated to oxidative phosphorylation, consistent with our data showing reduced mitochondrial activity pathways. The metabolomic data showed reduced abundance of many amino acids, as expected, and of intermediates of the mitochondrial TCA cycle (malate and fumarate) in KPC-atrophied muscle, consistent with the reduced mitochondrial metabolic pathways that we illustrated. In contrast, metabolites that were increased in abundance included those related to oxidative stress and redox homeostasis, which is not surprising given the profound oxidative stress affecting atrophied muscle. Finally, we noted in Neyroud's metabolomic data the dysregulation of certain lipids and nucleotides in atrophied muscle, which is very interesting in relation to our study describing alterations in lipid and nucleotide metabolic pathways.

      Reviewer #2 (Public Review):

      The present work analyzed the mitochondrial function and bioenergetics in the context of cancer cachexia induced by pancreatic cancer (PDAC). The authors used the KIC transgenic mice that spontaneously develop PDAC within 9-11 weeks of age. They deeply characterize bioenergetics in living mice by magnetic resonance (MR) and mitochondrial function/morphology mainly by oxygraphy and imaging on ex vivo muscles. By MR they found that phosphocreatine resynthesis and maximal oxidative capacity were reduced in the gastrocnemius muscle of tumor-bearing mice during the recovery phase after 6 minutes of 1 Hz electrical stimulation while pH was reduced in muscle during the stimulation time. By oxygraphy, the authors showed a decrease in basal respiration, proton leak, and maximal respiration in tumor-bearing mice that was associated with the decrease of complex I, II, and IV activity, a reduction of OXPHOS proteins, mitochondrial mass, mtDNA, and with several morphological alterations of mitochondrial shape. The authors performed transcriptomic and proteomic analyses to get insights into mitochondrial defects in the muscles of PDAC mice. By IPA analyses on transcriptomics, they found an increase in the signature of protein degradation, atrophy, and glycolysis and a downregulation of muscle function. Focusing on mitochondria they showed a downregulation mainly in OXPHOS, TCA cycle, and mitochondrial dynamics genes and upregulation of glycolysis, ROS defense, mitophagy, and amino acid metabolism. IPA analysis on proteomics revealed major changes in muscle contraction and metabolic pathways related to lipids, protein, nucleotide, and DNA metabolism. Focusing on mitochondria, the protein changes mainly were related to OXPHOS, TCA cycle, translation, and amino acid metabolism.

      The major strength of the paper is the bioenergetics and mitochondrial characterization associated with the transcriptomic and proteomic analyses in PDAC mice that confirmed some published data of mitochondrial dysfunction but underlined some novel metabolic insights such as nucleotide metabolism.

      There are minor weaknesses related to some analyses on mitochondrial proteins and to the fact that proteomic and transcriptomic comparison may be problematic in catabolic conditions because some gene expression is required to maintain or re-establish enzymes/proteins that are destroyed by the proteolytic systems (including the autophagy proteins and ubiquitin ligases). The authors should consider the following points.

      Point 1. The authors used the name sarcopenia as synonymous with muscle atrophy. However, sarcopenia clearly defines the disease state (disease code: ICD-10-CM (M62.84)) of excessive muscle loss and force drop during ageing (Ref: Anker SD et al. J Cachexia Sarcopenia Muscle 2016 Dec;7(5):512-514.). Therefore, the word sarcopenia must be used only when pathological age-related muscle loss is the subject of study. Sarcopenia can be present in cancer patients who also experience cachexia, however since the age of tumor-bearing mice in this study is 7-9 weeks old, the authors should refrain from using sarcopenia and instead replace it with the words muscle atrophy/ muscle wasting/muscle loss.

      Response: This issue has also been raised by Reviewer #1. We agree that the term sarcopenia historically refers to aged muscle, but it is also used in oncology (for example, in this article, which we cite: Mintziras I et al. Sarcopenia and sarcopenic obesity are significantly associated with poorer overall survival in patients with pancreatic cancer: Systematic review and meta-analysis. Int J Surg 2018;59:19-26). In fact, the term sarcopenia is now widely used in the literature and in the clinic to describe the loss of muscle mass and strength in cancer patients (see, for example, this recent review: Papadopetraki A. et al. The Role of Exercise in Cancer-Related Sarcopenia and Sarcopenic Obesity. Cancers 2023;15:5856).

      Point 2. Most of the analyses of mitochondrial function are appropriate. However, the methodological approach to determining mitochondrial fusion and fission machinery shown in Fig. 5F is wrong. The correct way is to normalize OPA1 and MFN1/2 to mitochondrial proteins such as VDAC/porin. In fact, by loading the same amount of total protein (see actin in panel 5F) the difference between a normal muscle and a muscle with enhanced protein breakdown is lost. We should expect a decrease in actin level in tumor-bearing mice with muscle atrophy, while the blots clearly show the same level due to the normalization of protein content. Moreover, by loading the same amount of protein in the gel, the atrophying muscle lysates become enriched in the proteins/organelles that are less affected by the proteolysis, resulting in an artefactual increase. The correct way would be to lyse the whole muscle of control and tumor-bearing mice in an identical volume and to load the same volume of control and cachectic muscle lysate in the western blot. Alternatively, expressing the abundance of mitochondrial shaping proteins relative to mitochondrial transmembrane or matrix proteins (mito mass) should compensate for the loading normalization. Because the authors showed elongated mitochondria despite mitophagy genes being up, fragmentation may be altered. Moreover, the DNM1l gene is suppressed and therefore DRP1 protein must be analyzed. Finally, OPA1 protein has different isoforms due to the action of proteases like OMA1 and YME1L, and these isoforms have different functions: the long one is pro-fusion while the short ones are not. The authors must quantify the long and short isoforms of OPA1.

      Response: We acknowledge that our analysis of a minor set of proteins involved in mitochondrial dynamics by Western blotting (Figure 5F) is basic and could have been improved. We thank the Reviewer for all the suggestions, which will be very useful in future projects studying the subject in greater depth and according to the molecular characteristics of each player in mitochondrial fusion, fission, mitophagy and biogenesis.

      Point 3. The comparison of proteomic and transcriptomic profiles to identify concordance or not is problematic when atrophy programs are induced. In fact, most of the transcriptional-dependent upregulation is to preserve/maintain/reestablish enzymes that are consumed during enhanced protein breakdown. For instance, the ubiquitin ligases when activated undergo autoubiquitination and proteasome degradation. The same happens for several autophagy-related genes belonging to the conjugation system (LC3, Gabarap), the cargo recognition pathways (e.g. Ubiquitin, p62/SQSTM1) and the selective autophagy system (e.g. BNIP3, PINK/PARKIN) and metabolic enzymes (e.g. GAPDH, lipin). Finally, in case identical amounts of proteins have been loaded in mass spec, the issue of selective enrichment raised in Point 2 should be considered. Therefore, these issues should be addressed in the discussion when comparing proteomic and transcriptomic data.

      Response: We fully agree with the Reviewer that seeking concordance between transcriptomic and proteomic data in the case of an organ affected by a high level of proteolysis is a difficult business. Another major difficulty, which we addressed in the Discussion section of the article, is that RNA and protein levels are not concordant for a good proportion of proteins, for multiple reasons, so each level of omics has to be interpreted independently to give information on the pathophysiology of the organ studied.

    1. Author Response

      We thank the editors and reviewers for taking the time to provide a critical assessment of our manuscript. We are delighted our work was found to have merit, and will revise the manuscript based on their valuable input.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      Major comments:

      1) The immunolabeling data in Figure S4 shows no change in puncta number but reduced puncta size in Kit KO. sIPSC data show reduced frequency but little change in amplitude. These data would seem contradictory in that one suggests reduced synaptic strength, but not number, and the other suggests reduced synapse number, but not strength. How do the authors reconcile these results?

      Regarding the synaptic puncta, in Kit KO (or KL KO) we have not detected an overt reduction in the average VGAT/Gephyrin/Calbindin positive puncta density or puncta size per animal. With respect to puncta size, only in the Kit KO condition, and only when individual puncta are assessed, does this modest (~10%) difference in size become statistically significant. In the revision, we eliminate this figure and focus on the per animal averages.

      We interpret that the reduction in sIPSC and mIPSC frequency likely stems from a decreased proportion of functional synapse sites. The number of MLIs, their action potential generation, the density of synaptic puncta, and the ability of direct stimulation to evoke release and equivalent postsynaptic currents, are all similar in Control vs Kit KO. It is therefore feasible that a reduced frequency of postsynaptic inhibitory events is due to a reduced ability of MLI action potentials to invade the axon terminal, and/or an impaired ability for depolarization to drive (e.g. coordinated calcium flux) transmitter release. That is, while the number of MLIs and their synapses appear similar, the reduced mIPSC frequency suggests that a smaller proportion of Kit KO synapse sites function properly, or that each site is less likely to do so.

      2) Related to point 1, it would be helpful to see immunolabeling data from Kit ligand KO mice. Do these show the same pattern of reduced puncta size but no change in number?

      Although we have not added a figure, we have now added experiments and a corresponding analysis in the manuscript. As we had done previously for Kit KO, we conducted IHC for VGAT, Gephyrin, and Calbindin for KL KO, and we analyzed triple-positive synaptic puncta in the molecular layer of Pcp2 Cre KL KO mice and Control (Pcp2 Cre negative, KL floxed homozygous) mice. We did not find a gross reduction in the average synaptic puncta size or density, or in the PSD-95 pinceau size. From this initial analysis, it appears that the presynaptic hypotrophy is more notable in the receptor than in the ligand knockout. We speculate that this is perhaps because the Kit receptor may have basal activity in the absence of Kit ligand, that Kit may serve a presynaptic scaffolding role that is lost in the receptor (but not the ligand) knockout, or simply that the embryonic timing of the Pax2 Cre vs Pcp2 Cre recombination events is more relevant to pinceaux development, especially as basket cells are born primarily prenatally.

      3) The data using KL overexpression in PC (figure 4E,F) are intriguing, but puzzling. The reduction in sIPSC frequency and amplitude in the control PC is much greater than seen in the Kit or KL KO. The interpretation of these data, "Thus, KL-Kit levels may not set the number of MLI:PC release sites, but may instead influence the proportion of synapses that are functional for neurotransmission (Figure 4G)" is not clear and the reasoning here should be explained in more detail, perhaps in the discussion.

      We have attempted to clarify this portion of the manuscript by eliminating the cartoon of the proposed model, and by revising and adding to the discussion. Either MLI Kit KO or PC KL KO seems to preserve the absolute number of MLI:PC anatomical synapse sites (IHC) but to reduce the proportion of those synapses that are contributing to neurotransmission (mIPSC). We speculate that sparse PC KL overexpression (OX) may 1) weaken inhibition to surrounding control PCs by diminishing KL OX PC to KL Control PC inhibition, and/or 2) act retrogradely through MLI Kit to potentiate MLI:MLI inhibition, reducing the MLI:PC inhibition at neighboring Control PCs.

      Minor comments:

      1) In the first sentence of the results, should "Figure 1A, B" be "Figure C, D"?

      Yes, corrected.

      2) The top of page 6 states "the mean mIPSC amplitude was ~10% greater in PC KL KO than in control", this does not appear to be the case in Figure 3E. control and KL KO look very similar here.

      In this portion of the text citing the modest 10% increase in mIPSC amplitude, we are referring to the average amplitude of all individual mIPSC events in the PC KL KO condition; in the figure referred to by the reviewer (3E), we are instead referring to the average of all mIPSC event amplitudes per KL KO PC. Because of the dramatic difference in sample size for individual events vs cells, this modest difference rises to statistical, if not biological, significance. We include this individual event analysis only to suggest that, since we in fact saw a slightly higher event amplitude in the KL KO condition, it is unlikely that a reduced amplitude would have been a technical reason that we detected a lower event frequency.
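
      The sample-size effect described in this response can be made concrete with a small sketch. It is an illustration only, not the authors' analysis: the means, standard deviation, and sample sizes below are hypothetical, and `welch_t` is a helper name introduced here. The same 10% difference in mean amplitude gives a small t statistic when n counts cells and a very large one when n counts individual events.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic computed from summary statistics of two groups."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Hypothetical values: 110 pA vs 100 pA mean amplitude, SD of 50 pA in both groups.
# Only the sample size differs between the two comparisons.
t_per_cell = welch_t(110.0, 50.0, 20, 100.0, 50.0, 20)          # n = cells
t_per_event = welch_t(110.0, 50.0, 20000, 100.0, 50.0, 20000)   # n = events

print(f"per-cell t = {t_per_cell:.2f}, per-event t = {t_per_event:.2f}")
```

      Because individual events recorded from the same cell are not independent observations, the per-cell (or per-animal) comparison is usually the more conservative choice.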

      3) Figure 3 D, duration, y-axis should be labelled "ms"

      Event duration is no longer graphed or referenced. This has been replaced with total inhibitory charge.

      Reviewer #2 (Recommendations For The Authors):

      Methods:

      • Pax2-Cre line: embryonal Cre lines sometimes suffer from germline recombination. Was this evaluated, and if yes, how?

      The global loss of Kit signaling is incompatible with life, as seen from perinatal lethality in other Kit Ligand or Kit mutant mouse lines or other conditional approaches. Furthermore, a loss of Kit signaling in germ cells impedes fertility. Thus, while not explicitly ruled out, since conditional Pax2 Cre mediated Kit KO animals were born, survived, and produced offspring in normal ratios, we do not suspect that germline recombination was a major issue in this specific study.

      • Include rationale for using different virus types in different studies (AAV vs. Lenti).

      This rationale is now included and reflects the intention to achieve infection sparsity in the smaller and less dense tissue of perinatal mouse brains.

      • How, if at all, was blinding performed for histological and electrophysiological experiments?

      It was not possible for electrophysiology to be conducted blinded for the Kit KO experiments, owing to the subjects’ hypopigmentation. However, whenever feasible, resultant microscopy images or electrophysiological data sets were analyzed under the Transnetyx Animal ID alone, and the genotypes were unmasked after analysis.

      • Provide justification for limiting electrophysiology recordings to lobule IV/V and why MLIs in the middle third of the molecular layer were prioritized when inhibition of PCs is dominated by large IPSCs from basket cells. Why were 2 different internals used for recording IPSCs and EPSCs in PCs and MLIs? While that choice is justified for action potential recordings, it provides poor voltage control in PC voltage clamp. Both IPSCs and EPSCs could have been isolated pharmacologically using a CsCl internal.

      The rationale for regional focus has been added to the text. For MLI action potential recordings, we opted to sample the middle third of the molecular layer so that we would not be completely biased toward either the classic distal stellate or the proximal basket subtype. It is our hope, in future optogenetic interrogations, to simultaneously record the dynamics of all MLI subtypes in a more unbiased way. With respect to internal solutions, we initially utilized a cesium chloride internal to maximize our ability to resolve differences in GABAA mediated currents, which was the hypothesis-driven focus of our study. While we agree that utilizing a single internal and changing the voltage clamp to arrive at per-cell analysis of Excitatory/Inhibitory input would have been most informative, our decision to utilize pharmacological methods was driven by our experience that achieving adequate voltage clamp across large Purkinje cells was often problematic, particularly in adult animals.

      Introduction:

      In the introduction, the authors state that inactivating Kit contributes to neurological dysfunction - their examples highlight neurological, psychiatric, and neurodevelopmental conditions.

      The language has been changed.

      General:

      Using violin plots illustrates the data distribution better than bar graphs/SEM.

      We have included violin plots throughout, and we have changed p values to numeric values, both in the interest of presenting the totality of the data more clearly.

      Synapses 'onto' PCs sounds more common than 'upon' PCs.

      We have changed the wording throughout.

      Figure 1:

      1F - there seems to be an antero-posterior gradient of Kit expression.

      Though not explicitly pursued in the manuscript, it is possible that such a gradient may reflect differences in the timing of the genesis and maturation of the cerebellum along the AP axis. Regional variability is however now briefly addressed as a motivator for focused studies within lobules IV/V.

      E doesn't show male/female ratios but only hypopigmentation.

      This language has been corrected.

      Figure 2 and associated supplementary figures:

      2A/B: The frequency of sIPSCs is very high in PCs, making the detection of single events challenging. How was this accomplished? Please add strategy to the methods.

      We have added methodological detail for electrophysiology analysis.

      How were multi-peak events detected and analyzed? 'Duration' is not specified - do the authors refer to kinetics? If so, report rise and decay. It is likely impossible to show individual aligned sIPSCs with averages superimposed, given that sIPSCs strongly overlap. Alternatively, since no clear baseline can be determined in between events, and therefore frequency, amplitude, and kinetics quantification is near-impossible, consider plotting inhibitory charge.

      Given the heterogeneity of events, we now do not refer to individual event kinetics. As suggested, we have now included an analysis of the total inhibitory charge transferred by all events during the recording epoch.
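
      For readers unfamiliar with the measure, total inhibitory charge is the time integral of the baseline-subtracted current over the recording epoch, so it does not require separating overlapping events. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the fixed sampling interval, and the way the baseline is supplied are all hypothetical.

```python
def total_charge_pC(current_pA, dt_s, baseline_pA=0.0):
    """Integrate baseline-subtracted current with the trapezoidal rule.

    current_pA: evenly sampled current values in pA (assumed sampling).
    dt_s: sampling interval in seconds (assumed constant).
    baseline_pA: holding-current baseline, supplied by the caller.
    Returns charge in pC, since pA * s = pC.
    """
    charge = 0.0
    for left, right in zip(current_pA, current_pA[1:]):
        charge += 0.5 * ((left - baseline_pA) + (right - baseline_pA)) * dt_s
    return charge


# Hypothetical trace: one triangular inward current peaking at -20 pA over 1 s.
trace = [0.0, -20.0, 0.0]
print(total_charge_pC(trace, dt_s=0.5))
```

      The sign of the result follows the sign convention of the recorded current, and the choice of baseline strongly affects the value, so that choice should be reported alongside the measure.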

      S2: Specify how density, distribution, and ML thickness were determined in methods. How many animals/cells/lobules?

      For consistency with viral injections and electrophysiology, the immunohistochemical analysis was restricted to lobule IV/V. This is clearer in the revision and detail is added in the methods.

      S3:

      S3B: the labels of Capacitance and Input resistance are switched.

      This has been corrected.

      How were these parameters determined? Add to methods.

      Added

      In the previous figure the authors refer to 'frequency', in this figure to 'rate' - make consistent

      This has been corrected.

      D: example does not seem representative. Add amplitude of current pulse underneath traces.

      We added new traces from nearer the group means and we now include the current trace.

      F/G example traces (aligned individual events + average) are necessary.

      We added example traces near the relevant group means for each condition.

      Statement based on evoked IPSCs that 'synapses function normally' is a bit sweeping and can only be fully justified with paired recordings. Closer to the data would be the release probability of individual synapses is similar between control and Kit KO.

      Paired recordings in both Kit Ligand and Kit receptor conditional knockout conditions are indeed an informative aim for future studies, should support permit. For now, we have clarified the language to be more in line with the reviewer’s welcome suggestion.

      S4:

      Histological strategy cannot unambiguously distinguish MLI-PC and PC-PC synapses. Consider adding this confound to the text.

      We have added this confound to the discussion.

      The observation that the pinceau is decreased in size could have important implications for ephaptic coupling of MLI and PC and could be mentioned.

      We agree and have added this notion to the discussion.

      Y-label is missing in B.

      Corrected.

      Figure 3 and associated supplementary figures:

      In the text, change PC-Cre to L7-Cre or Pcp2-Cre.

      Changed

      How do the authors explain a reduction in frequency, amplitude, and duration of sIPSCs in the KL KO but not in the Kit KO? Add to the discussion

      We now address this apparent discordance in the discussion. Pax2 Cre mediates recombination weeks ahead of Pcp2 Cre. We therefore suspect that postnatal PC KL KO may be more phenotypic than embryonic MLI Kit KO because there is less time for developmental compensation. A future evaluation of the impact of postnatal Kit KO would be informative to this end.

      As in Figure 2, plotting the charge might be more accurate.

      We now plot total charge transfer.

      Are the intrinsic properties in KL KO PCs altered? (Spontaneous firing, capacitance, input resistance).

      We have added to the text that we found no difference in capacitance or input resistance between Purkinje cells from KL floxed homozygous Control animals versus those from KL floxed homozygous, PCP2 Cre positive KL KO animals. We plan to characterize both basal and MLI modulated PC firing in a future manuscript. Since Pcp2 Cre mediated KL KO seems more phenotypic than Pax2 Cre mediated Kit KO, we agree that it seems a better testbed for investigating differences in both basal PC firing and its MLI-mediated modulation.

      3D-F - Example traces would be desirable (see above, analogous to Fig. 2).

      More example traces have been added.

      Figure 4: 'In vivo mixtures' sounds unusual. Consider revision (e.g., 'to sparsely delete KL').

      Changed

      The observation that control PC sIPSC frequency is lower in KL OX PCs than in sham is interesting. This observation would be consistent with overall inhibitory synapse density being preserved. This could be evaluated with immunohistochemistry. For how far away from the injection area does this observation hold true?

      Because we have now analyzed and failed to find an overt (per animal average) change in synaptic puncta size or density in the whole animal Control vs PCP2 Cre mediated KL KO conditions, we do not have confidence that it is feasible to pursue this IHC strategy in the sparse viral-mediated KL KO or OX conditions. To the reviewer’s valid point however, we intend to probe the spatial extent/specificity of the sparse phenomenon when we are resourced to complement the KL/Kit manipulations with transgenic methods for evaluating MLI-PC synapses specifically, potentially by GRASP or related methods that would not be confounded by PC-PC synapses. Transgenic MLI access would also facilitate determining the spatial extent to which optogenetically activated MLIs evoke equivalent responses in Control vs KL manipulated PCs.

      Y-legend in D clipped.

      Corrected

      Existing literature suggests that MLI inhibition regulates the regularity of PC firing - this could be tested in Kit and KL mutants.

      Based upon transgenic animal availability, we have now included an evaluation of PC firing in the (Pax2 Cre mediated) Kit KO condition. PC average firing frequency, mean ISI, and ISI CV2 were not significantly different across genotypes. A KS test of individual ISI durations for Control vs Kit KO did reveal a difference (p<0.0001). We have added a supplementary figure (S6) with this data. It is possible that we may find a difference in these PC firing patterns in the more phenotypic PC KL KO condition; however, we are also eager to test in future studies whether postnatal KL or Kit KO impairs the ability of MLI activation to produce pauses or other alterations in PC firing or in PF-PC mediated plasticity.
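
      The firing-regularity measures named in this response can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used definition of CV2 as the mean of 2|ISI(i+1) - ISI(i)| / (ISI(i+1) + ISI(i)) over adjacent interspike intervals, and the function names and spike times are hypothetical.

```python
def interspike_intervals(spike_times_s):
    """Differences between consecutive spike times (assumed sorted, in seconds)."""
    return [later - earlier
            for earlier, later in zip(spike_times_s, spike_times_s[1:])]


def mean_isi(spike_times_s):
    """Mean interspike interval in seconds."""
    isis = interspike_intervals(spike_times_s)
    return sum(isis) / len(isis)


def mean_cv2(spike_times_s):
    """Mean CV2 over adjacent ISI pairs; 0 indicates perfectly regular firing."""
    isis = interspike_intervals(spike_times_s)
    if len(isis) < 2:
        raise ValueError("need at least three spikes to compute CV2")
    pair_values = [2.0 * abs(b - a) / (a + b) for a, b in zip(isis, isis[1:])]
    return sum(pair_values) / len(pair_values)


# Hypothetical spike trains: one perfectly regular, one irregular.
regular = [0.0, 1.0, 2.0, 3.0]
irregular = [0.0, 1.0, 4.0]
print(mean_cv2(regular), mean_cv2(irregular))
```

      Because CV2 compares only neighboring intervals, it is less sensitive to slow drifts in firing rate than the ordinary coefficient of variation of the ISI distribution.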

      Reviewer #3 (Recommendations For The Authors):

      Reference to Figure 1A in the Results section is slightly inaccurate. Kit gene modifications are illustrated in Figures 1A, B. Where Figure 1A shows Kit distribution. Please rephrase. Relatedly, the reference to Figs 1B - D are shifted in the results section, and 1E is skipped.

      We have changed the text.

      Please show cumulative histograms for frequency too for consistency with amplitude (e.g. Fig 2).

      We have instead, for reasons outlined by other reviewers, documented total charge transfer for both Kit KO and KL KO experiments where sIPSC events were analyzed.

      Fig S3: include example traces of PPR.

      This is now included.

      Include quantifications of GABAergic synapse density in Fig S4.

      This is now included.

      Include inset examples of KO in Fig S4A.

      This is now included.

      Add average puncta size graphs along Figure S4B. The effect apparent in the histogram of S4B is small and statistics using individual puncta as n values (in the 20,000s) therefore misleading.

      Per animal analysis is now instead included in the figure and text.

      Figure S4B y axis label blocked.

      Corrected

      Include quantification referenced in "As PSD95 immunoreactivity faithfully follows multiple markers of pinceaux size 40, we quantified PSD95 immunoreactive pinceau area and determined that pinceaux area was decreased by ~50% in Kit KO (n 26 Control vs 43 Kit KO, p<0.0001, two-tailed t-test)."

      We added a graph of per animal averages, instead of in text individual pinceau areas.

      Include antibody dilutions in the methods.

      Added.

      It's unclear from the text where the Mirow lab code comes from.

      Detail has now been added in text.

      Typo in methods "The Kit tm1c alle was bred...".

      Corrected

      Typo in Figure S4 legend "POSD-95 immuno-reactivity".

      Corrected

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we'd like to thank the three reviewers for their meticulous work, which has enabled us to present an improved manuscript. Substantial changes were made to the article following the reviewers' and editors' recommendations. We read all their comments and suggestions very carefully. Apart from a few misunderstandings, all comments were very pertinent. We responded positively to almost all the comments and suggestions, and as a result, we have made extensive changes to the document and the figures. This manuscript now contains 16 principal figures and 15 figure supplements.

      The number of principal figures is now 16 (1 new figure), and additional panels have been added to certain figures. On the other hand, we have added 7 additional figures (supplement figures) to answer the reviewers' questions and/or comments.

      Main figures

      ▪ Figures 1, 4, 5, 10, 11, 12, 13, 14: unchanged

      ▪ Figures 7 and 8 were switched.

      ▪ Figure 2: we added panel F in response to reviewer 3's request for sperm defect statistics

      ▪ Figure 3: the contrast in panel B has been reworked to homogenize colors

      ▪ Figure 6: This figure was recomposed. The WB on testicular extract was removed and we present a new WB allowing comparison of the presence of CCDC146 in the flagella fraction. Using an anti-HA Ab, we demonstrate that the protein is localized in the flagella in epididymal sperm. Requested by the 3 reviewers.

      ▪ Figure 7 (old 8): to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. Moreover, the WB was removed and is now presented in figure 6 (improved as requested).

      ▪ Figure 8: formerly figure 7

      ▪ Figure 9: figure 9 was recomposed and improved for increased clarity, as suggested by reviewers 2 and 3.

      ▪ Figure 16: formerly appendix figure 11

      Figure supplements and supplementary files

      ▪ Figure 1-Figure supplement 1 New. Sperm parameters of the 2 patients. Requested by the editor (remark #1) and by reviewer 1 (Note #3)

      ▪ Figure 2-Figure supplement 1 New. Sperm parameters of line 2 (KO animals). Requested by reviewer 1 (Note #5)

      ▪ Figure 4-Figure supplement 1 New. Experiment to evaluate the specificity of the human CCDC146 antibody. Minimal revision request and reviewer 1 note #8

      ▪ Figure 6-Figure supplement 1 New. Figure recomposed; Asked by reviewer 2 note #4 and reviewer 3

      ▪ Figure 8-Figure supplement 1 New. We now provide new images to show the non-specific staining of the midpiece of human sperm by secondary Abs in ExM experiments; Asked by reviewer 2

      ▪ Figure 10-Figure supplement 1 New. We added new images to show the non-specific staining of the midpiece of mouse sperm by secondary Abs in IF (panel B). Reviewer 1 note #9 and reviewer 2 note #5

      ▪ Figure 12-Figure supplement 1 New. Control requested by reviewer 3 Note #23

      ▪ Figure 13-Figure supplement 1 New. We provide a graph and a statistical analysis demonstrating the increase of the length of the manchette in the Ccdc146 KO. Requested by editor and reviewer 3 Note 24

      ▪ Figure 15-Figure supplement 1 New. Control requested by reviewer 2. Minor comments

      ▪ Figure supplementary 1 New. Answer to question requested by reviewer 2 note #1

      All the reviewers' and editors’ comments have been answered (see our point-by-point response), and we resubmit what we believe to be a significantly improved manuscript. We strongly hope that we meet all your expectations and that our manuscript will be suitable for publication in "eLife". We look forward to your feedback.

      Point by point answer

      Please note that there has been active discussion of the manuscript, and the points summarized below constitute the minimal revision request that the reviewers think the authors should address even under this new review model system. It was the reviewers' consensus that the manuscript was prepared with a lot of oversights - please see all the minor points to improve your manuscript.

      All minimal revision requests have been addressed

      Minimal revision request

      1) Clinical report/evaluation of the two patients should be given as it was not described even in their previous study as well as full description of CCDC146.

      We now provide a new Figure 1-figure supplement 1 describing the patients' sperm parameters.

      2) Antibody specificity should be provided, especially given two of the reviewers were not convinced that the mid piece signal is non-specific as the authors claim. As both KO and KI model in their hands, this should be straightforward.

      To validate the specificity of the antibody, we transfected HEK cells with a human DDK-tagged CCDC146 plasmid and performed a double immunostaining with a DDK antibody and the CCDC146 antibody. We show that both stainings are superimposable, strongly suggesting that the CCDC146 Ab specifically targets CCDC146. This experiment is now presented in Figure 4-Figure supplement 1. Next, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.

      3) The authors should improve statistical analysis to support their experimental results for the reader can make fair assessment. Combined with clear demonstration of ab specificity, this lack of statistical analysis with very few sample number is a major driver of dampening enthusiasm towards the current study.

      Several statistical analyses were carried out and are now included:

      1) distribution of the HA signal in mouse sperm cells (see point 2 Figure 7 panel B)

      2) quantification and statistical analyses of the defect observed in Ccdc146 KO sperm (figure 2 panel E)

      3) Quantification and statistical analyses of the length of the manchette in spermatids 13-15 steps (Figure 13-Figure supplement 1 new)

      4) The authors need to clarify (peri-centriolar vs. centriole)

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein in somatic cells. This colocalization strongly suggests that CCDC146 is a centriolar protein, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles, and a clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described; the best example is perhaps gamma-tubulin (PMID: 8749391).

      or tone down (CCDC146 to be a MIP) of their claim/description.

      Concerning its localization in sperm, we agree with the reviewer that our demonstration that CCDC146 is a MIP would require more evidence. Because of that, we have toned down the MIP hypothesis throughout the manuscript. See lines 491-495.

      Testis-specific expression of CCDC146 as it is not consistent with their data.

      We have also modified our claim concerning the testis-expression of CCDC146. Line 176

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      1) As described in general comments, this study limits how the CCDC146 deficiency impairs abnormal centriole and manchette formation. The authors should explain their relationship in developing germ cells.

      In fact, there is limited information about the relationship between the manchette and the centriole. However, a few articles have highlighted that both organelles share molecular components. For instance, WDR62 is required for centriole duplication in spermatogenesis and manchette removal in spermiogenesis (Commun Biol. 2021; 4: 645. doi: 10.1038/s42003-021-02171-5). Another study demonstrates that CCDC42 localizes to the manchette, the connecting piece and the tail (Front. Cell Dev. Biol. 2019 https://doi.org/10.3389/fcell.2019.00151). These articles underline that centrosomal proteins are involved in manchette formation and removal during spermiogenesis and support our results showing the impact of the lack of CCDC146 on centriole and manchette biogenesis. This information is now discussed. See lines 596-603.

      2) The authors generated knock-in mouse model. If then, are the transgene can rescue the MMAF phenotype in CCDC146-null mice? This reviewer strongly suggest to test this part to clearly support the pathogenicity by CCDC146.

      We indeed wrote that we created a “transgenic mouse”, which was misleading. We actually created a CCDC146 knock-in expressing a tagged protein. The strain was made by CRISPR-Cas9, and a sequence coding for the HA-tag was inserted just before the first amino acid in exon 2, leading to the translation of an endogenous HA-tagged CCDC146 protein. We have removed the word transgenic from the text and made changes accordingly (see lines 250-253). We therefore cannot use this strain to rescue the MMAF phenotype as suggested by the reviewer.

      3) Although the authors cite the previous study (Coutton et al., 2019), the study does not describe any information for CCDC146 and clinical information for the patients. The authors must show the results for clinical analysis to clarify the attended patients are MMAF patients without other phenotypic defects.

      We have now inserted a table indicating all sperm parameters for the patients harboring a mutation in the CCDC146 gene (Figure 1-Figure supplement 1); this is now indicated in lines 159-160.

      4) The authors describe CCDC146 expression is dominant in testes, However, the level in testis is only moderate in human (Supp Figure 1). Thus, this description is not suitable.

      In Figure 1-figure supplement 2 (old FigS1), the median of expression in testis is around 12 in human, a value considered as high expression by the analysis software from Genevestigator. However, for mouse, it is true that the level of expression is medium. We assumed that the reviewer’s comment concerned testis expression in mouse. To take this remark into account, we changed the text accordingly. See line 176.

      5) Although the authors mentioned that two mice lines are generated, only one line information is provided. Authors must include information for another line and provide basic characterization results to support the shared phenotype within the lines.

      We now provide a revised Figure 2-figure supplement 1CD, presenting the second line and the corresponding text in the main text is found lines 178-183.

      6) In somatic cells, the CCDC146 localizes at both peri-centriole and microtubule but its intracellular localization in sperm is distinguished. The authors should explain this discrepancy.

      The multi-localization of a centriolar protein is already discussed in detail in the discussion (lines 520-526). We have written:

      “Despite its broad cellular distribution, the association of CCDC146 with tubulin-dependent structures is remarkable. However, centrosomal and axonemal localizations in somatic and germ cells, respectively, have also been reported for CFAP58 [37, 55], thus the re-use of centrosomal proteins in the sperm flagellar axoneme is not unheard of. In addition, 80% of all proteins identified as centrosomal are found in multiple localizations (https://www.proteinatlas.org/humanproteome/subcellular/centrosome). The ability of a protein to home to several locations depending on its cellular environment has been widely described, in particular for MAP. The different localizations are linked to the presence of distinct binding sites on the protein…”

      7) Authors mention CCDC146 is a centriolar protein in the title and results subtitle. However, the description in results part depicts CCDC146 is a peri-centriolar protein, which makes confusion. Do the authors claim CCDC146 is centrosomal protein?

      In figure 4A, we have clearly shown that the protein colocalizes with centrin, a centriolar core protein. This colocalization strongly suggests that CCDC146 is a centriolar protein in somatic cells, and this is now clearly indicated in lines 211-212. However, its localization is not restricted to the centrioles, and a clear staining was also observed in the pericentriolar material (PCM). The presence of a protein in both the PCM and the centriole has already been described; the best example is perhaps gamma-tubulin (PMID: 8749391).

      8) Verification of the antibody against CCDC146 must be performed and shown to support the observed signal are correct. 2nd antibody only signal is not proper negative control.

      This is a very important remark. The commercial antibody raised against human CCDC146 was validated in HEK293 cells expressing a DDK-tagged CCDC146 protein. Cells were co-stained with anti-DDK and anti-CCDC146 antibodies, and the two stainings colocalize perfectly. This experiment is now presented in Figure 4-figure supplement 1 and described in the text (lines 206-208).

      9) In human sperm, conventional immunostaining reveals CCDC146 is detected from acrosome head and midpiece. However, in ExM, the signal at acrosome is not detected. How is this discrepancy explained? The major concern for the ExM could be physical (dimension) and biochemical (properties) distortion of the sample. Without clear positive and negative control, current conclusion is not clearly understood. Furthermore, it is unclear why the authors conclude the midpiece signal is non-specific. The authors must provide experimental evidence.

      Staining of the acrosome should always be interpreted with caution in sperm. Indeed, numerous glycosylated proteins are present at the surface of the plasma membrane overlying the outer acrosomal membrane for sperm attachment, and they are responsible for much nonspecific staining. Moreover, this acrosomal staining was not observed in mouse sperm, strongly suggesting that it is not specific.

      Concerning the staining in the midpiece observed in both conventional and Expansion microscopy, it also seems to be nonspecific and associated with secondary Abs.

      For IF, we now provide new images showing clearly the nonspecific staining of the midpiece when secondary Ab were used alone (see Figure 10-figure supplement 1B).

      For ExM, we provide new images in Figure 8-figure supplement 1B (POC5 staining) showing a staining of the midpiece (likely mitochondria), although POC5 was never described to be present in the midpiece. Both experiments (CCDC146 and POC5 staining by ExM) shared the same secondary Ab and the midpiece signal was likely due to it.

      Moreover, we now provide new images (figure 7C) in ExM on mouse sperm showing no staining in the midpiece and demonstrating that the punctate signal is present all along the flagellum. Finally, we would like to underline that we now provide new IF results, using an anti-HA antibody conjugated with Alexa Fluor 488, which confirm the ExM results.

      These points are now discussed in lines 498-502 for acrosome staining and lines 503-511 for midpiece staining.

      10) For intracellular localization of the CCDC146 in mouse sperm, the authors should provide clear negative control using WT sperm which do not carry the transgene.

      This experiment was performed.

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum.

      11) Current imaging data do not clearly support the intracellular localization of the CCDC146. Although western blot imaging reveal that CCDC146 is detected from sperm flagella, this is crude approach. Thus, this reviewer highly recommends the authors provide more clear experimental evidence, such as immuno EM.

      We now provide a WB comparing the presence of the protein in the flagellum and head fractions; see new figure 6. We show that CCDC146 is present only in the flagellum fraction. The band appeared very quickly during visualization and became very strong after a few minutes, demonstrating that the protein is abundant in the flagella. It is important to note that epididymal sperm do not have centrioles and therefore this signal is not a centriolar signal. We also now provide new statistical analyses showing that the immunostaining observed in the principal piece is very specific (Figure 7B). Altogether, these results demonstrate unequivocally the intracellular localization of CCDC146 in the flagellum. This point is now discussed in lines 480-489.

      12) Although sarkosyl is known to dissociate tubulin, it is not well understood and accepted that the enhanced detection of CCDC146 by the detergent indicates its microtubule inner space. Sperm axoneme to carry microtubule is also wrapped peri-axonemal components with structural proteins, which are even not well solubilized by high concentration of the ionic detergent like SDS.

      We agree with the reviewer that the solubilization of the protein by sarkosyl is not proof of the presence of the protein inside microtubules. Taking this point into account, we toned down the MIP hypothesis and now discuss alternative hypotheses concerning these results; see discussion lines 490-497.

      13) SEM image is not suitable to explain internal structure (line 317-323).

      We agree with the reviewers and changes were made accordingly. See lines 354-357

      Minor comments

      1) In main text, supplementary figures are cited "Supp Figure". And the corresponding legends are written in "Appendix - Figure". Please unify them.

      Done. They are now labelled “Figure X-figure supplement Y”.

      2) Line 159, "exon 9/19" is not clear.

      We now write “exon 9” and indicate earlier that the gene contains 19 exons.

      3) Line 188, "positive cells" are vague.

      “Positive” was changed to “fluorescent”.

      4) Representative TUNEL assay image for knockout testes were not shown in Supp Figure 3B.

      This was a mistake; the image is now shown in Figure 2-figure supplement 2C.

      5) Please provide full description for "IF" and "AB" when described first.

      Done

      6) Line 262, It is unclear what is "main piece".

      Changed to principal piece

      7) Line 340, Although the "stage" information might be applicable, this is information for "seminiferous tubule" rather than "spermatid". This reviewer suggests to provide step information rather than stage information.

      We agree with the reviewer that there was confusion between “stage” and “step”. We changed the text to “step” spermatids.

      8) Line 342, Step 1 is not correct in here.

      Corrected; the text now reads “steps 13-15 spermatids”.

      9) Line 803, "C." is duplicated.

      Removed

      10) Figure 3A, it will be good to mark the defective nuclei which are described in figure legends.

      These cells are now indicated by white arrowheads.

      11) Figure 5, Please provide what MT stands for.

      Now explained in the legend of figure 5

      12) Figure 6. Author requires clear blot images for C. In addition, Panel B information is not correct. If the blot was performed using HA antibody, then how "WT" lane shows bands rather than "HA" bands?

      The reviewer is correct. It was a mistake; the figure was recomposed and improved.

      Reviewer #2 (Recommendations For The Authors):

      Overall, editing oversights are present throughout the manuscript, which has made the review process quite difficult. Some repetitive figures can be removed to streamline to grasp the overall story easier. Some claims are not fully supported by evidence that need to tone down. Some figures not referenced in the main text need to be mentioned at least once.

      All figures are now referenced in the text

      Major comments:

      1) 163-164 - Please clarify the claim that there is going to be an absence of the protein or nonfunctional protein, especially for the patient with a deletion that could generate a truncated protein at two third size of the full-length protein. Similarly, 35% of the protein level is present for the patient with a nonsense mutation. Some in silico structural analysis or analysis of conserved domains would be beneficial to support these claims.

      Both mutations are predicted to produce premature stop codons, p.Arg362Ter and p.Arg704SerfsTer7, leading either to the complete absence of the protein in case of nonsense-mediated mRNA decay or to the production of a truncated protein missing almost two thirds or one fourth of the protein, respectively. CCDC146 is very well conserved throughout evolution (Figure supplementary 1), including the C-terminal end of the protein, which contains a large coiled-coil domain (Figure 1B). In view of the very high degree of conservation, it is most likely that the C-terminal end of the protein, absent in both subjects, is critical for CCDC146 function and hence that both mutations are deleterious. This explanation is now added to the discussion; see lines 439-448.

      2) 173, 423 - Please clearly state a rationale of your mouse model design (i.e., why a mouse model that recapitulate human mutation is not generated) as the truncations identified in human patients are located further towards the C-terminus, and it is not clear whether truncated proteins are present, and if so, they could still be functional. Basically, the current mouse model supports the causality of the human mutations.

      This is an important question, which goes beyond the scope of this article, and raises the question of how to confirm the pathogenicity of mutations identified by high-throughput sequencing. The production of KO or KI animals is an important tool to help confirm one’s suspicions, but the first element to take into consideration is the nature of the genetic data.

      Here we had two patients with homozygous truncating variants. In humans, it is well established that the presence of premature stop codons usually induces nonsense-mediated mRNA decay (NMD), leading to the complete absence of the protein or a strong reduction in protein production. In the unlikely absence of NMD in our two patients, the identified variants would induce the production of proteins missing 60% and 30% of their C-terminal part. Often (and this is particularly true for structural proteins) the production of abnormal proteins is more deleterious than the complete absence of the protein (and the purpose of NMD is most likely to limit the production of abnormal “toxic” proteins). For these reasons, to try to recapitulate the most likely consequences of the human variants without risking an even more severe effect, we decided to introduce a stop codon in the first exon in order to remove the totality of the protein in the KO mice.

      The second element is to interpret the phenotype of the KO animals. Here, the human sperm phenotype is perfectly recapitulated in the KO mice.

      Overall, we have strong genetic arguments in humans, and the reproduction of the phenotype in KO mice confirms the pathogenicity of the variants identified in men.

      This point is now discussed; see lines 433-438.

      3) Figure 6A - the labelling is misleading as it seems to suggest that the specific cells were isolated from the testes for RT-PCR.

      We have modified the labelling to avoid any confusion.

      Figure 6B -Signal of HA-tag is shown in WT, not in transgenic. Please check the order of the labels. Figure 6C - This blot is NOT a publication-quality figure. The bands are very difficult to observe, especially in lane D18. Because it is one of the important data of this study, replacing this figure is a must.

      The figure has been completely remade, including new results. See new figure 6. Figure 6C was removed.

      4) Supplementary fig 6 is also not a publication-level figure, and the top part seems largely unnecessary (already in the figure legend).

      The figure has been completely remade as well (now Figure 6-Figure Supplement 1).

      5) 261/267- The conclusion that mitochondrial staining in the flagellum (in both mice and humans) is non-specific is not convincing. Supplementary fig 8 shows that the signal from secondary only IF possibly extends beyond the midpiece - but it is hard to determine as no mitochondrial-specific staining is present. Either need to tone down the conclusion or provide supporting experimental evidence.

      First, to avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 HA-CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the flagellum. These experiments are now described in lines 271-279.

      Second, we provide new images of the signal obtained with secondary Abs only, which show more clearly that the secondary Ab gave a non-specific staining (Figure 10-Figure supplement 1B). This point is discussed in lines 503-511.

      6) Figure 9 A - Please relate the white line to Fig. 9B label in X-axis. The information from Fig 9A+D and 9E+F are redundant. The main text nor the figure legends indicate why these specific two sperm were chosen for quantification and demonstrating the outcomes. One of them could be moved to supplementary information or removed, or the two could be combined.

      As suggested by the reviewer, we have combined the two sperm to demonstrate that CCDC146 staining is mostly located on microtubule doublets. Moreover, the figure was recomposed to make it clearer.

      Minor comments:

      All of the supplementary figures are referred to as Supp Fig X in the text, however, they are actually titled Appendix - Figure X. This needs to be consistent.

      The figures are now referred to as figure supplement x in both text and figures.

      Line 125 - edit spacing.

      We think this issue (long internet link) will be curated later and more efficiently by the journal, during the step of formatting necessary for publication.

      144 - With which to study → with which we studied?

      We made the change as suggested.

      151 - Supp Fig 1 - the text says that the gene is highly transcribed in human and mouse testes, but the information in the figure states that the level in mouse tissues is "medium"

      We have corrected this mistake in the text; See line 176

      165 - The two mutations are most likely deleterious. Please specifically mention what analyses done to predict the deleterious nature to support these claims.

      Both variants, c.1084C>T and c.2112del, are extremely rare in the general population, with reported allele frequencies of 6.5×10⁻⁵ and 6.5×10⁻⁶ respectively in gnomAD v3. Moreover, these variants are annotated with a high impact on the protein structure (MoBiDiC prioritization algorithm (MPA) score = 10, DOI: 10.1016/j.jmoldx.2018.03.009) and are each predicted to induce a premature termination codon, p.(Arg362Ter) and p.(Arg704SerfsTer7) respectively, leading to the production of a truncated protein. This information is now given in lines 164-169.

      196-200/Figure 4 - As serum starved cells/basal body (B) are not mentioned in the main text, as is, Fig 4A would be sufficient/is relevant to the text. Please make the text reflect the contents of the whole figure, or re/move to supplement.

      We agree with the reviewer that the full description of the figure should be in the text. We added two sentences to describe figure 4B see lines 217-218.

      224 - spermatozoa (plural) fits better here, not spermatozoon

      OK changed accordingly

      236 - According to the figure legend, 6B is only showing data from the epididymal sperm, not postnatal time points; should be referencing 6C. Alignment of Marker label

      As indicated above, the figure has been completely remade, including new results. See new figure 6. Figure 6C was removed. The corresponding text was changed accordingly; see lines 249-266.

      255-256 - Referenced figure 7B3, however, 7B3 only shows tubulin staining, so no CCDC146 can be observed. Did authors mean to reference fig 7B as a whole?

      Sorry for this mistake. We agree, and the text now refers to figure 8B6 (figures 7 and 8 were switched).

      305 - "of tubules" - I presume it is meant to be microtubules?

      Yes; the text was changed as suggested.

      317-321 - a diagram of HTCA would be useful here

      We have added a reference where an HTCA diagram is available; see line 363. Moreover, a TEM view of the HTCA is presented in figure 12A.

      322/Fig 11A - an arrow denoting the damage might be useful, as A1 and A3 look similar. The size of the marker bar is missing. Please update the information on figure legend.

      Concerning the comparison between A1 and A3, the take-home message is that there is great variability in the morphological damage. This point is now underlined in the corresponding text. We updated the size of the marker bar as suggested (200 nm). See lines 365-367.

      323 - Please mark where capitulum is in the figure

      “Capitulum” was changed to “nucleus”.

      Since Fig 11B2 is not referenced in the main text, it does not seem to add anything to the data, and could be removed/moved to supplement.

      We added a sentence to describe figure 11B2 line 370

      342-343 - manchette in step I is not seen clearly - the figure needs to be annotated better. However, DPY19L2 is absent in step I in the KO, but the main text does not reflect that - why is that?

      We do not understand the remark of the reviewer “manchette in step I is not seen clearly”. The figure shows clearly the manchette (red signal) in both WT and KO (Figure 13 D1/D2).

      In steps 13-15 WT spermatids, the size of the manchette decreases and it becomes undetectable. In KO spermatids, the shrinkage of the manchette is hampered and, in contrast, it continues to expand (Figure 13D2). We also provide a new Figure 13-figure supplement 1 with other illustrations of very long manchettes and a statistical analysis. In the meantime, the acrosome is strongly remodeled, as shown in figure 16-new, with detached acrosome (panel H). This morphological defect may induce a loss of the DPY19L2 staining (Figure 13 D2 stage I-III). This explanation is now inserted in the text, lines 396-399.

      Figure 15B and 15C only show KO, corresponding images from the WT should be present for comparison.

      WT images are now provided in the new Figure 15-figure supplement 1.

      Figure 12 - Figure 12 - JM?.

      JM was removed. It does not mean anything

      Figure 12C and Supplementary Fig 10 - structures need to be labelled, as it is unclear what is where

      Done

      338 - text mentions step III, but only sperm from step VII are shown in Figure 13

      As suggested by reviewer 3, we changed “stage” to “step”. The text was modified to take this remark into account; see lines 388-396.

      360 - This is likely supposed to say Supp Figure 11E-G, not 13??

      Yes, it is a mistake. Corrected

      388 Typo "in a in a".

      Yes, it is a mistake. Corrected

      820 - Fig 3 legend - in KO spermatid nuclei were elongated - could this be labelled by arrows? I am not convinced this phenotype is that different from the WT.

      In fact, the nuclei of elongating KO spermatids are elongated and also very thin, a shape not observed in the WT. We have added arrowheads and modified the text to indicate this point (line 200).

      836 - Figure 5 legend says that in yellow is centrin, but that is not true for 5A, where the figure shows labelling for y-tubulin (presumably, according to the figure itself).

      We have modified the text of the legend to take into account the remark

      837- 5A supposedly corresponds to synchronized HEK293T cells, but the reasoning behind using synchronized cells is not mentioned at all in the main text; furthermore, how this synchronization is achieved is not explained in materials and methods (serum starvation? Thymidine block?).

      Yes, figure 5A was obtained with synchronized cells. We have added one paragraph in the MM section. For cell synchronization experiments, cells underwent S-phase blockade with thymidine (5 mM, Sigma-Aldrich) for 17 h followed by incubation in a control culture medium for 5 h, then a second blockade at the G2-M transition with nocodazole (200 nM, Sigma-Aldrich) for 12 h. Cells were then fixed with cold methanol at different times for IF labelling. See line 224 for changes made in the result section and lines 700-704 for changes made in the MM section.

      845- figure legend says that the RT-PCR was done on CCDC146-HA tagged mice, but the main text does not reflect that.

      We made changes and the description of the KI is now presented before (line 240) the RT-PCR experiment (line 257).

      949 - it is likely supposed to say A2, not B1 (B1 does not exist in Fig 15)

      Yes, it is a mistake. Corrected

      971 - Appendix Fig 3 legend - I believe that the description for B and C are swapped.

      Yes, it is a mistake. Corrected

      Furthermore, some questions to address in A would be: Which cross sections were from which animal/points? How many per animal? Were they always in the same location?

      Yes, we have a protocol for arranging and orienting all testes in the same way during the paraffin embedding phase. The cross-sections are therefore not taken at random, and we can compare sections from the same part of the testis. The number of animals was already indicated in the figure legend (see line 1128)

      Reviewer #3 (Recommendations For The Authors):

      1) There are a number of grammatical and orthographical errors in the text. Careful proofreading should be performed.

      We have sent the manuscript to a professional proofreader

      2) The author should also check for redundancies between the introduction and the discussion.

      The discussion has been modified to take into account the reviewers’ remarks. Nevertheless, we did our best to avoid redundancies between the introduction and the discussion.

      3) Can the authors provide a rationale why they have chosen to tag their gene with an HA tag for localisation? One would rather think of fluorescent proteins or a Halo tag.

      Because the functional domains of the protein are unknown, adding a fluorescent protein of 24 kDa may interfere with both the localization and the function of CCDC146. For this reason, we chose a small tag of only 1.1 kDa, to limit as much as possible the risk of interfering with the structure of the protein. This rationale is now indicated in the manuscript (lines 251-254). It is worth noting that the tagged strain shows no sperm defect, demonstrating that the HA-tag does not interfere with CCDC146 function.

      4) In the abstract, line 53, "provide evidence" is not the right term for something that is just suggestive. The term "suggests" would be more appropriate.

      The text was modified to take into account this remark

      5) Line 74: "genetic deficiency" sounds strange here, do the authors mean simply "mutation"?

      Infertility may be due to several genetic deficiencies, such as chromosomal defects (e.g., XXY, Klinefelter syndrome), microdeletion of the Y chromosome, or mutations in a single gene. Therefore, “mutation” is too restrictive. Nevertheless, we modified the sentence, which is now “…or a genetic disorder including chromosomal or single gene deficiencies”.

      6) Lines 163-164: the authors describe the mutations (premature stop mutations) and say that they could either lead to complete absence of the gene product, or the expression of a truncated protein. Did they test this, for example, with some immuno blot analyses?

      As stated above, unfortunately, we were unable to verify the presence of RNA-decay in these patients for lack of biological material.

      7) Line 184 and Fig 2E: the sperm head morphologies should be quantitatively assessed.

      We now provide a full statistical analysis of the observed defects: see the new panel F in Figure 2.

      8) Fig 3: The annotation should be more precise - KO certainly means CCDC146-KO. The colours of the IH panels are different, which attracts attention but is clearly a colour-adjustment artefact. Colours should be adjusted for the panels to look comparable. It would be also helpful to add arrowheads into the figure to point at the phenotypes that are highlighted in the text.

      We have added Ccdc146 KO in all figures. We have added arrow heads to point out the spermatids showing a thin and elongated nucleus. Concerning adjustment of colors, we attempted to make images of panel B comparable. See new figure 3.

      9) Fig 6A: the authors use RT PCR to determine expression dynamics of their gene of interest, and use actin (apparently) as control. However, actin and CCDC146 expression levels follow the same trend. How is this interpreted?

      The figure has been misunderstood. The orange and grey bars do not correspond to actin and Ccdc146 expression, respectively; both represent the mRNA expression level of Ccdc146, relative to Actb (orange) and Hprt (grey), in the testes of CCDC146-HA mouse pups. We tested two housekeeping genes as references to be sure that our results were not distorted by unstable expression of a housekeeping gene. We did not see a significant difference between the two housekeeping genes. Actin was not used.

      10) In line 235, the authors suggest posttranslational modifications of their protein as potential cause for a slightly different migration in SDS PAGE as predicted from the theoretical molecular weight. This is not necessarily the case; some proteins simply migrate differently than predicted.

      We have changed the text accordingly and now provide an alternative explanation for the slightly different migration. See lines 258-259.

      11) The annotation of Fig 6 panels is problematic. First, why do the authors write "Laemmli" as description of the gel? It would be more helpful to write what is loaded on the gel, such as "sperm". Second, in panels B and C it would be helpful to add the antibodies used. It is not clear why there is a signal in the WT lane of panel B, but not in the HA lane (supposing an anti-HA antibody is used: why has WT a specific HA band?). In panel C, it is not clear why the blot that has so beautifully shown a single band in panel B suddenly gives such a bad labelling. Can the authors explain this? Also, they cut off the blot, likely because of too much background, but this is bad practice as full blots should be shown. In the current state, the panel C does not allow any clear conclusion. To make it conclusive, it must be repeated.

      Several mistakes were present in this figure, which has been recomposed. The WB on testicular extract was removed, and we now present a new WB that allows comparison of the presence of CCDC146 in the flagella and head fractions from WT and HA-CCDC146 sperm. Using an anti-HA Ab, we demonstrate that in epididymal sperm the protein is localized in the flagella only. See new figure 6. The corresponding text was changed accordingly.

      12) The authors have raised an HA-knockin mouse for CCDC146, which they explained by the unavailability of specific antibodies. However, in Fig 7, they use a CCDC146 antibody. Can they clarify?

      The commercial Ab works for human CCDC146 but not for mouse CCDC146. To make the situation clearer, we have added the following information: “the commercial Ab works for human CCDC146 only”. See line 240.

      13) In Fig 7A (line 258), the authors hypothesise that they stain mitochondria - why not test this directly by co-staining with mitochondria markers?

      We chose another solution to resolve this question:

      To avoid the issue of the non-specificity of secondary antibodies, we performed a new set of IF experiments using an HA Tag Alexa Fluor® 488-conjugated Antibody (anti-HA-AF488-C Ab) on WT and HA-CCDC146 sperm. These results are now presented in figure 7 panel A (new). The specificity of the signal obtained with the anti-HA-AF488-C Ab on mouse spermatozoa was evaluated by performing a statistical study of the density of dots in the principal piece of the flagellum from HA-CCDC146 and WT sperm. These results are now presented in figure 7 panel B (new). This study was carried out by analyzing 58 WT spermatozoa and 65 CCDC146 spermatozoa coming from 3 WT and 3 KI males. We found a highly significant difference, with a p-value <0.0001, showing that the signal obtained on spermatozoa expressing the tagged protein is highly specific. We have added a paragraph in the MM section to describe the process of image analysis. We finally present new images obtained by ExM showing no staining in the midpiece (figure 7C new). Altogether, these results demonstrate unequivocally the presence of the protein in the whole flagellum.

      14) It seems that in both Fig 7 and 8, the authors use expansion microscopy to localise CCDC146 in sperm tails. However, the staining differs substantially between the two figures. How is this explained?

      In figure 8 we used the commercial Ab in human sperm, whereas in figure 7 we used the anti-HA Abs in mouse sperm. Because the antibodies do not target the same part of the CCDC146 protein (the tag is placed at the N-terminus of the protein, and the HPA020082 Ab targets the last 130 amino acids of the C-terminus), their accessibility to the antigenic site could be different. However, it is important to note that both antibodies target the flagellum. This explanation is now inserted (see lines 304-312).

      15) Fig 8D and line 274: the authors do a fractionation, but only show the flagella fraction. Why?

      Showing all fractions of their experiment would have underpinned the specific enrichment of CCDC146 in the flagella fraction, which is what they aim to show. Actually, given the absence of control proteins and the fact that the band in the flagellar fraction appears to be weaker than in total sperm, one could even conclude that there is more CCDC146 in another (not analysed) fraction of this experiment. Thus, the experiment as it stands is incomplete and does not, as the authors claim, confirm the flagellar localisation of the protein.

      We agree with the reviewer’s remark. We now provide new results showing both flagella and nuclei fractions in new figure 6A. This experiment is presented in lines 253-256.

      16) Line 283, Fig 9D,F: The description of the microtubules in this experiment is not easy to understand. Do the authors mean to say that the labelling shows that the protein is associated with doublet microtubules, but not with the two central microtubules? They should try to find a clearer way to explain their result.

      As suggested by reviewer 2, we have changed the figure to make it clearer. The text was changed accordingly. See new figure 9 and the new corresponding legend (line 1006).

      17) Fig 9G - how often could the authors observe this? Why is the axoneme frayed? Does this happen randomly, or did the authors apply a specific treatment?

      Yes, it happens randomly during the fixation process.

      18) Line 300 and Fig 10A - the authors talk about the 90-kDa band, but do not say anything about what they think this band represents.

      We have now added the following sentence lines 340-342: “This band may correspond to proteolytic fragment of CCDC146, the solubilization of microtubules by sarkosyl may have made CCDC146 more accessible to endogenous proteases.”

      19) Fig 11A, lines 321-322: the authors write that the connecting piece is severely damaged. This is not obvious for somebody who does not work in sperm. Perhaps the authors could add some arrow heads to point out the defects, and briefly describe them in the text.

      We realized from your remark that our message was not clear. In fact, there is great variability in the morphological damage to the HTCA. For instance, the HTCA of Ccdc146 KO sperm presented in figure 10A2 is quite normal, whereas that in figure 10A4 is completely distorted. This point is now underlined in the corresponding text. See lines 367-369.

      We also added the size of the scale bar (200 nm), which was missing from the figure legend.

      20) Line 323: it will be important to name which tubulin antibody has been used to identify centrioles, as they are heavily posttranslationally modified.

      The different types of anti-tubulin Abs are described in the corresponding figure’s legend

      21) Fig 11B - phenotypes must be quantified to make these observations meaningful.

      We agree that a quantification would improve the message. However, testicular sperm are obtained by enzymatic separation of spermatogenic cells, and the number of testicular sperm is very low. Moreover, not all sperm are stained. Taking these two points into account, it seems to us that a quantification would be difficult to analyze. For this reason, the quantification was not done; however, it is important to note that these defects were not observed in WT sperm, demonstrating that they are caused by the lack of CCDC146. We have added a sentence to underline this point (see lines 374-375).

      22) Line 329: Figure 12AB - is this a typo - should it read Figure 12B?

      We have split panel A into A1 and A2 and changed the text accordingly. See line 378.

      23) Why are there not wildtype controls in Fig 12B, C?

      We now provide, as Figure 12-figure supplement 1, a control image for Fig 12B. For Figure 12C, the emergence of the flagellum from the distal centriole in WT is already shown in Fig 12A1.

      24) Fig 13: the authors write that the manchette is "clearly longer and wider than in WT cells" (lines 342-343). How can they claim this without quantitative data?

      We now provide a statistical analysis of the length of the manchette (see Figure 13-figure supplement 1A). We also provide a new image illustrating the length of the manchette in Ccdc146 KO spermatids (see Figure 13-figure supplement 1B).

    1. Author Response

      We appreciate the insightful and constructive feedback from the reviewers regarding our manuscript, "Gain neuromodulation mediates perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling." The comments have provided us with a number of valuable perspectives that will undoubtedly strengthen the impact and clarity of our work.

      We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments. To address these points directly: the jittered intertrial intervals (ITIs) in the fMRI work were deemed necessary to effectively deconvolve the BOLD response (see Stottinger et al., 2018). In our fMRI work, each image was randomly preceded and followed by varying ITIs (2, 4, 6, and 8 seconds), ensuring an equitable distribution across sets and subjects. Importantly, our analysis of both fMRI and behavioral studies, including eye tracking data, indicates that perceptual switch behavior – the point at which switches occur – is consistent across modalities. If more predictive or preparatory activity were present in the fMRI version of the task, we would expect earlier switches or choices and altered reaction time distributions – neither of these signatures was observed in the original study (Stottinger et al., 2018). Importantly, this suggests that the additional time available in the fMRI experiments did not significantly alter behavioral outcomes. Thus, our findings suggest that despite the differences in timing and task structure, the behavioural responses remain consistent across both experimental setups. We will clarify this in the revised manuscript.
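
      As a minimal sketch of how a jittered schedule of this kind could be generated (an illustration only, not the code used in the study): the helper name `balanced_iti_schedule`, the trial count, and the seed are hypothetical; only the four ITI values are taken from the description above.

```python
import random

ITI_SECONDS = (2, 4, 6, 8)  # ITI values quoted in the response above


def balanced_iti_schedule(n_trials, seed=None):
    """Return a shuffled list of ITIs in which each value occurs equally often.

    Hypothetical helper: n_trials must be a multiple of len(ITI_SECONDS)
    so that the distribution of ITIs is exactly balanced.
    """
    if n_trials % len(ITI_SECONDS) != 0:
        raise ValueError("n_trials must be a multiple of the number of ITI values")
    schedule = list(ITI_SECONDS) * (n_trials // len(ITI_SECONDS))
    # Randomise the order while keeping the per-value counts balanced.
    random.Random(seed).shuffle(schedule)
    return schedule
```

      Calling `balanced_iti_schedule(20, seed=1)` would give 20 ITIs with five occurrences of each value in a random order; drawing one such schedule per set or subject is one simple way to keep the distribution equitable.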

      In response to the reviewer's comments on our computational model, particularly regarding the modelling of noradrenaline (NA) effects in the RNN, we agree that modelling gain as stationary is a substantial approximation. However, given the slow ramping of pupil diameter, which served as our proxy for gain, it is an approximation that we believe is justified: in the revised manuscript, we will run additional simulations to ensure the validity of this approximation. In addition, whilst we agree that the model is more complicated than is needed for the task, we opted for RNN modelling, in lieu of a simpler modelling approach, because we wanted to use RNN modelling as a method for both hypothesis testing and generation. To build the RNN, the only key elements of model structure we had to specify in advance were the inputs and the target outputs of the network. The solution the RNN arrived at, although involving many more parameters than a simpler model, was entirely determined by optimisation (i.e., not our a priori hypotheses). We feel that this strengthens the result considerably. Importantly, this approach also allowed us to be surprised by the results of the model – for instance, we did not anticipate that the effect of gain on the energy landscape would be primarily mediated by inhibitory gain. In the revised manuscript, we will integrate this line of thinking into the paper. We are also sensitive to the fact that this result is both counterintuitive and difficult to study in high-dimensional dynamical systems like RNNs. In revisions, we will provide further analysis of the RNN and build a 2D approximation to the RNN that can be studied on the phase plane to better conceptually illuminate the mechanisms at play.

      Furthermore, we agree with the suggestion to consider alternative mechanisms that might contribute to perceptual switches, such as attention and top-down processing. While our study primarily focuses on LC-mediated gain modulation, we acknowledge the complexity of neural processes involved in perception and will expand our discussion to include these potential mechanisms. We also recognize the importance of moderating the causal language used in our manuscript. We will revise our wording to more accurately reflect the correlational nature of our findings and ensure that our conclusions are firmly grounded in the data presented.

      In conclusion, we are enthusiastic about the opportunity to refine our manuscript based on these valuable comments. In an updated version, we will address the overall points by providing clearer explanations of our methods, refining our figures for better readability, and ensuring that our conclusions are supported by robust analysis. We believe that these revisions will not only address the concerns raised but also significantly enhance the overall quality of our research. We thank the reviewers for their thorough and thoughtful critiques and look forward to submitting our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors explore the effects of DNA methylation on the strength of regulatory activity using massively parallel reporter assays in cell lines on a genome-wide level. This is a follow-up of their first paper from 2018 that describes this method for the first time. In addition to adding more in-depth information on sequences that are explored by many researchers using two main methods, reduced bisulfite sequencing and sites represented on the Illumina EPIC array, they now show also that DNA methylation can influence changes in regulatory activity following a specific stimulation, even in absence of baseline effects of DNA methylation on activity. In this manuscript, the authors explore the effects of DNA methylation on the response to Interferon alpha (IFNA) and a glucocorticoid receptor agonist (dexamethasone). The authors validate their baseline findings using additional datasets, including RNAseq data, and show convergences across two cell lines. The authors then map the methylation x environmental challenge (IFNA and dex) sequences identified in vitro to explore whether their methylation status is also predictive of regulatory activity in vivo. This is very convincingly shown for IFNA response sequences, where baseline methylation is predictive of the transcriptional response to flu infection in human macrophages, an infection that triggers the IFN pathways.

      Thank you for your strong assessment of our work!

      The extension of the functional validity of the dex-response altering sequences is less convincing.

      We agree. We note that genes close to dex-specific mSTARR-seq enhancers tend to be more strongly upregulated after dex stimulation than those near shared enhancers, which parallels our results for IFNA (lines 341-344). However, there is unfortunately no comparable data set to the human flu data set (i.e., with population-based whole genome-bisulfite sequencing data before and after dex challenge), so we could not perform a parallel in vivo validation step. We have added this caveat to the revised manuscript (lines 555-557).

      Sequences altering the response to glucocorticoids, however, were not enriched in DNA methylation sites associated with exposure to early adversity. The authors interpret that "they are not links on the causal pathway between early life disadvantage and later life health outcomes, but rather passive biomarkers". However, this approach does not seem an optimal model to explore this relationship in vivo. This is because exposure to early adversity and its consequences is not directly correlated with glucocorticoid release and changes in DNA methylation levels following early adversity could be related to many physiological mechanisms, and overall, large datasets and meta-analyses do not show robust associations of exposure to early adversity and DNA methylation changes. Here, other datasets, such as from Cushing patients may be of more interest.

      Thank you for making these important points. We have expanded the set of caveats regarding the lack of enrichment of early adversity-reported sites in the mSTARR-data set (lines 527-533). Specifically, we note that the relationship between early adversity and glucocorticoid physiology is complex (e.g., Eisenberger and Cole, 2012; Koss and Gunnar, 2018) and that dex challenge models one aspect of glucocorticoid signaling but not others (e.g., glucocorticoid resistance). Nevertheless, we also see little evidence for enrichment of early adversity-associated sites in the mSTARR data set at baseline, independently of the dex challenge experiment (lines 483-485; Figure 4).

      We also agree that large data sets (e.g., Houtepen et al., 2018; Marzi et al., 2018) and reviews (e.g., Cecil et al., 2020) of early adversity and DNA methylation in humans show limited evidence of associations between early adversity and DNA methylation levels. However, the idea that early adversity impacts downstream outcomes remains pervasive in the literature and popular science (see Dubois et al., 2019), which we believe makes tests like ours important to pursue. We also hope that our data set (and others generated through these methods) will be useful in interpreting other settings in which differential methylation is of interest as well—in line with your comment below. We have clarified both of these points in the revised manuscript (lines 520-522; 536-539).

      Overall, the authors provide a great resource of DNA methylation-sensitive enhancers that can now be used for functional interpretation of large-scale datasets (that are widely generated in the research community), given the focus on sites included in RRBS and the Illumina EPIC array. In addition, their data lends support that differences in DNA methylation can alter responses to environmental stimuli and thus of the possibility that environmental exposures that alter DNA methylation can also alter the subsequent response to this exposure, in line with the theory of epigenetic embedding of prior stimuli/experiences. The conclusions related to the early adversity data should be reconsidered in light of the comments above.

      Thank you! And yes, we have revised our discussion of early life adversity effects as discussed above.

      Reviewer #1 (Recommendations For The Authors):

      While the paper has a lot of strengths and provides new insight into the epigenomic regulation of enhancers as well as being a great resource, there are some aspects that would benefit from clarification.

      a. It would be great to have a clearer description of how many sequences are actually passing QC in the different datasets and what the respective overlaps are in bps or 600bp windows. Now often only % are given. Maybe a table/Venn diagram for overview of the experiments and assessed sequences would help here. This concerns the different experiments in the K562, A549, and HepG2 cell lines, including stimulations.

      We now provide a supplementary figure and supplementary table providing, for each dataset, the number of 600 bp windows passing each filter (Figure 2-figure supplement 1; Supplementary File 9), as well as a supplementary figure providing an upset plot to show the number of assessed sequences shared across the experiments (Figure 2-figure supplement 2).

      b. It would also be helpful to have a brief description of the main differences in assessed sequences and their coverage of the old (2018) and new libraries in the main text to be able better interpret the validation experiments.

      We now provide information on the following characteristics for the 2018 data set versus the data set presented for the first time here: mean (± SD) number of CpGs per fragment; mean (± SD) DNA sequencing depth; and mean (± SD) RNA sequencing depth (lines 169-170 provide values for the new data set; in line 194, we reference Supplementary File 5, which provides the same values for the old data set). Notably, the coverage characteristics of analyzed windows in both data sets are quite high (mean DNA-seq read coverage = 94x and mean RNA-seq read coverage = 165x in the new data set at baseline; mean DNA-seq read coverage = 22x and mean RNA-seq read coverage = 54x in Lea et al. 2018).

      c. Statements of genome-wide analyses in the abstract and discussion should be a bit tempered, as quite a number of tested sites do not pass QC and do not enter the analysis. From the results it seems like from over 4.5 million sequences, only 200,000 are entering the analysis.

      The reason why many of the windows are not taken forward into our formal modeling analysis is that they fail our filter for RNA reads because they are never (or almost never) transcribed, not because there was no opportunity for transcription (i.e., the region was indeed assessed in our DNA library, and did not show output transcription, as now shown in Figure 2-figure supplement 1). We have added a rarefaction analysis (lines 715-722 in Materials and Methods) of the DNA fragment reads to the revised manuscript which supports this point. Specifically, it shows that we are saturated for representation of unique genomic windows (i.e., we are above the stage in the curve where the proportion of active windows would increase with more sequencing: Figure 1-figure supplement 4). Similarly, a parallel rarefaction curve for the mSTARR-seq RNA-seq data (Figure 1-figure supplement 4) shows that we would gain minimal additional evidence for regulatory activity with more sequencing depth. We now reference these analyses in revised lines 179-184 and point to the supporting figure in line 182.

      In other words, our analysis is truly genome-wide, based on the input sequences we tested. Most of the genome just doesn’t have regulatory activity in this assay, despite the potential for it to be detected given that the relevant sequences were successfully transfected into the cells.
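
      The logic of a rarefaction analysis like the one described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used for the manuscript: the function name `rarefaction_curve`, the input format (one window ID per sequenced read), and the subsampling fractions are all hypothetical.

```python
import random


def rarefaction_curve(read_windows, fractions=(0.1, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Count unique windows observed when subsampling reads without replacement.

    read_windows: list with one window ID per sequenced read (hypothetical input).
    fractions: increasing fractions of the total read depth to evaluate.
    Returns a list of (n_reads_sampled, n_unique_windows) pairs.
    """
    rng = random.Random(seed)
    shuffled = list(read_windows)
    # A prefix of a random shuffle is a random subsample without replacement,
    # and successive prefixes are nested, so the curve never decreases.
    rng.shuffle(shuffled)
    curve = []
    for frac in fractions:
        n = int(round(frac * len(shuffled)))
        curve.append((n, len(set(shuffled[:n]))))
    return curve
```

      If the number of unique windows stops increasing well before the full read depth is reached, the library is saturated in the sense used above: additional sequencing would add few or no new windows.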

      d. Could the authors comment on the validity of the analysis if only one copy is present (cut-off for QC)?

      We think this question reflects a misunderstanding of our filtering criteria due to lack of clarity on our part, which we have modified in the revision. We now specify that the mean DNA-seq sequencing depth per sample for the windows we subjected to formal modeling was quite high:

      93.91 ± 10.09 SD (range = 74.5 – 113.5x) (see revised lines 169-170). In other words, we never analyze windows in which there is scant evidence that plasmids containing the relevant sequence were successfully transfected (lines 170-172).

      Our minimal RNA-seq criteria require non-zero counts in at least 3 replicate samples within either the methylated condition or the unmethylated condition, or both (lines 166-168). Because we know that multiple plasmids containing the corresponding sequence are present for all of these windows—even those that just cross the minimal RNA-seq filtering threshold—we believe our results provide valid evidence that all analyzed windows present the opportunity to detect enhancer activity, but many do not act as enhancers (i.e., do not result in transcribed RNA). Notably, we observe a negligible correlation between DNA sequencing depth for a fragment, among analyzed windows, and mSTARR-seq enhancer activity (R2 = 0.029; now reported in lines 183-184). We also now report reproducibility between replicates, in which all replicate pairs have r > 0.89, on par with previously published STARR-seq datasets (e.g., Klein et al., 2020; Figure 1-figure supplement 6, pointed to in line 193).
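
      The minimal RNA-seq criterion stated above can be written as a short predicate. This is a sketch for illustration only: the function name `passes_rna_filter`, the per-replicate count lists, and the number of replicates shown in the usage note are assumptions, not details taken from the manuscript.

```python
def passes_rna_filter(meth_counts, unmeth_counts, min_nonzero=3):
    """Return True if a window meets the minimal RNA-seq criterion.

    A window passes when at least `min_nonzero` replicates have non-zero
    RNA counts in the methylated condition, in the unmethylated condition,
    or in both. Inputs are per-replicate read counts (hypothetical format).
    """
    nonzero_meth = sum(1 for c in meth_counts if c > 0)
    nonzero_unmeth = sum(1 for c in unmeth_counts if c > 0)
    return nonzero_meth >= min_nonzero or nonzero_unmeth >= min_nonzero
```

      For example, a window with counts `[0, 0, 5, 2, 1, 0]` in one condition passes regardless of the other condition, whereas a window with only two non-zero replicates in each condition does not.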

      e. While the authors state that almost all of the control sequences contain CpGs sites, could the authors also give information on the total number of CpG sites in the different subsets? Was the number of CpGs in a 600 bp window related to the effects of DNA methylation on enhancer activity?

      We now provide the number of CpG sites per window in the different subsets in lines 282-284. As expected, they are higher for EPIC array sites and for RRBS sites because the EPIC array is biased towards CpG-rich promoter regions, and the enzyme typically used in the starting step of RRBS digests DNA at CpG motifs (but control sequences still contain an average of ~13 CpG sites per fragment). We also now model the magnitude of the effects of DNA methylation on regulatory activity as a function of number of CpG sites within the 600 bp windows. Consistent with our previous work in Lea et al., 2018, we find that mSTARR-seq enhancers with more CpGs tend to be repressed by DNA methylation (now reported in lines 216-219 and Figure 1-figure supplement 11).

      f. In the discussion, a statement on the underrepresented regions, likely regulatory elements with lower CG content, that nonetheless can be highly relevant for gene regulation would be important to put the data in perspective.

      Thanks for this suggestion. We agree that regulatory regions, independent of CpG methylation, can be highly relevant, and now clarify in the main text that the “unmethylated” condition of mSTARR-seq is essentially akin to a conventional STARR-seq experiment, in that it assesses regulatory activity regardless of CpG content or methylation status (lines 128-130).

      Consequently, our study is well-designed to detect enhancer-like activity, even in windows with low GC content. We now show with additional analyses that we generated adequate DNA-seq coverage on the transfected plasmids to analyze 90.2% of the human genome, including target regions with no or low CpG content (lines 148-149; 153-156; Supplementary file 2). As noted above, we also now clarify that regions dropped out of our formal analysis because we had little to no evidence that any transcription was occurring at those loci, not because sequences for those regions were not successfully transfected into cells (see responses above and new Figure 1-figure supplement 4 and Figure 2-figure supplement 1).

      g. To control for differences in methylation of the two libraries, the authors sequence a single CpGs in the vector. Could the authors look at DNA methylation of the 600 bp windows at the end of the experiment, could DNA methylation of these windows be differently affected according to sequence? 48 hours could be enough for de-methylation or re-methylation.

      We agree that variation in demethylation or remethylation depending on fragment sequence is possible. We now state this caveat in the main text (lines 158-159), and specify that genomic coverage of our bisulfite sequencing data across replicates are (unfortunately) too variable to perform reliable site-by-site analysis of DNA methylation levels before and after the 48 hour experiment (lines 1182-1185). Instead, we focus on a CpG site contained in the adapter sequence (and thus included in all plasmids) to generate a global estimate of per replicate methylation levels. We also now note that any de-methylation or re-methylation would reduce our power to detect methylation-dependent activity, rather than leading to false positives (lines 163-165).

      h. The section on the method for correction for multiple testing should be more detailed as it is very difficult to follow. Why were only 100 permutations used, given that the empirical p-value could then only be <0.01? The description of a subsample of the N windows with positive Betas is unclear: should the permutation not include the actual values and thus all windows, or were there no negative Betas? Was FDR accounting for all elements and pairs?

      We have now expanded the text in the Materials and Methods section to clarify the FDR calculation (lines 691, 695-699, 702, 706). We clarify that the 100 permutations were used to generate a null distribution of p-values for the data set (e.g., 100 x 17,461 p-values for the baseline data set), which we used to derive a false discovery rate. Because we base our evidence on FDRs, we compare the distribution of observed p-values to the distribution of p-values obtained via permutation; we do not calculate individual p-values by comparing an observed test statistic against the test statistics for permuted data for that individual window.

      We compare the data to permutations with only positive betas because in the observed data, we observe many negative betas. These correspond to windows which have no regulatory activity (i.e., they have many more input DNA reads than RNA-seq reads) and thus have very small p-values in a model testing for DNA-RNA abundance differences. However, we are interested in controlling the false discovery rate of windows that do have regulatory activity (positive betas). In the permuted data, by contrast and because of the randomization we impose, test statistics are centered around 0 and essentially symmetrical (approximately equally likely to be positive or negative). Retaining all p-values to construct the null therefore leads to highly miscalibrated false discovery rates because the distribution of observed values is skewed towards smaller values (because of windows with “significantly” no regulatory activity) compared to the permuted data. We address that problem by using only positive betas from the permutations.
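
      As an illustration only, the logic described above can be sketched in a few lines of Python. This is a generic permutation-based FDR estimate, not the authors' pipeline: the function name `permutation_fdr`, the `(beta, p)` tuple layout, and the rescaling of the null to the size of the observed positive-beta set are all assumptions made for the example.

```python
def permutation_fdr(observed, permuted, threshold):
    """Estimate the FDR at a p-value threshold (illustrative sketch).

    observed: list of (beta, p) tuples from the real data.
    permuted: list of lists of (beta, p) tuples, one inner list per permutation.
    Only positive-beta entries are used, in both the observed and permuted data.
    """
    obs_p = [p for beta, p in observed if beta > 0]
    n_obs_hits = sum(p <= threshold for p in obs_p)
    if n_obs_hits == 0:
        return 0.0

    # Fraction of positive-beta null p-values at or below the threshold,
    # computed separately for each permutation.
    null_fracs = []
    for perm in permuted:
        null_p = [p for beta, p in perm if beta > 0]
        if null_p:
            null_fracs.append(sum(p <= threshold for p in null_p) / len(null_p))
    if not null_fracs:
        raise ValueError("no positive-beta p-values in the permuted data")

    # Expected number of false discoveries among the observed positive-beta
    # windows, using the average null fraction across permutations.
    expected_false = len(obs_p) * sum(null_fracs) / len(null_fracs)
    return min(1.0, expected_false / n_obs_hits)


observed = [(1.0, 0.001), (2.0, 0.002), (0.5, 0.5), (-3.0, 1e-6)]
permuted = [
    [(1.0, 0.005), (1.0, 0.5), (-1.0, 0.0001), (1.0, 0.6), (1.0, 0.9)],
    [(1.0, 0.2), (1.0, 0.3), (-1.0, 0.4), (1.0, 0.7), (1.0, 0.8)],
]
fdr = permutation_fdr(observed, permuted, threshold=0.01)
```

      Note that the negative-beta window with a very small p-value in `observed` does not count as a discovery here, which mirrors the reason given above for excluding negative betas.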

      i. The interpretation of the overlap of Dex-response windows with CpGs sites associated with early adversity should be revisited according to the points also mentioned in the public review and the authors may want to consider exploring additional datasets with other challenges.

      Thank you, see our responses to the public review above and our revisions in lines 555-559. We agree that comparisons with more data sets and generation of more mSTARR-seq data in other challenge conditions would be of interest. While beyond the scope of this manuscript, we hope the resource we have developed and our methods set the stage for just such analyses.

      Reviewer #2 (Public Review):

      This work presents a remarkably extensive set of experiments, assaying the interaction between methylation and expression across most CpG positions in the genome in two cell types. To this end, the authors use mSTARR-seq, a high-throughput method, which they have previously developed, where sequences are tested for their regulatory activity in two conditions (methylated and unmethylated) using a reporter gene. The authors use these data to study two aspects of DNA methylation:

      1) Its effect on expression, and 2) Its interaction with the environment. Overall, they identify a small number of 600 bp windows that show regulatory potential, and a relatively large fraction of these show an effect of methylation on expression. In addition, the authors find regions exhibiting methylation-dependent responses to two environmental stimuli (interferon alpha and glucocorticoid dexamethasone).

      The questions the authors address represent some of the most central in functional genomics, and the method utilized is currently the best method to do so. The scope of this study is very impressive and I am certain that these data will become an important resource for the community. The authors are also able to report several important findings, including that pre-existing DNA methylation patterns can influence the response to subsequent environmental exposures.

      Thank you for this generous summary!

      The main weaknesses of the study are: 1. The large number of regions tested seems to have come at the expense of the depth of coverage per region (1 DNA read per region per replicate). I have not been convinced that the study has sufficient statistical power to detect regulatory activity, and differential regulatory activity to the extent needed. This is likely reflected in the extremely low number of regions showing significant activity.

      We apologize for our lack of clarity in the previous version of the manuscript. Nonzero coverage for half the plasmid-derived DNA-seq replicates is a minimum criterion, but for the baseline dataset, the mean depth of DNA coverage per replicate for windows passing the DNA filter is quite high: 12.723 ± 41.696 s.d. overall, and 93.907 ± 10.091 s.d. in the windows we subjected to full analysis (i.e., windows that also passed the RNA read filter). We now provide these summary statistics in lines 148-149 and 169-170 and Supplementary file 5 (see also our responses to Reviewer 1 above). We also now show, using a rarefaction analysis, that our data set saturates the ability to detect regulatory windows based on DNA and RNA sequencing depth (new Figure 1-figure supplement 4; lines 179-184; 715-722).

      2) Due to the position of the tested sequence at the 3' end of the construct, the mSTARR-seq approach cannot detect the effect of methylation on promoter activity, which is perhaps the most central role of methylation in gene regulation, and where the link between methylation and expression is the strongest. This limitation is evident in Fig. 1C and Figure 1-figure supplement 5C, where even active promoters have activity lower than 1. Considering these two points, I suspect that most effects of methylation on expression have been missed.

      Thank you for pointing this out. We agree that we have not exhaustively detected methylation-dependent activity in all promoter regions, given that not all promoter regions are active in STARR-seq. However, there is good evidence that some promoter regions can function like enhancers and thus be detected in STARR-seq-type assays (Klein et al., 2020). This important point is now noted in lines 187-189; an example promoter showing methylation-dependent regulatory activity in our dataset is shown in Figure 3E.

      We also now clarify that Figure 1C shows significant enrichment of regulatory activity in windows that overlap promoter sequence (line 239). The y-axis is not a measure of activity, but rather the log-transformed odds ratio, with positive values corresponding to overrepresentation of promoter sequences in regions of mSTARR-seq regulatory activity. Active promoters are 1.640 times more likely to be detected with regulatory activity than expected by chance (p = 1.560 x 10^-18), which we now report in a table that presents enrichment statistics for all ENCODE elements shown in Figure 1C for clarity (Supplementary file 4). Moreover, 74.1% of active promoters that show regulatory activity have methylation-dependent activity, also now reported in Supplementary file 4.
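
      To make the distinction between an activity value and a log-transformed odds ratio concrete, a minimal sketch follows. It assumes a natural-log transform and a standard 2x2 contingency table; the function name and the counts are hypothetical and are not taken from the manuscript. Under those assumptions, an odds ratio of 1.640 corresponds to a log odds ratio of about 0.49, which is below 1 but still positive, i.e., enrichment.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Natural-log odds ratio for a 2x2 table (illustrative sketch).

    a: windows overlapping the annotation and showing regulatory activity
    b: windows overlapping the annotation without regulatory activity
    c: windows outside the annotation showing regulatory activity
    d: windows outside the annotation without regulatory activity
    """
    return math.log((a * d) / (b * c))


# Hypothetical counts chosen only to show the sign convention.
enriched = log_odds_ratio(20, 10, 10, 20)   # odds ratio 4: positive value
depleted = log_odds_ratio(10, 20, 20, 10)   # odds ratio 0.25: negative value

# An odds ratio of 1.640 on the natural-log scale.
log_of_reported_ratio = math.log(1.640)
```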

      Overall, the combination of an extensive resource addressing key questions in functional genomics, together with the findings regarding the relationship between methylation and environmental stimuli makes this a key study in the field of DNA methylation.

      Thank you again for the positive assessment!

      Reviewer #2 (Recommendations For The Authors):

      I suggest the authors conduct several tests to estimate and/or increase the power of the study:

      1) To estimate the potential contribution of additional sequencing depth, I suggest the authors conduct a downsampling analysis. If the results are not saturated (e.g., the number of active windows is not saturated or the number of differentially active windows is not saturated), then additional sequencing is called for.

      We appreciate the suggestion. We have now performed a downsampling/rarefaction curve analysis in which we downsampled the number of DNA reads, and separately, the number of RNA reads. We show that for both DNA-seq depth and RNA-seq depth, we are within the range of sequencing depth in which additional sequencing would add minimal new analysis windows in the dataset (Figure 1-figure supplement 4; lines 179-184; 715-722).
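
      For readers unfamiliar with rarefaction, the idea can be sketched as follows. This is a simplified illustration rather than the authors' analysis: reads are thinned independently with a fixed probability, a window "passes" if it has non-zero counts in at least `min_replicates` replicates, and all names, counts, and the filter itself are assumptions made for the example.

```python
import random


def windows_passing(counts_per_window, min_replicates):
    """Count windows with non-zero reads in at least min_replicates replicates."""
    return sum(
        sum(c > 0 for c in reps) >= min_replicates
        for reps in counts_per_window
    )


def downsample(counts_per_window, fraction, rng):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    return [
        [sum(rng.random() < fraction for _ in range(c)) for c in reps]
        for reps in counts_per_window
    ]


def rarefaction_curve(counts_per_window, fractions, min_replicates, seed=0):
    """Return (fraction, number of passing windows) pairs for each depth."""
    rng = random.Random(seed)
    return [
        (f, windows_passing(downsample(counts_per_window, f, rng), min_replicates))
        for f in fractions
    ]


# Hypothetical read counts: one inner list per window, one entry per replicate.
counts = [[5, 3, 0, 2], [0, 0, 1, 0], [10, 10, 10, 10]]
curve = rarefaction_curve(counts, [0.0, 0.25, 0.5, 1.0], min_replicates=3)
```

      If the number of passing windows barely changes between the highest sampled fractions, additional sequencing would be expected to add few new windows, which is the saturation argument made above.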

      2) Correlation between replicates should be reported and displayed in a figure because low correlations might also point to too few reads. The authors mention: "This difference likely stems from lower variance between replicates in the present study, which increases power", but I couldn't find the data.

      We now report the correlations between RNA and DNA replicates within the current dataset and within the Lea et al., 2018 dataset (Figure 1-figure supplement 6). The between-replicate correlations in both our RNA libraries and DNA libraries are consistently high (r ≥ 0.89).

      3) The correlation between the previous and current K562 datasets is surprisingly low. Given that these datasets were generated in the same cell type, in the same lab, and using the same protocol, I expected a higher correlation, as seen in other massively parallel reporter assays. The fact that the correlations are almost identical for a comparison of the same cell and a comparison of very different cell types is also suspicious.

      Thanks for raising this point. We think it is in reference to our original Figure 1-Figure supplement 6, for which we now provide Pearson correlations in addition to R2 values (now Figure 1-Figure supplement 8). We note that this is not a correlation in raw data, but rather the correlation in estimated effect sizes from a statistical model for methylation-dependent activity. We now provide Pearson correlations for the raw data between replicates within each dataset (Figure 1-Figure supplement 6), which for the baseline dataset are all r > 0.89 for RNA replicates and r > 0.98 for DNA replicates, showing that replicate reproducibility in this study is on par with other published studies (e.g., Klein et al., 2020 report r > 0.89 for RNA replicates and r > 0.91 for DNA replicates).

      We do not know of any comparable reports in other MPRAs for effect size correlations between two separately constructed libraries, so it’s unclear to us what the expectation should be. However, we note that all effect sizes are estimated with uncertainty, so it would be surprising to us to observe a very high correlation for effect sizes in two experiments, with two independently constructed libraries (i.e., with different DNA fragments), run several years apart, especially given the importance of winner’s curse effects and other phenomena that affect point estimates of effect sizes. Nevertheless, we find that regions we identify as regulatory elements in this study are 74-fold more likely to have been identified as regulatory elements in Lea et al., 2018 (p < 1 x 10^-300).

      4) The authors cite Johnson et al. 2018 to support their finding that merely 0.073% of the human genome shows activity (1.7% of 4.3%), but:

      a. the percent cited is incorrect: this study found that 27,498 out of 560 million regions (0.005%) were active, and not 0.165% as the authors report.

      We have modified the text to clarify the numerator and denominator used for the 0.165% estimate from Johnson et al 2018 (lines 175-176). The numerator is their union set of all basepairs showing regulatory activity in unstimulated cells, which is 5,547,090 basepairs. The denominator is the total length of the hg38 human genome, which is 3,298,912,062 basepairs.

      Notably, the denominator (the total human genome) is not 560 million—while Johnson et al (2018) tested 560 million unique ~400 basepair fragments, these fragments were overlapping, such that the 560 million fragments covered the human genome 59 times (i.e., 59x coverage).

      b. other studies that used massively parallel reporter assays report substantially higher percentages, suggesting that the current study is possibly underpowered. Indeed, the previous mSTARR-seq found a substantially larger percentage of regions showing regulatory activity (8%). The current study should be compared against other studies (preferably those that did not filter for putatively active sequences, or at least to the random genomic sequences used in these studies).

      We appreciate this point and have double-checked comparisons to Johnson et al., 2018 and Lea et al., 2018. Our numbers are not unusual relative to Johnson et al., 2018 (0.165%), which surveyed the whole genome. Also, in comparing to the data from Lea et al., 2018, when processed in an identical manner (our criteria are more stringent here), our values of the percent of the tested genome showing significant regulatory activity are also similar: 0.108% in the Lea et al., 2018 dataset versus 0.082% in the baseline dataset. Finally, our rarefaction analyses (see our responses above) indicate that we are not underpowered based on sequencing depth for RNA or DNA samples. We also note that there are several differences in our analysis pipeline from other studies: we use more technical replicates than is typical (compare to 2-5 replicates in Arnold et al., 2013; Johnson et al., 2018; Muerdter et al., 2018), we measure DNA library composition based on DNA extracted from each replicate post-transfection (as opposed to basing it on the pre-transfection library; Johnson et al., 2018), and we use linear mixed models to identify regulatory activity as opposed to binomial tests (Johnson et al., 2018; Arnold et al., 2013; Muerdter et al., 2018).

      I find it confusing that the four sets of CpG positions used (EPIC, RRBS, NR3C1, and random control loci) add up together to 27.3M CpG positions. Are the 600 bp windows around each of these positions sufficient to result in whole-genome coverage? If so, a clear explanation of how this is achieved should be added.

      Thanks for this comment. Although our sequencing data are enriched for reads that cover these targeted sites, the original capture to create the input library included some off target reads (as is typical of most capture experiments, which are rarely 100% efficient). We then sequenced at such high depth that we ultimately obtained sequencing coverage that encompassed nearly the whole genome. We now clarify in the main text that our protocol assesses 27.3 million CpG sites by assessing 600 bp windows encompassing 93.5% of all genomic CpG sites (line 89), which includes off-target sites (line 149).

      A scatter plot showing the RNA to DNA ratios of the methylated (x-axis) vs unmethylated (y-axis) library would be informative. I expect to see a shift up from the x=y diagonal in the unmethylated values.

      We have added a supplementary figure showing this information, which shows the expected shift upwards (Figure 1-figure supplement 9).

      Another important figure missing is a histogram showing the ratios between the unmethylated and methylated libraries for all active windows, with the significantly differentially active windows marked.

      We have added a supplementary figure showing this information (Figure 1-figure supplement 10).

      Perhaps I missed it, but what is the distribution of effect sizes (differential activity) following the various stimuli?

      This information is provided in table form in Supplementary Files 3, 10, and 11, which we now reference in the Figure 2 legend (lines 365-366).

      Minor changes

      It is unclear what the lines connecting the two groups in Fig.3C represent, as these are two separate groups of regions.

      We now clarify in the figure legend that values connected by a line are the same regions, not two different sets of regions. They show the correlation between DNA methylation and gene expression at mSTARR-seq-identified enhancers in individuals before and after IAV stimulation, separately for enhancers that are shared between conditions (left) versus those that are IFNA-specific (right). The two plots therefore do show two different sets of regions, which we have depicted to visualize the contrast in the effect of stimulation on the correlation at IFNA-specific enhancers versus shared enhancers. We have revised the figure legend to clarify these points (lines 458-460).

      L235-242 are unclear. Specifically - isn't the same filter mentioned in L241-242 applied to all regions?

      Yes, the same filter for minimal RNA transcription was applied to all regions. We have modified the text (lines 264-265, 271, 275-277) to clarify that the enrichment analyses were performed twice, to test whether the target types were: 1) enriched in the dataset passing the RNA filter (i.e., the dataset showing plasmid-derived RNA reads in at least half the sham or methylated replicates; n = 216,091 windows) and 2) enriched in the set of windows showing significant regulatory activity (at FDR < 1%; n = 3,721 windows).

      To improve cohesiveness, the section about most CpG sites associated with early life adversity not showing regulatory activity in K562s can be moved to the supplementary in my opinion.

      Thank you for this suggestion. Because ELA and the biological embedding hypothesis (via DNA methylation) were major motivations for our analysis (see Introduction lines 42-48; 75-79), and we also discuss these results in the Discussion (lines 518-520), we have respectfully elected to retain this section in the main manuscript. We have added text in the Discussion explaining why we think experimental tests of methylation effects on regulation are relevant to the literature on early life adversity (lines 520-522), and have added discussion on limits to these analyses (lines 527-533).

      References:

      Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A (2013) Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science, 339, 1074-1077.

      Cecil CA, Zhang Y, Nolte T (2020) Childhood maltreatment and DNA methylation: A systematic review. Neuroscience & Biobehavioral Reviews, 112, 392-409.

      Dubois M, Louvel S, Le Goff A, Guaspare C, Allard P (2019) Epigenetics in the public sphere: interdisciplinary perspectives. Environmental Epigenetics, 5, dvz019.

      Eisenberger NI, Cole SW (2012) Social neuroscience and health: neurophysiological mechanisms linking social ties with physical health. Nature Neuroscience, 15, 669-674.

      Houtepen L, Hardy R, Maddock J, Kuh D, Anderson E, Relton C, Suderman M, Howe L (2018) Childhood adversity and DNA methylation in two population-based cohorts. Translational Psychiatry, 8, 1-12.

      Johnson GD, Barrera A, McDowell IC, D’Ippolito AM, Majoros WH, Vockley CM, Wang X, Allen AS, Reddy TE (2018) Human genome-wide measurement of drug-responsive regulatory activity. Nature Communications, 9, 1-9.

      Klein JC, Agarwal V, Inoue F, Keith A, Martin B, Kircher M, Ahituv N, Shendure J (2020) A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nature Methods, 17, 1083-1091.

      Koss KJ, Gunnar MR (2018) Annual research review: Early adversity, the hypothalamic–pituitary–adrenocortical axis, and child psychopathology. Journal of Child Psychology and Psychiatry, 59, 327-346.

      Marzi SJ, Sugden K, Arseneault L, Belsky DW, Burrage J, Corcoran DL, Danese A, Fisher HL, Hannon E, Moffitt TE (2018) Analysis of DNA methylation in young people: limited evidence for an association between victimization stress and epigenetic variation in blood. American Journal of Psychiatry, 175, 517-529.

      Muerdter F, Boryń ŁM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, Pagani M, Haberle V, Kazmar T, Catarino RR (2018) Resolving systematic errors in widely used enhancer activity assays in human cells. Nature Methods, 15, 141-149.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Can the authors statistically define the egg-laying classes? In some parts of the manuscript, the division between the different classes could be more ambiguous. I understand that the class III strains are divided by the kcnl-1 genotype, but given the different results for diverse traits, it could be more clear to keep them as one class. Also, overall, the authors choose a collection of 15 strains across the different classes to phenotype for many traits and perform genome edits. It is understandable that they cannot test all strains, but given the variation across traits and classes, it might be good to add a few more caveats about how these strains might not be representative of all strains across the species.

      Response: The egg-laying classes were defined as in Figure 1A by arbitrarily chosen cut-offs (at 10, 10-25, and 25 eggs in utero) to simplify subsequent analyses. We added this explanation to the first paragraph of the results section. However, average egg retention differs significantly between the four defined classes using the 15 selected strains (Fig. 2A).

      We think that the distinction between Class IIIA and IIIB strains is important and justified because the two Classes significantly differ in mean egg retention (Fig. 2A) and because Class IIIB strains harbour the large-effect variant KCNL-1 V530L whereas Class IIIA strains do not.

      We agree that the 15 selected strains are not necessarily representative of all strains across the species. We have added a note of caution regarding this point to the first paragraph of the section “Temporal progression of egg retention and internal hatching”: “Note that this strain selection, especially concerning the largest Class II, is unlikely to reflect the overall strain diversity observed across the species”. In addition, we have reworded the first sentence of this paragraph as follows: “To better characterize natural variation in C. elegans egg retention, we focused on a subset of 15 strains from divergent phenotypic Classes I-III, with an emphasis on Class III strains exhibiting strong egg retention (at mid-L4 + 30h) (Fig. 2A and 2B).”

      2) For the GWAS experiments, the authors should describe if any of the QTL overlap with hyper-divergent regions in the strain set. The QTL could be driven by these less well defined regions.

      Response: We have added the following sentence: “The three QTLs do not align with any of the recently identified hyper-divergent regions of the genome (Lee et al., 2021).”

      3) The authors should look at correlations between the mod-5(n822) edit phenotypes and the exogenous 5-HT and SSRI phenotypes to demonstrate how the traits can differ. Some correlation plots might help that point as well.

      Response: We examined all possible correlations as suggested: none are significant and strain effects on trait differences are idiosyncratic, as written in our results section. The correlational analyses remain of limited value due to small samples: N=10 for mean strain values for measured phenotypes. We therefore feel that these analyses do not provide any additional insights beyond our figures (4C, 4D, 5C, 5D, S5A-C ) and our statement on page 15: “As in previous experiments (Fig. 4C and 5C), we find again that strains sharing the same egg retention phenotype may differ strongly in egg-laying behaviour in response to modulation of both exo- and endogenous serotonin levels (Class IIIA: ED3005 and JU2829) (Fig. 5D and S5C).”

      4) Figure 6D, was there any censoring of the data? Normally, these types of studies are plagued by an increase in censored animals that can decrease significance. The effects among the classes seem large, but statistical comparisons might help as well.

      Response: There was no censoring of animals (censoring of animals in lifespan studies is usually done by removing “bags of worms”, which here was our study phenotype). We now mention this in the corresponding figure legend. We also added a statistical analysis showing that mean survival was significantly different between all Classes.

      5) Many of the traits, edits, and deeper analyses are performed on the JU751 genetic background. This choice is sensible, otherwise, the work can increase exponentially. However, the authors should add a caveat about how these results might be limited to JU751 and other strains might respond differently.

      Response: For certain experiments, it was not feasible to include multiple strains from all phenotypic classes, so we selected JU751 (Class IIIB) and JU1200 (Class II), for which we had established CRISPR-engineered lines to modulate the egg retention phenotype by a single amino acid change in KCNL-1. To emphasize that these experimental observations cannot be generalized, we added the following statement in the relevant results section: “These experimental results offer preliminary evidence (bearing in mind that our analysis was primarily centered on a single genetic background) that laying of advanced-stage embryos may enhance intraspecific competitive ability, particularly in scenarios where multiple genotypes compete for colonization and exploitation of limited, patchily distributed resources.”

      6) The authors argue that evolution could be acting on specific parts of the egg-laying machinery (e.g., muscle-directed signaling components). It might be useful to look at levels of standing variation and selection at groups of loci compared to genomic controls to see if this conclusion can be strengthened.

      Response: This is a good idea but how to select pertinent candidate loci is unclear (there are over 300 genes with effects on egg laying, www.wormbase.org). In addition, the genetics of muscle-directed signalling components in egg laying is only starting to be explored, with no specific candidate genes having been identified (Medrano & Collins, 2023, Curr Biol). We therefore think that such an analysis is currently not possible.

      7) Completely optional: The authors present a compelling and interesting case for transitions and trade-offs between oviparity and viviparity. The C. vivipara species has a different egg-laying mode than other Caenorhabditis species. The authors could add a short section describing their expectations about the neuronal morphology, 5-HT circuits, and muscle function in this species given their results. What genes or circuits should be the focus of future studies to address this question in Caenorhabditis. Also, Loer and Rivard present some similar ideas based on the differences in 5-HT staining neurons across diverse nematodes. Those results can be incorporated and discussed as well.

      Response: Our current research focuses on the evolution of egg laying in different Caenorhabditis species. So far, however, it remains difficult to provide specific hypotheses on how the egg-laying circuit has changed in C. vivipara. We rephrased the final paragraph of the discussion to incorporate some of the reviewer’s suggestions: “Nematodes display frequent transitions from oviparity to obligate viviparity in many distinct genera (Sudhaus, 1976; Ostrovsky et al., 2015), including in the genus Caenorhabditis, with at least one viviparous species, C. vivipara (Stevens et al., 2019). Although evidence exists for the evolution of egg-laying circuitry across oviparous Caenorhabditis species (Loer and Rivard, 2007), the specific cellular and genetic changes responsible for the transition to obligate viviparity in C. vivipara have yet to be examined. Resolving the genetic basis of intraspecific variation in C. elegans egg retention, including partial or facultative viviparity, may thus shed light on the molecular changes underlying the initial steps of evolutionary transitions from oviparity to obligate viviparity in invertebrates.”

      Specific edits:

      1) Perhaps a silly point, but "parity" (to my knowledge) does not have a biological meaning on its own. I suggest "egg-laying mode" or "birth mode".

      Response: This term has been used previously in the literature (e.g., https://onlinelibrary.wiley.com/doi/10.1111/jeb.13886 or https://doi.org/10.1101/2023.10.22.563505). However, as the referee rightly points out, this is not a standard term. We therefore replaced “parity mode” with “egg-laying mode”.

      2) "Against fluctuating environmental fluctuations" is a bit strange

      Response: Corrected.

      3) The first publications of Egl mutants were by the Horvitz lab so some citations are not in all of the first descriptions of the trait (early in Results)

      Response: We have added the relevant work (Trent 1982, Trent 1983, Desai & Horvitz 1989) to this paragraph in the early results section.

      4) "Strong egg retention usually strongly..." is a bit strange

      Response: Corrected.

      5) Figure 8G font looks smaller than the others.

      Response: Corrected.

      Reviewer #2:

      1) In Figure 1A, I infer that in the graph class I measurements are represented by dark blue dots and class II by purple dots. I am having a really hard time distinguishing between these two colors in the graph. In the pie chart I have no problem, but in the graph the black lines around the colored dots seem to obscure the colors. Not sure how to fix this graphical problem, but it is preventing the graph from communicating the results effectively.

      Response: We have changed the colours, spacing and format of this figure to resolve this problem.

      2) The behavioral analysis of Figure 3B-3F is problematic. The experimental methods used and the interpretation of the results each have issues. This is cause for concern since this is the most direct analysis of the actual variations in egg-laying behavior across strains presented in this paper.

      This experiment is modeled after the work of Waggoner et al. 1998, who recorded egg laying events of individual worms on video over several hours and noted the exact time of individual egg laying events. Waggoner et al. found in the reference C. elegans strain N2 that egg-laying events occurred in ~2 minute clusters ("active phases") separated by ~20 minute silent periods ("inactive phases"). Mignerot et al. did not take continuous videos of animals, but rather examined plates bearing a single worm only every 5 minutes and noted the number of new eggs that appeared on the plate in each 5-minute interval. From these data, the authors claim they have measured the intervals between "egg-laying phases" (the term used in the Figure 3 legend). In the Results, the authors explicitly claim they are measuring the timing and frequency of actual active and inactive egg-laying phases. Apparently, all the eggs laid within one 5-minute interval are considered to have been laid in a single active phase, and the time between 5-minute intervals containing egg laying events is considered an "inactive phase" and is measured only with a resolution of 5 minutes. It is not explained anywhere how the authors handle the situation of seeing eggs laid in two consecutive 5-minute intervals. Is that one active phase that is 10 minutes long, or is that two separate active phases with a 5-minute active phase in between? Because of this ambiguity in how they define active and inactive phases, I find it impossible to understand and judge the data presented in Fig. 3D-3F. The authors in the results state that "Class I and Class IIIB displayed significantly accelerated and reduced egg laying activity respectively (Fig. 3C to 3E)". I assume they are referring to the statistical analysis described in the figure legend, which is quite difficult to understand. Frankly, just looking at the graphs in Fig. 3D-3F, it is hard for the reader to identify which specific features shown in the graphs can explain why, for example, Class I strains have fewer retained eggs than Class III strains. So, I found this analysis very unsatisfying.

      I also feel the authors are making an unwarranted assumption that their non-N2 strains will have distinguishable active and inactive phases of egg-laying behavior analogous to those seen in the N2 strain. Given the possibly large variations in egg-laying behavior in the various strains examined, that assumption should be questioned. Thus, framing the entire analysis of behavior patterns in terms of the length of active and inactive phases might not be appropriate.

      Response: This comment validly highlights important problems and limitations of our scan-sampling method to quantify strain differences in egg-laying behaviour. We acknowledge that we failed to present the data with due diligence, and clarity regarding terminology and interpretation. However, we think that some of these results are still of value after revised presentation. Our biggest mistake was to use the terms “active and inactive phase”, as coined by Waggoner et al. 1998. We are aware that our measures are not equivalent to these previously defined measures but have been sloppy with terminology. We therefore carefully reworded this entire results section, using clear definitions to indicate differences between the Waggoner assay and our assay (including a graphical representation of our assay design in the revised Fig. 3B). In brief, our simplified assay is useful to estimate the frequency and approximate duration of prolonged inactive periods of egg laying because we can unambiguously determine intervals in which eggs were laid or not. In contrast, as pointed out by the reviewer, we cannot determine if multiple active phases occurred within a 5-min interval, nor can we estimate the duration of an active “phase”. We now state this limitation explicitly in the manuscript. What our results do show is that the number of intervals during which egg laying occurred is significantly different between strains and Classes: Class I (low retention) have a higher number of intervals with egg-laying events, whereas Class IIIB showed a reduced number of such events (Fig. 3D). We can therefore also roughly estimate the mean time (per individual) between two egg-laying intervals, giving us a proxy for prolonged periods when egg-laying is inactive (Fig. 3E); we note that our estimate for N2 is very close to what has been previously measured (~20 min). 
Therefore, we can confidently conclude that there are natural strains which have both shorter (Class I) and longer (Class IIIB) inactive periods of egg laying. These results partly align with observed variation in egg retention. However, we agree with the reviewer – as we had stated both in results and discussion sections – that these behavioural differences act together with differences in the sensing of egg accumulation in utero (as suggested by results shown in Fig. 3G and 3H). We also agree that it seems very plausible that the observed behavioural differences, as revealed by scan-sampling, may only have a secondary role in accounting for natural variation in egg retention. We will be testing these hypotheses specifically in our future research.
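
The two scan-sampling measures described in this response (number of 5-minute intervals with egg laying, and mean time between such intervals) can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function name, the example counts, and the choice to measure a gap as the difference between the scan times of consecutive egg-laying intervals are all assumptions made here.

```python
from statistics import mean


def scan_sampling_summary(egg_counts, interval_min=5):
    """Summarise one worm's scan-sampling record (hypothetical helper).

    egg_counts: number of new eggs counted at each scan, one entry per interval.
    Returns a tuple:
      - number of intervals in which at least one egg was laid
      - mean time in minutes between consecutive egg-laying intervals,
        or None when fewer than two such intervals exist
    """
    # Indices of the scans at which new eggs were observed.
    laying = [i for i, n in enumerate(egg_counts) if n > 0]
    # Assumed gap definition: difference between scan times of consecutive
    # egg-laying intervals, so the resolution is one scan interval.
    gaps = [(b - a) * interval_min for a, b in zip(laying, laying[1:])]
    return len(laying), (mean(gaps) if gaps else None)


# Made-up record of twelve 5-minute scans for a single animal.
counts = [0, 2, 0, 0, 0, 3, 1, 0, 0, 0, 4, 0]
n_active, mean_gap = scan_sampling_summary(counts)
print(n_active, mean_gap)
```

Under this gap definition, two consecutive intervals with eggs contribute a 5-minute gap. That is one way of resolving the ambiguity raised by the reviewer, and it is not necessarily the definition used in the manuscript.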

      Note: The statistical analyses are nested ANOVAs to ask (a) does the value differ between strains within a given class and (b) does the value differ between Classes? Classes labelled with different letters in the figures therefore significantly differ in their mean values, demonstrating that measured behavioural phenotypes consistently differ between some (but not all) phenotypic classes, yet largely in line with their egg retention phenotypes (Fig. 3D and 3E).

      3) Figure 4A is a schematic diagram of how the egg-laying circuit works based on previous literature, and the authors cite Collins et al. 2015 and Kopchock et al. 2021 as their sources. One feature of this figure seems unwarranted, namely the part indicating that egg accumulation acts on the UM muscles, and the statement in the legend that "mechanical excitation of uterine muscles (UM) in response to egg accumulation favours exit from the inactive state (Collins et al., 2016)". I believe Collins et al. 2016 showed that egg accumulation favors egg laying and may have speculated that it does so by stretching the UM muscles, but this idea remains speculative and has not been established by any experimental data. I point out this issue, in particular, because it may bear on the nice data the authors of this manuscript show in Figure 3G and 3H, which show that some strains accumulate many eggs in the uterus before they initiate egg laying.

      Also, in Figure 4A and 4B, the legend does not explain the logic of the green areas labeled "egg-laying active phase" and the yellow area labeled "egg-laying inactive state". I was not sure how to interpret these features of the graphics.

      Response: The input from uterine muscles remains indeed hypothetical, and we have corrected the figure accordingly, now simply referring to the feedback of egg accumulation on egg laying activity, as recently characterized in more detail by Medrano & Collins (2023, Curr Biol).

      The green/yellow backgrounds shown in figures 4A (and 4B) are not useful and we have removed them.

      4) Results, page 11: "We used standard assays, in which animals are reared in liquid M9 buffer without bacterial food." In the standard assays, animals are reared on NGM agar plates with bacterial food, and then at the start of the egg-laying assay, are transferred to liquid M9 buffer without bacterial food. I assume that is what these authors did, and they should correct the language of the text to make it more accurate.

      Response: The reviewer is correct. We have incorporated this change to improve accuracy.

      5) The authors note that "serotonin induced a much stronger egg-laying response in the Class IIIA strain ED3005 than in other strains (Fig. 4C)". I would like to point out to the authors that strains such as ED3005 that have a very large number of unlaid eggs in their uterus are prone to lay a very large number of eggs when treated with exogenous serotonin, simply for the trivial reason that they have more eggs to release. This was previously seen, for example, in Desai and Horvitz (1989) in certain egg-laying defective mutants.

      Response: This is an important point and our comparison of ED3005 to ALL other strains is problematic. We changed this result description by stating that ED3005 shows possible serotonin hypersensitivity compared to strains with similar levels of egg retention (Class IIIA): “In addition, serotonin induced a much stronger egg-laying response in the strain ED3005 than in other Class IIIA strains with similar levels of egg retention (Fig. 4B). ED3005 may thus exhibit serotonin hypersensitivity, which has been observed in certain egg-laying mutants where perturbed synaptic transmission impacts serotonin signalling (Schafer and Kenyon, 1995; Schafer et al., 1996).”

      6) In Figure 4 the authors show that all strains lay eggs in response to fluoxetine and imipramine, but some strains (Class IIIB) do not lay eggs in response to serotonin. They then cite a series of papers, starting with Trent et al. 1983, that they claim show that this specific phenotype demonstrates that the HSN neurons are functionally releasing serotonin (bottom of page 11). This statement needs to be removed - it is incorrect. It is true that egg laying in response to fluoxetine and/or imipramine AS WELL AS egg laying in response to serotonin has been interpreted as indicating the presence of HSN neurons that functionally release serotonin to stimulate egg laying (these were referred to as Category C by Trent et al., 1983). However, the mutants that Mignerot et al. are talking about (those that don't respond to serotonin but do respond to imipramine/fluoxetine) were called Category D by Trent et al., 1983, and to my knowledge these have never been interpreted as necessarily having functionally intact HSN neurons. Mutants such as these that can lay eggs in some circumstances but cannot lay eggs in response to exogenous serotonin have usually been interpreted as having egg-laying muscles that are defective in responding to serotonin.

      How can we interpret strains that respond to imipramine/fluoxetine and not serotonin? Mignerot et al. cite some of the papers (Kullyev et al. 2010; Weinshenker et al., 1999; Yue et al., 2018) showing that imipramine and fluoxetine have off-target effects and can stimulate egg laying by acting through proteins other than the serotonin-reuptake inhibitor. The authors later in their discussion at the top of Page 24 also cite Dempsey et al 2005, a paper that also argues that imipramine and fluoxetine act via off-target effects. However, currently in Figure 4B Mignerot et al. emphasize that the serotonin reuptake inhibitor is the target of these drugs. Since the results presented for Class IIIB strains are not in accord with this interpretation, this seems misleading to me. The bottom line for me is that class IIIB strains cannot respond to exogenous serotonin, but can lay eggs in other conditions, so perhaps there is something specifically wrong with their ability to respond to serotonin.

      Response: We thank the reviewer for this important comment – we misinterpreted some of these past findings and our statements were either inexact or incorrect. We have revised this section accordingly: “Both drugs also stimulated egg laying in the Class IIIB strains and the Class IIIA strain JU2829 for which exogenous serotonin either inhibited egg laying or had no effect on it (Fig. 4B). In the past, mutants unresponsive to serotonin yet responsive to other drugs, including fluoxetine and imipramine, have been interpreted as being defective in the serotonin response of vulval muscles (Trent et al., 1983; Reiner et al., 1995; Weinshenker et al., 1995). This is indeed the likely case of Class IIIB strains carrying the KCNL-1 V530L variant thought to specifically reduce excitability of vulval muscles (Vigne et al., 2021). Our results therefore suggest that JU2829 (Class IIIA) may exhibit a similar defect in vulval muscle activation via serotonin caused by an alternative genetic change. Overall, these pharmacological assays do not allow us to conclude if and how HSN function has diverged among strains because the mode of action and targets of tested drugs has not been fully resolved. Nevertheless, our results are consistent with previous models proposing that these drugs do not simply block serotonin reuptake but can stimulate egg laying, to some extent, through mechanisms independent of serotonergic signaling (Trent et al., 1983; Desai and Horvitz, 1989; Reiner et al., 1995; Weinshenker et al., 1995, 1999; Dempsey et al., 2005; Kullyev et al., 2010; Branicky et al., 2014; Yue et al., 2018).”

      We removed the oversimplified Fig. 4B to avoid any misinterpretation.

      8) In Figure 7B and 7C, the authors should add some type of error bars to the graphs to give the readers an idea of whether the differences between strains that they write about are statistically significant or not.

      Response: These are frequency data to describe temporal dynamics of hatching (N=45-72 eggs per strain) (Fig. 7B) and development in single cohorts (N=48-177 eggs per strain) (Fig. 7C), hence, the absence of error bars.

      We agree that this representation of the data is not very telling. We therefore changed the data representation in these two figures to show that there are clear, statistically significant, negative correlations between egg retention and time to hatching / egg-to-adult developmental time.

      9) When the authors reference a list of papers in a single list, e.g. "(Burton et al., 2021; Fausett et al., 2021; Garsin et al., 2001; Padilla et al., 2002; Van Voorhies and Ward, 2000)" they seem to do so in alphabetical order by the first author's last name. I believe the usual practice is to list references by year of publication, with the earliest first.

      Response: We corrected the citation style according to the eLife format.

      10) At the top of page 24, the authors write "It seems unlikely, however, that any of these variants strongly alter central function of HSN and HSN-mediated signalling because fluoxetine and imipramine, known to act via HSN (Dempsey et al., 2005; Trent et al., 1983; Weinshenker et al., 1995), triggered a robust stimulatory effect on egg laying in all examined strains (Fig. 4C)." I believe that the Weinshenker paper in fact showed that imipramine does not act via the HSN, and the Dempsey paper suggested that both drugs can act at least in part independently of the HSN. Therefore, the authors should revise their statement.

      Response: We have removed the sentence.

      Reviewing Editor:

      Minor suggestions:

      1) p. 2, fifth line from bottom: "lead" instead of "leads";

      2) p. 2, last line: "muscle" instead of "muscles";

      3) p. 3, first full paragraph, 17th line: "populations" instead of "population";

      4) p. 5, fourth line from bottom: Delete first comma;

      5) p. 6, Figure 1D: "of" instead of "off";

      6) p. 7, fifth line: "KCNL-1";

      7) p. 9, third paragraph, second line: please clarify "late mid-L4";

      8) p. 16, first line: "exogenous";

      9) p. 20, first paragraph, beginning of second sentence: "Whether" instead of "If";

      10) p. 22, ninth line from bottom: delete "shaped by";

      11) p. 23, last paragraph, third and eighth lines from bottom: change "between" to "among"

      Response: Thank you. All corrected.

      Additional changes:

      Figure 5A: We removed figure 5A showing a cartoon of mod-5/SERT and its effects on serotonin signalling. This figure was incorrectly showing that MOD-5 is expressed in HSN (Jafari et al 2011 J. Neuroscience, Hammarlund et al 2018 Neuron).

      Abstract: We reworded the abstract to reduce its length.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Koumoundourou et al. identify a pathway downstream of Bcl11b that controls synapse morphology and plasticity of hippocampal mossy fiber synapses. Using an elegant combination of in vivo, ex vivo, and in vitro approaches, the authors build on their previous work that indicated C1ql2 as a functional target of Bcl11b (De Bruyckere et al., 2018). Here, they examine the functional implications of C1ql2 at MF synapses in Bcl11b cKO mice and following C1ql2 shRNA. The authors find that Bcl11b KO and shRNA against C1ql2 significantly reduce the recruitment of synaptic vesicles and impair LTP at MF synapses. Importantly, the authors test a role for the previously identified C1ql2 binding partner, exon 25b-containing Nrxn3 (Matsuda et al., 2016), as relevant at MF synapses to maintain synaptic vesicle recruitment. To test this, the authors developed a K262E C1ql2 mutant that disrupts binding to Nrxn3. Curiously, while Bcl11b KO and C1ql2 KD largely phenocopy (reduced vesicle recruitment and impaired LTP), only vesicle recruitment is dependent on C1ql2-Nrxn3 interactions. These findings provide new insight into the functional role of C1ql2 at MF synapses. While the authors convincingly demonstrate a role for C1ql2-Nrxn3(25b+) interaction for vesicle recruitment and a Nrxn3(25b+)-independent role for C1ql2 in LTP, the underlying mechanisms remain inconclusive. Additionally, a discussion of how these findings relate to previous work on C1ql2 at mossy fiber synapses and how the findings contribute to the biology of Nrxn3 would increase the interpretability of this work.

      As suggested by reviewer #1, we extended our discussion of previous work on C1ql2 and additionally discussed the biology of Nrxn3 and how our work relates to it. Moreover, we extended our mechanistic analysis of how Bcl11b/C1ql2/Nrxn3 pathway controls synaptic vesicle recruitment as well as LTP (please see also response to reviewer #2 points 5 and 8 and reviewer #3 point 4 of public reviews below for detailed discussion).

      Reviewer #2 (Public Review):

      This manuscript describes experiments that further investigate the actions of the transcription factor Bcl11b in regulating mossy fiber (MF) synapses in the hippocampus. Prior work from the same group had demonstrated that loss of Bcl11b results in loss of MF synapses as well as a decrease in LTP. Here the authors focus on a target of Bcl11b, the secreted synaptic organizer C1ql2, which is almost completely lost in Bcl11b KO. Viral reintroduction of C1ql2 rescues the synaptic phenotypes, whereas direct KD of C1ql2 recapitulates the Bcl11b phenotype. C1ql2 itself interacts directly with Nrxn3 and replacement with a binding deficient mutant C1q was not able to rescue the Bcl11b KO phenotype. Overall there are some interesting observations in the study, however there are also some concerns about the measures and interpretation of data.

      The authors state that they used a differential transcriptomic analysis to screen for candidate targets of Bcl11b, yet they do not present any details of this screen. This should be included and at the very least a table of all DE genes included. It is likely that many other genes are also regulated by Bcl11b so it would be important to the reader to see the rationale for focusing attention on C1ql2 in this study.

      The transcriptome analysis mentioned in our manuscript was published in detail in our previous study (De Bruyckere et al., 2018), including chromatin-immunoprecipitation that revealed C1ql2 as a direct transcriptional target of Bcl11b. Upon revision of the manuscript, we made sure that this was clearly stated within the main text module to avoid future confusion. In the same publication (De Bruyckere et al., 2018), we discuss in detail several identified candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS. C1ql2 has been previously demonstrated to be almost exclusively expressed in DG neurons and localized to the MFS.

      There it bridges the pre- and post-synaptic sides through interaction with Nrxn3 and KAR subunits, respectively, and regulates synaptic function (Matsuda et al., 2016). Taken together, C1ql2 was a very good candidate to study as a potential effector downstream of Bcl11b in the maintenance of MFS structure and function. However, as our data reveal, not all Bcl11b mutant phenotypes were rescued by C1ql2 (see supplementary figures 2d-f of revised manuscript). We expect additional candidate genes, identified in our transcriptomic screen, to act downstream of Bcl11b in the control of MFS.

      All viral-mediated expression uses AAVs which are known to ablate neurogenesis in the DG (Johnston DOI: 10.7554/eLife.59291) through the ITR regions and leads to hyperexcitability of the dentate. While it is not clear how this would impact the measurements the authors make in MF-CA3 synapses, this should be acknowledged as a potential caveat in this study.

      We agree with reviewer #2 and are aware that it has been demonstrated that AAV-mediated gene expression ablates neurogenesis in the DG. To avoid potential interference of the AAVs with the interpretability of our phenotypes, we made sure during the design of the study that all of our control groups were treated in the same way as our groups of interest, and were, thus, injected with control AAVs. Moreover, the observed phenotypes were first described in Bcl11b mutants that were not injected with AAVs (De Bruyckere et al., 2018). Finally, we thoroughly examined the individual components of the proposed mechanism (rescue of C1ql2 expression, over-expression of C1ql3 and introduction of mutant C1ql2 in Bcl11b cKOs, KD of C1ql2 in WT mice, and Nrxn123 cKO) and reached similar conclusions. Together, this strongly supports that the observed phenotypes occur as a result of the physiological function of the proteins involved in the described mechanism and not due to interference of the AAVs with these biological processes. We have now addressed this point in the main text module of the revised manuscript.

      The authors claim that the viral re-introduction "restored C1ql2 protein expression to control levels". This is misleading given that the mean of the data is 2.5x the control (Figure 1d and also see Figure 6c). The low n and large variance are a problem for these data. Moreover, they are marked ns but the authors should report p values for these. At the least, this likely large overexpression and variability should be acknowledged. In addition, the use of clipped bands on Western blots should be avoided. Please show the complete protein gel in primary figures or supplemental information.

      We agree with reviewer #2 that C1ql2 expression after its re-introduction in Bcl11b cKO mice was higher compared to controls and that this should be taken into consideration for proper interpretation of the data. To address this, based also on the suggestion of reviewer #3 point 1 below, we overexpressed C1ql2 in DG neurons of control animals. We found no changes in synaptic vesicle organization upon C1ql2 over-expression compared to controls. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not as result of the overexpression. These data are included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript.

      Additionally, we looked at the effects of C1ql2 overexpression in Bcl11b cKO DGN on basal synaptic transmission. We plotted fEPSP slopes versus fiber volley amplitudes, measured in slices from rescue animals, as we had previously done for the control and Bcl11b cKO (Author response image 1a). Although regression analysis revealed a trend towards steeper slopes in the rescue mice (Author response image 1a and b), the observation did not prove to be statistically significant, indicating that C1ql2 overexpression in Bcl11b cKO animals does not strongly alter basal synaptic transmission at MFS. Overall, our previous and new findings support that the observed effects of the C1ql2 rescue are not caused by the artificially elevated levels of C1ql2, as compared to controls, but are rather a result of the physiological function of C1ql2.

      Following the suggestion of reviewer #2 all western blot clipped bands were exchanged for images of the full blot. This includes figures 1c, 4c, 6b and supplementary figure 2g of the revised manuscript. P-value for Figure 1d has now been included.

      Author response image 1.

      C1ql2 reintroduction in Bcl11b cKO DGN does not significantly alter basal synaptic transmission at mossy fiber-CA3 synapses. a Input-output curves generated by plotting fEPSP slope against fiber volley amplitude at increasing stimulation intensities. b Quantification of regression line slopes for input-output curves for all three conditions. Control+EGFP, 35 slices from 16 mice; Bcl11b cKO+EGFP, 32 slices from 14 mice; Bcl11b cKO+EGFP-2A-C1ql2, 22 slices from 11 mice. The data are presented as means, error bars represent SEM. Kruskal-Wallis test (non-parametric ANOVA) followed by Dunn’s post hoc pairwise comparisons. p=0.106; ns, not significant.

      Measurement of EM micrographs: As prior work suggested that MF synapse structure is disrupted the authors should report active zone length as this may itself affect "synapse score" defined by the number of vesicles docked. More concerning is that the example KO micrographs seem to have lost all the densely clustered synaptic vesicles that are away from the AZ in normal MF synapses e.g. compare control and KO terminals in Fig 2a or 6f or 7f. These terminals look aberrant and suggest that the important measure is not what is docked but what is present in the terminal cytoplasm that normally makes up the reserve pool. This needs to be addressed with further analysis and modifications to the manuscript.

      As requested by reviewer #2 we analyzed and reported in the revised manuscript the active zone length. We found that the active zone length remained unchanged in all conditions (control/Bcl11b cKO/C1ql2 rescue, WT/C1ql2 KD, control/K262E and control/Nrxn123 cKO), strengthening our results that the described Bcl11b/C1ql2/Nrxn3 mechanism is involved in the recruitment of synaptic vesicles. These data have been included in supplementary figures 2c, 4h, 5f and 6g and are described in the results part of the revised manuscript.

      We want to clarify that the synapse score is not defined by the number of docked vesicles to the plasma membrane. The synapse score, which is described in great detail in our materials and methods part and has been previously published (De Bruyckere et al., 2018), rates MFS based on the number of synaptic vesicles and their distance from the active zone and was designed according to previously described properties of the vesicle pools at the MFS. The EM micrographs refer to the general misdistribution of SV in the proximity of MFS. Upon revision of the manuscript, we made sure that this was clearly stated in the main text module to avoid further confusion.

      The study also presents correlated changes in MF LTP in Bcl11b KO which are rescued by C1ql2 expression. It is not clear whether the structural and functional deficits are causally linked and this should be made clearer in the manuscript. It is also not apparent why this functional measure was chosen as it is unlikely that C1ql2 plays a direct role in presynaptic plasticity mechanisms that are through a cAMP/ PKA pathway and likely disrupted LTP is due to dysfunctional synapses rather than a specific LTP effect.

      The inclusion of functional experiments in this and our previous study (de Bruyckere et al., 2018) was first and foremost intended to determine whether the structural alterations observed at MFB disrupt MFS signaling. From the signaling properties we tested, basal synaptic transmission (this study) and short-term potentiation (de Bruyckere et al., 2018) were unaltered by Bcl11b KO, whereas MF LTP was found to be abolished (de Bruyckere et al., 2018). Indeed, because MF LTP largely depends on presynaptic mechanisms, including the redistribution of the readily releasable pool and recruitment of new active zones (Orlando et al., 2021; Vandael et al., 2020), it appears to be particularly sensitive to the specific structural changes we observed. We therefore believe that it is valuable information that MF LTP is affected in Bcl11b cKO animals - it conveys a direct proof for the functional importance of the observed morphological alterations, while basic transmission remains largely normal. Furthermore, it subsequently provided a functional marker for testing whether the reintroduction of C1ql2 in Bcl11b cKO animals or the KD of C1ql2 in WT animals can functionally recapitulate the control or the Bcl11b KO phenotype, respectively.

      We fully agree with the reviewer that C1ql2 is unlikely to directly participate in the cAMP/PKA pathway and that the ablation of C1ql2 likely disrupts MF LTP through an alternative mode of action. Our original wording in the paragraph describing the results of the forskolin-induced LTP experiment might have overstressed the importance of the cAMP pathway. We have now rephrased that paragraph to better describe the main idea behind the forskolin experiment, namely to circumvent the initial Ca2+ influx in order to test whether deficient presynaptic Ca2+ channel/KAR signaling might be responsible for the loss of LTP in Bcl11b cKO. The results are strongly indicative of a downstream mechanism and further investigation is needed to determine the specific mechanisms by which C1ql2 regulates MF LTP, especially in light of the result that C1ql2.K262E rescued LTP, while it was unable to rescue the SV recruitment at the MF presynapse. This raises the possibility that C1ql2 can influence MF LTP through additional, yet uncharacterized mechanisms, independent of SV recruitment. As such, a causal link between the structural and functional deficits remains tentative and we have now emphasized that point by adding a respective sentence to the discussion of our revised manuscript. Nevertheless, we again want to stress that the main rationale behind the LTP experiments was to assess the functional significance of structural changes at MFS and not to elucidate the mechanisms by which MF LTP is established.

      The authors should consider measures that might support the role of Bcl11b targets in SV recruitment during the depletion of synapses or measurements of the readily releasable pool size that would complement their findings in structural studies.

      We fully agree that functional measurements of the readily releasable pool (RRP) size would be a valuable addition to the reported redistribution of SV in structural studies. We have, in fact, attempted to use high-frequency stimulus trains in both field and single-cell recordings (details on single-cell experiments are described in the response to point 8) to evaluate potential differences in RRP size between the control and Bcl11b KO (Author response image 2a and b). Under both recording conditions we see a trend towards lower values of the intersection between a regression line of late responses and the y-axis. This could be taken as an indication of slightly smaller RRP size in Bcl11b mutant animals compared to controls. However, for several technical reasons we are extremely cautious about drawing such far-reaching conclusions based on these data. At most, they suffice to conclude that the availability of release-ready vesicles in the KO is likely not dramatically smaller than in the control.

      The primary issue with using high-frequency stimulus trains for RRP measurements at MFS is the particularly low initial release probability (Pr) at these synapses. This means that a large number of stimulations is required to deplete the RRP. As the RRP is constantly replenished, it remains unclear when steady state responses are reached (reviewed by Kaeser and Regehr, 2017). This is clearly visible in our single-cell recordings (Author response image 2b), which were additionally complicated by prominent asynchronous release at later stages of the stimulus train and by a large variability in the shapes of cumulative amplitude curves between cells. In contrast, while the cumulative amplitude curves for field potential recordings do reach a steady state (Author response image 2a), field potential recordings in this context are not a reliable substitute for single cell or, in the case of MFB, single-bouton recordings. Postsynaptic cells in field potential recordings are not clamped, meaning that the massive release of glutamate due to continuous stimulation depolarizes the postsynaptic cells and reduces the driving force for Na+, irrespective of depletion of the RRP. This is supported by the fact that we consistently observed a recovery of fEPSP amplitudes later in the trains where RRP had presumably been maximally depleted. In summary, high-frequency stimulus trains at the field potential level are not a valid and established technique for estimating RRP size at MFS.

      Specialized laboratories have used highly advanced techniques, such as paired recordings between individual MFB and postsynaptic CA3 pyramidal cells, to estimate the RRP size of MFB (Vandael et al., 2020). These approaches are outside the scope of our present study which, while elucidating functional changes following Bcl11b depletion and C1ql2 rescue, does not aim to provide a high-end biophysical analysis of the presynaptic mechanisms involved.

      Author response image 2.

      Estimation of RRP size using high-frequency stimulus trains at mossy fiber-CA3 synapses. a Results from field potential recordings. Cumulative fEPSP amplitude in response to a train of 40 stimuli at 100 Hz. All subsequent peak amplitudes were normalized to the amplitude of the first peak. Data points corresponding to putative steady-state responses were fit with linear regression (RRP size is indirectly reflected by the intersection of the regression line with the y-axis). Control+EGFP, 6 slices from 5 mice; Bcl11b cKO+EGFP, 6 slices from 3 mice. b Results from single-cell recordings. Cumulative EPSC amplitude in response to a train of 15 stimuli at 50 Hz. The last four stimuli were fit with linear regression. Control, 5 cells from 4 mice; Bcl11b cKO, 3 cells from 3 mice. Note the shallow onset of response amplitudes and the subsequent frequency potentiation. Due to the resulting increase in slope at higher stimulus numbers, intersection with the y-axis occurs at negative values. The differences shown were not found to be statistically significant; unpaired t-test or Mann-Whitney U-test.
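
      The back-extrapolation described in this legend can be sketched in a few lines. This is only an illustration of the general train method under stated assumptions, not the analysis code used for the figure: the function name `estimate_rrp_intercept`, the choice of stimulus number as the x-axis, and the example amplitudes are all hypothetical.

```python
def estimate_rrp_intercept(amplitudes, n_fit):
    """Back-extrapolate a cumulative amplitude curve to the y-axis.

    amplitudes: peak response amplitudes, one per stimulus in the train.
    n_fit: number of late responses assumed to lie in the steady state.
    Returns (slope, intercept) of the least-squares line through the last
    n_fit points of the cumulative curve, normalized to the first response.
    """
    if n_fit < 2 or n_fit > len(amplitudes):
        raise ValueError("n_fit must be between 2 and the train length")
    first = amplitudes[0]
    if first == 0:
        raise ValueError("first response amplitude must be non-zero")

    # Normalize every response to the first peak, then accumulate.
    cumulative = []
    total = 0.0
    for amplitude in amplitudes:
        total += amplitude / first
        cumulative.append(total)

    # Stimulus numbers are 1-based; only the late points enter the fit.
    xs = list(range(len(amplitudes) - n_fit + 1, len(amplitudes) + 1))
    ys = cumulative[-n_fit:]

    # Ordinary least-squares line through the selected points.
    mean_x = sum(xs) / n_fit
    mean_y = sum(ys) / n_fit
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical depressing train: the intercept is positive.
depressing = estimate_rrp_intercept([4, 2, 1, 1, 1, 1], n_fit=4)

# Hypothetical facilitating train: the late slope is steep, so the
# intercept falls below zero, as noted for the single-cell recordings.
facilitating = estimate_rrp_intercept([1, 2, 3, 4], n_fit=3)
```

      The second call shows why a negative intercept is not interpretable as a pool size: when responses keep growing during the train, the fitted points never reach a depletion-limited steady state.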

      Bcl11b KO reduces the number of synapses, yet the I-O curve reported in Supp Fig 2 is not changed. How is that possible? This should be explained.

      We agree with reviewer #2: this apparent discrepancy has indeed struck us as a counterintuitive result. It might be that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Although perplexing, the result is fully supported by our single-cell data, which show no significant differences in MF EPSC amplitudes recorded from CA3 pyramidal cells between controls and Bcl11b mutants (Author response image 3; please see the response below for details and also our response to Reviewer #1 question 2).

      Matsuda et al DOI: 10.1016/j.neuron.2016.04.001 previously reported that C1ql2 organizes MF synapses by aligning postsynaptic kainate receptors with presynaptic elements. As this may have consequences for the functional properties of MF synapses including their plasticity, the authors should report whether they see deficient postsynaptic glutamate receptor signaling in the Bcl11b KO and rescue in the C1ql2 re-expression.

      We agree that the study by Matsuda et al. is of key importance for our present work. Although MF LTP is governed by presynaptic mechanisms and we previously did not see differences in short-term plasticity between the control and Bcl11b cKO (De Bruyckere et al., 2018), the clustering of postsynaptic kainate receptors by C1ql2 is indeed an important detail that could potentially alter synaptic signaling at MFS in Bcl11b KO. We therefore re-analyzed previously recorded single-cell data by performing a kinetic analysis on MF EPSCs recorded from CA3 pyramidal cells in control and Bcl11b cKO mice (Author response image 3a) to evaluate postsynaptic AMPA and kainate receptor responses in both conditions. We took advantage of the fact that AMPA receptors deactivate roughly 10 times faster than kainate receptors, allowing the contributions of the two receptors to mossy fiber EPSCs to be separated (Castillo et al., 1997 and reviewed by Lerma, 2003). We fit the decay phase of the second (larger) EPSC evoked by paired-pulse stimulation with a double exponential function, yielding a fast and a slow component, which roughly correspond to the fractional currents evoked by AMPA and kainate receptors, respectively. Analysis of both fast and slow time constants and the corresponding fractional amplitudes revealed no significant differences between controls and Bcl11b mutants (Author response image 3e-h), indicating that neither AMPA nor kainate receptor signaling is affected by the ablation of C1ql2 following Bcl11b KO.

      Importantly, MF EPSC amplitudes evoked by the first and the second pulse (Author response image 3b), paired-pulse facilitation (Author response image 3c) and failure rates (Author response image 3d) were all comparable between controls and Bcl11b mutants. These results further corroborate our observations from field recordings that basal synaptic transmission at MFS is unaltered by Bcl11b KO.

      We note that the results from single cell recordings regarding basal synaptic transmission merely confirm the observations from field potential recordings, and that the attempted measurement of RRP size at the single cell level was not successful. Thus, our single-cell data do not add new information about the mechanisms underlying the effects of Bcl11b-deficiency and we therefore decided not to report these data in the manuscript.

      Author response image 3.

      Basal synaptic transmission at mossy fiber-CA3 synapses is unaltered in Bcl11b cKO mice. a Representative average trace (20 sweeps) recorded from CA3 pyramidal cells in control and Bcl11b cKO mice at minimal stimulation conditions, showing EPSCs in response to paired-pulse stimulation (PPS) at an interstimulus interval of 40 ms. The signal is almost entirely blocked by the application of 2 μM DCG-IV (red). b Quantification of MF EPSC amplitudes in response to PPS for both the first and the second pulse. c Ratio between the amplitude of the second over the first EPSC. d Percentage of stimulation events resulting in no detectable EPSCs for the first pulse. Events <5 pA were considered as noise. e Fast decay time constant obtained by fitting the average second EPSC with the following double exponential function: $I(t) = A_{fast} e^{-t/\tau_{fast}} + A_{slow} e^{-t/\tau_{slow}} + C$, where $I$ is the recorded current amplitude after time $t$, $A_{fast}$ and $A_{slow}$ represent fractional current amplitudes decaying with the fast ($\tau_{fast}$) and slow ($\tau_{slow}$) time constant, respectively, and $C$ is the offset. Starting from the peak of the EPSC, the first 200 ms of the decaying trace were used for fitting. f Fractional current amplitude decaying with the fast time constant. g-h Slow decay time constant and fractional current amplitude decaying with the slow time constant. For all figures: Control, 8 cells from 4 mice; Bcl11b cKO, 8 cells from 6 mice. All data are presented as means, error bars indicate SEM. None of the differences shown were found to be statistically significant; Mann-Whitney U-test for non-normally and unpaired t-test for normally distributed data.
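
      The quantities in panels c-e can be written down compactly. The sketch below is a hedged illustration rather than the analysis code behind the figure: the function names are hypothetical, amplitudes are assumed to be supplied as magnitudes in pA, the paired-pulse ratio is assumed to be a ratio of mean amplitudes (the legend does not say whether ratios were computed per sweep), and the parameter values in the example are invented.

```python
import math

NOISE_THRESHOLD_PA = 5.0  # events below this magnitude count as failures


def paired_pulse_ratio(first_amplitudes, second_amplitudes):
    """Mean second-pulse amplitude divided by mean first-pulse amplitude."""
    mean_first = sum(first_amplitudes) / len(first_amplitudes)
    mean_second = sum(second_amplitudes) / len(second_amplitudes)
    return mean_second / mean_first


def failure_rate_percent(first_amplitudes, threshold=NOISE_THRESHOLD_PA):
    """Percentage of first-pulse events with no detectable EPSC."""
    failures = sum(1 for a in first_amplitudes if abs(a) < threshold)
    return 100.0 * failures / len(first_amplitudes)


def biexponential(t, a_fast, tau_fast, a_slow, tau_slow, offset):
    """Double exponential decay: fast and slow components plus an offset."""
    fast = a_fast * math.exp(-t / tau_fast)
    slow = a_slow * math.exp(-t / tau_slow)
    return fast + slow + offset


# Invented example values, for illustration only.
ratio = paired_pulse_ratio([10.0, 20.0], [30.0, 60.0])
failures = failure_rate_percent([0.0, 3.0, 12.0, 40.0])
current_at_peak = biexponential(0.0, 0.8, 5.0, 0.2, 50.0, 0.0)
```

      In practice the five parameters of `biexponential` would be estimated by nonlinear least squares over the first 200 ms after the peak; the fitting step itself is omitted here.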

      Reviewer #3 (Public Review):

      Overall, this is a strong manuscript that uses multiple current techniques to provide specific mechanistic insight into prior discoveries of the contributions of the Bcl11b transcription factor to mossy fiber synapses of dentate gyrus granule cells. The authors employ an adult deletion of Bcl11b via Tamoxifen-inducible Cre and use immunohistochemical, electron microscopy, and electrophysiological studies of synaptic plasticity, together with viral rescue of C1ql2, a direct transcriptional target of Bcl11b or Nrxn3, to construct a molecular cascade downstream of Bcl11b for DG mossy fiber synapse development. They find that C1ql2 re-expression in Bcl11b cKOs can rescue the synaptic vesicle docking phenotype and the impairments in MF-LTP of these mutants. They also show that C1ql2 knockdown in DG neurons can phenocopy the vesicle docking and plasticity phenotypes of the Bcl11b cKO. They also use artificial synapse formation assays to suggest that C1ql2 functions together with a specific Nrxn3 splice isoform in mediating MF axon development, extending these data with a C1ql2-K262E mutant that purports to specifically disrupt interactions with Nrxn3. All of the molecules involved in this cascade are disease-associated and this study provides an excellent blueprint for uncovering downstream mediators of transcription factor disruption. Together this makes this work of great interest to the field. Strengths are the sophisticated use of viral replacement and multi-level phenotypic analysis while weaknesses include the linkage of C1ql2 with a specific Nrxn3 splice variant in mediating these effects.

      Here is an appraisal of the main claims and conclusions:

      1) C1ql2 is a downstream target of Bcl11b which mediates the synaptic vesicle recruitment and synaptic plasticity phenotypes seen in these cKOs. This is supported by the clear rescue phenotypes of synapse anatomy (Fig.2) and MF synaptic plasticity (Fig.3). One weakness here is the absence of a control assessing over-expression phenotypes of C1ql2. It's clear from Fig.1D that viral rescue is often greater than WT expression (totally expected). In the case where you are trying to suppress a LoF phenotype, it is important to make sure that enhanced expression of C1ql2 in a WT background does not cause your rescue phenotype. A strong overexpression phenotype in WT would weaken the claim that C1ql2 is the main mediator of the Bcl11b phenotype for MF synapse phenotypes.

      As suggested by reviewer #3, we carried out C1ql2 over-expression experiments in control animals. We show that the over-expression of C1ql2 in the DG of control animals had no effect on the synaptic vesicle organization in the proximity of MFS. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not a result of the artificial overexpression. These data are now included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript. Please also see response to point 3 of reviewer #2.

      2) Knockdown of C1ql2 via 4 shRNAs is sufficient to produce the synaptic vesicle recruitment and MF-LTP phenotypes. This is supported by clear effects in the shRNA-C1ql2 groups as compared to nonsense-EGFP controls. One concern (particularly given the use of 4 distinct shRNAs) is the potential for off-target effects, which is best controlled for by a rescue experiment with RNA insensitive C1ql2 cDNA as opposed to nonsense sequences, which may not elicit the same off-target effects.

      We agree with reviewer #3 that the use of shRNAs could potentially create unexpected off-target effects and that the introduction of an shRNA-insensitive C1ql2 in parallel to the expression of the shRNA cassette would be a very effective control experiment. However, the suggested experiment would require an additional 6 months (2 months for AAV production, 2-3 months from animal injection to sacrifice and 1-2 months for EM imaging/analysis and LTP measurements) and a high number of additional animals (minimum 8 for EM and 8 for LTP measurements). We note here that, before the production of the shRNA-C1ql2 and the shRNA-NS, the individual sequences were systematically checked for off-target binding on the murine exome with up to two mismatches and showed no target other than the intended one (C1ql2 for shRNA-C1ql2 and no target for shRNA-NS). Taking into consideration our in-silico analysis, we feel that the interpretation of our findings is valid without this (very reasonable) additional control experiment.
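
      The idea behind a mismatch-tolerant off-target screen can be shown with a toy example. The response does not name the tool that was used, so the following is only a minimal sketch of the principle: the function names are hypothetical, the sequences are made up, and a real screen would search both strands of an indexed exome with a dedicated aligner rather than scanning a string.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def find_near_matches(query, reference, max_mismatches=2):
    """Return (start, mismatches) for every window within the mismatch limit.

    Only substitutions are considered; insertions and deletions are ignored.
    """
    hits = []
    for start in range(len(reference) - len(query) + 1):
        window = reference[start:start + len(query)]
        mismatches = hamming_distance(query, window)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up sequences: one perfect site plus several near matches.
hits = find_near_matches("AAAC", "AAAACCCC", max_mismatches=2)
```

      A candidate sequence would pass such a screen only if every hit at or below the mismatch limit falls within the intended transcript.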

      3) C1ql2 interacts with Nrxn3(25b+) to facilitate MF terminal SV clustering. This claim is theoretically supported by the HEK cell artificial synapse formation assay (Fig.5), the inability of the K262-C1ql2 mutation to rescue the Bcl11b phenotype (Fig.6), and the altered localization of C1ql2 in the Nrxn1-3 deletion mice (Fig.7). Each of these lines of experimental evidence has caveats that should be acknowledged and addressed. Given the hypothesis that C1ql2 and Nrxn3b(25b) are expressed in DG neurons and work together, the heterologous co-culture experiment seems strange. Up till now, the authors are looking at pre-synaptic function of C1ql2 since they are re-expressing it in DGNs. The phenotypes they are seeing are also pre-synaptic and/or consistent with pre-synaptic dysfunction. In Fig.5, they are testing whether C1ql2 can induce pre-synaptic differentiation in trans, i.e. theoretically being released from the 293 cells "post-synaptically". But the post-synaptic ligands (Nlgn1 and GluKs) are not present in the 293 cells, so a heterologous synapse assay doesn't really make sense here. The effect that the authors are seeing likely reflects the fact that C1ql2 and Nrxn3 do bind to each other, so C1ql2 is acting as an artificial post-synaptic ligand, in that it can cluster Nrxn3 which in turn clusters synaptic vesicles. But this does not test the model that the authors propose (i.e. C1ql2 and Nrxn3 are both expressed in MF terminals). Perhaps a heterologous assay where GluK2 is put into HEK cells and the C1ql2 and Nrxn3 are simultaneously or individually manipulated in DG neurons?

      C1ql2 is expressed by DG neurons and is then secreted into the MFS synaptic cleft, while Nrxn3, which is also expressed by DG neurons, is anchored at the presynaptic side. In our work we used the well-established co-culture assay and cultured HEK293 cells secreting C1ql2 (an IgK secretion sequence was inserted at the N-terminus of C1ql2) together with hippocampal neurons expressing Nrxn3(25b+). We used the HEK293 cells as a delivery system of secreted C1ql2 to the neurons to create regions of high C1ql2 concentration. By interfering with the C1ql2-Nrxn3 interaction in this system, either by expression of the non-binding mutant C1ql2 variant in the HEK cells or by manipulating Nrxn expression in the neurons, we could show that C1ql2 binding to Nrxn3(25b+) is necessary for the accumulation of vGlut1. However, we did not examine and do not claim within our manuscript that the interaction between C1ql2 and Nrxn3(25b+) induces presynaptic differentiation. Our experiment only aimed to analyze the ability of C1ql2 to cluster SV through interaction with Nrxn3. Moreover, by not expressing potential postsynaptic interaction partners of C1ql2 in our system, we could show that C1ql2 controls SV recruitment through a purely presynaptic mechanism. Co-culturing GluK2-expressing HEK cells with simultaneous manipulation of C1ql2 and/or Nrxn3 in neurons would not allow us to appropriately answer our scientific question, but would rather focus on the potential synaptogenic function of the Nrxn3/C1ql2/GluK2 complex and the role of the postsynaptic ligand in it. Thus, we feel that the proposed experiment, while very interesting for characterizing additional putative functions of C1ql2, may not provide additional information for the point we were addressing. In the revised manuscript we have tried to make the aim and methodological approach of this set of experiments clearer.

      4) K262-C1ql2 mutation blocks the normal rescue through a Nrxn3(25b) mechanism (Fig.6). The strength of this experiment rests upon the specificity of this mutation for disrupting Nrxn3b binding (presynaptic) as opposed to any of the known postsynaptic C1ql2 ligands such as GluK2. While this is not relevant for interpreting the heterologous assay (Fig.5), it is relevant for the in vivo phenotypes in Fig.6. Similar approaches as employed in this paper can test whether binding to other known postsynaptic targets is altered by this point mutation.

      It has been previously shown that C1ql2 together with C1ql3 recruit postsynaptic GluK2 at the MFS. However, loss of just C1ql2 did not affect the recruitment of GluK2, which was disrupted only upon loss of both C1ql2 and C1ql3 (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see response above). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b KO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling is altered upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to reviewer #2 point 8). Thus, we have no experimental evidence supporting the idea that a loss of interaction between C1ql2.K262E and GluK2 would interfere with the examined phenotype. However, to exclude that the K262E mutation disrupts interaction between C1ql2 and GluK2, we performed co-immunoprecipitation from protein lysate of HEK293 cells expressing GluK2-myc-flag and GFP-C1ql2 or GluK2-myc-flag and GFP-K262E and could show that both C1ql2 and K262E had GluK2 bound when precipitated. These data are included in supplementary figure 5k of the revised manuscript.

      5) Altered localization of C1ql2 in Nrxn1-3 cKOs. These data are presented to suggest that Nrx3(25b) is important for localizing C1ql2 to the SL of CA3. Weaknesses of this data include both the lack of Nrxn specificity in the triple a/b KOs as well as the profound effects of Nrxn LoF on the total levels of C1ql2 protein. Some measure that isn't biased by this large difference in C1ql2 levels should be attempted (something like in Fig.1F).

      We acknowledge that the lack of specificity in the Nrxn123 model makes it difficult to interpret our data. We have now examined the mRNA levels of Nrxn1 and Nrxn2 upon stereotaxic injection of Cre in the DG of Nrxn123flox/flox animals and found that Nrxn1 was only mildly reduced. At the same time Nrxn2 showed a tendency for reduction that was not significant (data included in supplementary figure 6a of revised manuscript). Only Nrxn3 expression was strongly suppressed. Of course, this does not exclude that the mild reduction of Nrxn1 and Nrxn2 interferes with the C1ql2 localization at the MFS. We further examined the mRNA levels of C1ql2 in control and Nrxn123 mutants to ensure that the observed changes in C1ql2 protein levels at the MFS are not due to reduced mRNA expression and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that overall protein C1ql2 expression is normal.

      The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining signal of the C1ql2.K262E at the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this suggests that loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 at the MFS, but not with the expression of C1ql2. Of course, this does not exclude that other mechanisms are involved in the synaptic localization of C1ql2, beyond the interaction with Nrxn3, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKOs show residual immunofluorescence at the SL. Further studies are required to determine how C1ql2-Nrxn3 interaction regulates C1ql2 localization at the MFS.

      Reviewer #1 (Recommendations For The Authors):

      In addition to addressing the comments below, this study would benefit significantly from providing insight and discussion into the relevant potential postsynaptic signaling components controlled exclusively by C1ql2 (postsynaptic kainate receptors and the BAI family of proteins).

      We have now performed a kinetic analysis on single-cell data that we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling is altered upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to Reviewer #2 point 8). This agrees with previous findings that C1ql2 regulates postsynaptic GluK2 recruitment together with C1ql3 and only loss of both C1ql2 and C1ql3 results in a disruption of KAR signaling (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see our response to reviewer #3 point 4 above). We believe that further studies are needed to fully understand both the pre- and the postsynaptic functions of C1ql2. Because the focus of this manuscript was on the role of the C1ql2-Nrxn3 interaction and our investigation of the postsynaptic functions of C1ql2 was incomplete, we did not include our findings on postsynaptic current kinetics in our revised manuscript. However, we expanded the discussion of the known postsynaptic partners of C1ql2 in the revised manuscript to improve the interpretability of our results.

      Major Comments:

      The authors demonstrate that the ultrastructural properties of presynaptic boutons are altered after Bcl11b KO and C1ql2 KD. However, whether C1ql2 functions as part of a tripartite complex and the identity of the postsynaptic receptor (BAI, KAR) should be examined.

      Matsuda and colleagues have nicely demonstrated in their 2016 (Neuron) study that C1ql2 is part of a tripartite complex with presynaptic Nrxn3 and postsynaptic KARs. Moreover, they demonstrated that C1ql2, together with C1ql3, recruit postsynaptic KARs at the MFS, while the KO of just C1ql2 did not affect the KAR localization. In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 is able to recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (Fig. 5; please also see our response to reviewer #3 point 4 above). Moreover, we were able to show that the SV recruitment depends on C1ql2 interaction with Nrxn3 through the expression of a non-binding C1ql2 (Fig. 6) that retains the ability to interact with GluK2 (supplementary figure 5k of revised manuscript) or by KO of Nrxns (Fig. 7). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling differ between controls and Bcl11b mutants (Author response image 3e-h; please also see our response to Reviewer #2 question 8). Together, we have no experimental evidence so far that would support that the postsynaptic partners of C1ql2 are involved in the observed phenotype. While it would be very interesting to characterize the postsynaptic partners of C1ql2 in depth, we feel this would be beyond the scope of the present study.

      Figure 1f: For a more comprehensive understanding of the Bcl11b KO phenotype and the potential role for C1ql2 on MF synapse number, a complete quantification of vGlut1 and Homer1 for all conditions (Supplement Figure 2e) should be included in the main text.

      In our study we focused on the role of C1ql2 in the structural and functional integrity of the MFS downstream of Bcl11b. Bcl11b ablation leads to several phenotypes in the MFS that have been thoroughly described in our previous study (De Bruyckere et al., 2018). As expected, re-expression of C1ql2 only partially rescued these phenotypes, with full recovery of the SV recruitment (Fig. 2) and of the LTP (Fig. 3), but had no effect on the reduced numbers of MFS nor the structural complexity of the MFB created by the Bcl11b KO (supplementary figure 2d-f of revised manuscript). We understand that including the quantification of vGlut1 and Homer1 co-localization in the main figures would help with a better understanding of the Bcl11b mutant phenotype. However, in our manuscript we investigate C1ql2 as an effector of Bcl11b and thus we focus on its functions in SV recruitment and LTP. As we did not find a link between C1ql2 and the number of MFS/MFB upon re-expression of C1ql2 in Bcl11b cKO or now also in C1ql2 KD (see response to comment #4 below), we believe it is more suitable to present these data in the supplement.

      Figure 3/4: Given the striking reduction in the numbers of synapses (Supplement Figure 2e) and docked vesicles (Figure 2d) in the Bcl11b KO and C1ql2 KD (Figure 4e-f), it is extremely surprising that basal synaptic transmission is unaffected (Supplement Figure 2g). The authors should determine the EPSP input-output relationship following C1ql2 KD and measure EPSPs following trains of stimuli at various high frequencies.

      We fully acknowledge that this is an unexpected result. It is, however, quite possible that the modest displacement of SV fails to noticeably influence basal synaptic transmission. This would be the case, for example, if only a low number of vesicles are released by single stimuli, in line with the very low initial Pr at MFS. In contrast, the reduction in synapse numbers in the Bcl11b mutant might indeed be expected to be reflected in the input-output relationship. It is possible, however, that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Finally, we cannot exclude compensatory mechanisms (homeostatic plasticity) at the remaining synapses. A detailed analysis of these potential mechanisms would be a whole project in its own right.

      As additional information, we can say that the largely unchanged input-output-relation in Bcl11b cKO is also present in the single-cell level data (Author response image 3; details on single-cell experiments are described in the response to Reviewer #2 point 8).

      As suggested by the reviewer, we have now additionally analyzed the input-output relationship following C1ql2 KD and again did not observe any significant difference between control and KD animals. We have incorporated the respective input-output curves into the revised manuscript under Supplementary figure 3c-d.

      Figure 4: Does C1ql2 shRNA also reduce the number of MFBs? This should be tested to further identify C1ql2-dependent and independent functions.

      As requested by reviewer #1, we quantified the number of MFBs upon C1ql2 KD. We show that C1ql2 KD in WT animals does not alter the number of MFBs. The data are presented in supplementary figure 4d of the revised manuscript. Re-expression of C1ql2 in Bcl11b cKO did not rescue the loss of MFS created by the Bcl11b mutation. Moreover, C1ql2 re-expression did not rescue the complexity of the MFB ultrastructure perturbed by the Bcl11b ablation. Together, this suggests that Bcl11b regulates MFS maintenance through additional C1ql2-independent pathways. In our previously published work (De Bruyckere et al., 2018) we identified and discussed in detail several candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS (please also see response to reviewer #2- point 1 of public reviews).

      Figure 5: Clarification is required regarding the experimental design of the HEK/Neuron co-culture: 1. C1ql2 is a secreted soluble protein - how is the protein anchored to the HEK cell membrane to recruit Nrxn3(25b+) binding and, subsequently, vGlut1?

      C1ql2 was secreted by the HEK293 cells through an IgK signaling peptide at the N-terminus of C1ql2. The high concentration of C1ql2 close to the secretion site, together with the sparse co-culturing of the HEK293 cells on the neurons, allows for the quantification of accumulation of neuronal proteins. We have now described the experimental conditions in greater detail in the main text module of the revised manuscript.

      2) Why are the neurons transfected and not infected? Transfection efficiency of neurons with lipofectamine is usually poor (1-5%; Karra et al., 2010), while infection of neurons with lentiviruses or AAVs encoding cDNAs routinely are >90% efficient. Thus, interpretation of the recruitment assays may be influenced by the density of neurons transfected near a HEK cell.

      We agree with reviewer #1 that viral infection of the neurons would have been a more effective way of expressing our constructs. However, due to safety allowances in the used facility and time limitation at the time of conception of this set of experiments, a lipofectamine transfection was chosen.

      However, as all of our examined groups were handled in the same way and multiple cells from three independent experiments were examined for each experimental set, we believe that possible biases introduced by the transfection efficiency have been eliminated and are therefore confident in our interpretation of these results.

      3) Surface labeling of HEK cells for wild-type C1ql2 and K262 C1ql2 would be helpful to assess the trafficking of the mutant.

      We recognize that potential changes to the trafficking of C1ql2 caused by the K262E mutation would be important to characterize, in light of the reduced localization of the mutant protein at the SL in the in vivo experiments (Fig. 6e). In our culture system, C1ql2 and K262E were secreted by the HEK cells through insertion of an IgK signaling peptide at the N-terminus of the myc-tagged C1ql2/K262E. Thus, trafficking analysis on this system would not be informative, as the system is highly artificial compared to the in vivo model. Further studies are needed to characterize C1ql2 trafficking in neurons to understand how C1ql2-Nrxn3 interaction regulates the localization of C1ql2. However, labeling of the myc-tag in C1ql2 or K262E expressing HEK cells of the co-culture model reveals a similar signal for the two proteins (Fig. 5a,c). Nrxn-null mutation in neurons co-cultured with C1ql2-expressing HEK cells disrupted C1ql2-mediated vGlut1 accumulation in the neurons. Selective expression of Nrxn3(25b) in the Nrxn-null neurons restored vGlut1 clustering (Fig. 5e-f). Together, these data suggest that it is the interaction between C1ql2 and Nrxn3 that drives the accumulation of vGlut1.

      Figure 6: Bcl11b KO should also be included in 6f-h.

      As suggested by reviewer #1, we included the Bcl11b cKO in figures 6f-h and in corresponding supplementary figures 5c-j.

      Figure 7b: What is the abundance of mRNA for Nrxn1 and Nrxn2 as well as the abundance of Nrxns after EGFP-Cre injection into DG?

      We addressed this point raised by reviewer #1 by quantifying the relative mRNA levels of Nrxn1 and Nrxn2 via qPCR after stereotaxic injection of EGFP-Cre into the DG of Nrxn123flox/flox animals. Nrxn1 was only mildly reduced, while Nrxn2 showed a tendency towards reduction that was not significant. The data are presented in supplementary figure 6a of the revised manuscript.

      Minor Comments for readability:

      Synapse score is referred to frequently in the text and should be defined within the text for clarification.

      'n' numbers should be better defined in the figure legends. For example, for protein expression analysis in 1c, n=3. Is this a biological or technical triplicate? For electrophysiology (e.g. 3c), does "n=7" reflect the number of animals or the number of slices? n/N (slices/animals) should be presented.

      Figure 7a: Should the diagrams of the cre viruses be EGFP-Inactive or active Cre and not CRE-EGFP as shown in the diagram?

      Figure 7b: the region used for the inset should be identified in the larger image.

      All minor points have been fixed in the revised manuscript according to the suggestions.

      Reviewer #3 (Recommendations For The Authors):

      -Please describe the 'synapse score' somewhere in the text - it is too prominently featured to not have a clear description of what it is.

      The description of the synapse score has been included in the main text module of the revised manuscript.

      -The claim that Bcl11b controls SV recruitment "specifically" through C1ql2 is a bit stronger than is warranted by the data. Particularly given that C1ql2 is expressed at 2.5X control levels in their rescue experiments. See pt.2

      Please see response to reviewer #3 point 1 of public reviews. To address this, we over-expressed C1ql2 in control animals and found no changes in the synaptic vesicle distribution (supplementary figure 2g-j of revised manuscript). This supports that the observed rescue of synaptic vesicle recruitment by re-expression of C1ql2 is due to its physiological function and not due to the artificially elevated protein levels. Of course, we cannot exclude the possibility that other, C1ql2-independent, mechanisms also contribute to the SV recruitment downstream of Bcl11b. Our data from the C1ql2 rescue, C1ql2 KD, the in vitro experiments and the interruption of C1ql2-Nrxn3 in vivo, strongly suggest C1ql2 to be an important regulator of SV recruitment.

      -Does Bcl11b regulate Nrxn3 expression? Considering the apparent loss of C1ql2 expression in the Nrxn KO mice, this is an important detail.

      We agree with reviewer #3 that this is an important point. We have previously done differential transcriptomics from DG neurons of Bcl11b cKOs compared to controls and did not find Nrxn3 among the differentially expressed genes. To further validate this, we now quantified the Nrxn3 mRNA levels via qPCR in Bcl11b cKOs compared to controls and found no differences. These data are included in supplementary figure 5a of the revised manuscript.

      -It appears that C1ql2 expression is much lower in the Nrxn123 KO mice. Since the authors are trying to test whether Nrxn3 is required for the correct targeting of C1ql2, this is a confounding factor. We can't really tell if what we are seeing is a "mistargeting" of C1ql2, loss of expression, or both. If the authors did a similar analysis to what they did in Figure 1 where they looked at the synaptic localization of C1ql2 (and quantified it) that could provide more evidence to support or refute the "mistargeting" claim.

      Please also see response to reviewer #3 point 5 of public reviews. To exclude that reduction of fluorescence intensity of C1ql2 at the SL in Nrxn123 KO mice is due to loss of C1ql2 expression, we examined the mRNA levels of C1ql2 in control and Nrxn123 mutants and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that C1ql2 gene expression is normal. The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining C1ql2.K262E signal in the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this indicates that the loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 along the MFS, but not with expression of C1ql2. Of course, this does not exclude that additional mechanisms regulate C1ql2 localization at the synapse, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKO show residual immunofluorescence at the SL.

      We note here that we have not previously quantified the co-localization of C1ql2 with individual synapses. C1ql2 is a secreted molecule that localizes at the MFS synaptic cleft. However, not much is known about the number of MFS that are positive for C1ql2 nor about the mechanisms regulating C1ql2 targeting, transport, and secretion to the MFS. Whether C1ql2 interaction with Nrxn3 is necessary for the protection of C1ql2 from degradation, its surface presentation and transport or stabilization to the synapse is currently unclear. Upon revision of our manuscript, we realized that we might have overstated this particular finding and have now rephrased the specific parts within the results to appropriately describe the observation and have also included a sentence in the discussion referring to the lack of understanding of the mechanism behind this observation.

      -Title of Figure S5 is "Nrxn KO perturbs C1ql2 localization and SV recruitment at the MFS", but there is no data on C1ql2 localization.

      This issue has been fixed in the revised manuscript.

      -S5 should be labeled more clearly than just Cre+/-

      This issue has been fixed in the revised manuscript.

      References

      Castillo, P.E., Malenka, R.C., Nicoll, R.A., 1997. Kainate receptors mediate a slow postsynaptic current in hippocampal CA3 neurons. Nature 388, 182–186. https://doi.org/10.1038/40645

      De Bruyckere, E., Simon, R., Nestel, S., Heimrich, B., Kätzel, D., Egorov, A.V., Liu, P., Jenkins, N.A., Copeland, N.G., Schwegler, H., Draguhn, A., Britsch, S., 2018. Stability and Function of Hippocampal Mossy Fiber Synapses Depend on Bcl11b/Ctip2. Front. Mol. Neurosci. 11. https://doi.org/10.3389/fnmol.2018.00103

      Kaeser, P.S., Regehr, W.G., 2017. The readily releasable pool of synaptic vesicles. Curr. Opin. Neurobiol. 43, 63–70. https://doi.org/10.1016/j.conb.2016.12.012

      Lerma, J., 2003. Roles and rules of kainate receptors in synaptic transmission. Nat. Rev. Neurosci. 4, 481–495. https://doi.org/10.1038/nrn1118

      Orlando, M., Dvorzhak, A., Bruentgens, F., Maglione, M., Rost, B.R., Sigrist, S.J., Breustedt, J., Schmitz, D., 2021. Recruitment of release sites underlies chemical presynaptic potentiation at hippocampal mossy fiber boutons. PLoS Biol. 19, e3001149. https://doi.org/10.1371/journal.pbio.3001149

      Vandael, D., Borges-Merjane, C., Zhang, X., Jonas, P., 2020. Short-Term Plasticity at Hippocampal Mossy Fiber Synapses Is Induced by Natural Activity Patterns and Associated with Vesicle Pool Engram Formation. Neuron 107, 509-521.e7. https://doi.org/10.1016/j.neuron.2020.05.013

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes new validated conditional double KO (cDKO) mice for LRRK1 and LRRK2 that will be useful for the field, given that LRRK2 is widely expressed in the brain and periphery, and many divergent phenotypes have been attributed previously to LRRK2 expression. The manuscript presents solid data demonstrating that it is the loss of LRRK1 and LRRK2 expression within the SNpc DA cells that is not well tolerated, as it was previously unclear from past work whether neurodegeneration in the LRRK double Knock Out (DKO) was cell autonomous or the result of loss of LRRK1/LRRK2 expression in other types of cells. Future studies may pursue the biochemical mechanisms underlying the reason for the apoptotic cells noted in this study, as here, the LRRK1/LRRK2 KO mice did not replicate the dramatic increase in the number of autophagic vacuoles previously noted in germline global LRRK1/LRRK2 KO mice.

      We thank the editors for handling our manuscript and for the succinct summary that recognizes the significance of our findings and points out interesting directions for future studies. We also thank the reviewers for their helpful comments and positive evaluation of our work. Below, we have provided point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Summary:

      This is an important work showing that loss of LRRK function causes late-onset dopaminergic neurodegeneration in a cell-autonomous manner. One of the LRRK members, LRRK2, is of significant translational importance as mutations in LRRK2 cause late-onset autosomal dominant Parkinson's disease (PD). While many in the field assume that LRRK2 mutant causes PD via increased LRRK2 activity (i.e., kinase activity), it is not a settled issue as not all disease-causing mutant LRRK2 exhibit increased activity. Further, while LRRK2 inhibitors are under clinical trials for PD, the consequence of chronic, long-term LRRK2 inhibition is unknown. Thus, studies evaluating the long-term impact of LRRK deficit have important translational implications. Moreover, because LRRK proteins, particularly LRRK2, are known to modulate immune response and intracellular membrane trafficking, the study's results and the reagents will be valuable for others interested in LRRK function.

      Strengths:

      This report describes a mouse model where the LRRK1 and LRRK2 genes are conditionally deleted in dopaminergic neurons. Previously, this group showed that while loss of LRRK2 expression does not cause brain phenotype, loss of both LRRK1 and LRRK2 causes a later onset, progressive degeneration of catecholaminergic neurons and dopaminergic (DAergic) neurons in the substantia nigra (SN), and noradrenergic neurons in the locus coeruleus (LC). However, because LRRK genes are widely expressed with some peripheral phenotypes, it was unknown if the neurodegeneration in the LRRK double knockout (DKO) was cell autonomous. To rigorously test this question, the authors have generated a double conditional (cDKO) allele where both LRRK1 and LRRK2 genes were targeted to contain loxP sites. In my view, this was beyond what is usually required, as most investigators might combine one KO allele with another floxed allele. The authors provide a rigorous validation showing that the Driver (DAT-Cre) is expressed in most DAergic neurons in the SN and that LRRK levels are decreased selectively in the ventral midbrain. Using these mice, the authors show that the number of DAergic neurons is normal at 15 but significantly decreased at 20 months of age. Moreover, the authors show that the number of apoptotic neurons is increased by ~2X in aged SN, demonstrating increased ongoing cell death, as well as an increase in activated microglia. The degeneration is limited to DAergic neurons as LC neurons are not lost as this population does not express DAT. Overall, the mouse genetics and experimental analysis were performed rigorously, and the results were statistically sound and compelling.

      Weaknesses:

      I only have a few minor comments. First is that in PD and other degenerative conditions, loss of axons and terminals occurs prior to cell bodies. It might be beneficial to show the status of DAergic markers in the striatum. Second, previous studies indicate that very little, if any, LRRK1 is expressed in SN DAergic neurons. This is also the case with the Allen Brain Atlas profile. Thus, authors should discuss the discrepancy as authors seem to imply significant LRRK1 expression in DA neurons.

      We appreciate the reviewer’s recognition of the importance of the study as well as our rigorous experimental approaches and compelling results. Our responses to the reviewer's two minor comments are below.

      1) DAergic markers in the striatum: We performed TH immunostaining in the striatum and quantified TH+ DA terminals in the striatum of DA neuron-specific LRRK cDKO and littermate control mice at the ages of 15 and 24 months. We found similar levels of TH immunoreactivity in the striatum of LRRK cDKO and littermate control mice at the age of 15 months (p = 0.6565, unpaired Student’s t-test) and significantly reduced levels of TH immunoreactivity in the striatum of LRRK cDKO, compared to control mice at the age of 24 months (~19%, p = 0.0215), suggesting an age-dependent loss of dopaminergic terminals in the striatum of DA neuron-specific LRRK cDKO mice. These results are now included as Figure 5 of the revised manuscript.

      2) LRRK1 expression in the SNpc: It is shown in the Mouse brain RNA-seq dataset and the Allen Mouse brain ISH dataset (https://www.proteinatlas.org/ENSG00000154237-LRRK1/brain) that LRRK1 is broadly expressed in the mouse brain and is expressed at modest levels in the midbrain, comparable to the cerebral cortex. Indeed, our Western analysis also showed that levels of LRRK1 detected in the dissected ventral midbrain and the cerebral cortex of control mice are similar (40µg total protein loaded per lane; Figure 2E). Furthermore, we previously demonstrated that deletion of LRRK2 (or LRRK1) alone does not cause age-dependent loss of DA neurons in the SNpc, but deletions of both LRRK1 and LRRK2 result in age-dependent loss of DA neurons in LRRK DKO mice, indicating the functional importance of LRRK1 in the protection of DA neuron survival in the aging mouse brain (Tong et al., PNAS 2010, 107: 9879-9884, Giaime et al., Neuron 2017, 96: 796-807).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shen and collaborators described the generation of cDKO mice lacking LRRK1 and LRRK2 selectively in DAT-positive DAergic neurons. The Authors asked whether selective deletion of both LRRK isoforms could lead to a Parkinsonian phenotype, as previously reported by the same group in germline double LRRK1 and LRRK2 knockout mice (PMID: 29056298). Indeed, cDKO mice developed a late reduction of TH+ neurons in SNpc that partially correlated with the reduction of NeuN+ cells. This was associated with increased apoptotic cell and microglial cell numbers in SNpc.

      Unlike the constitutive DKO mice described earlier, however, cDKO mice did not replicate the dramatic increase in the number of autophagic vacuoles. The study supports the authors' hypothesis that loss of function rather than gain of function of LRRK2 leads to PD.

      Strengths:

      The study described for the first time a model where both the PD-associated gene LRRK2 and its homolog LRRK1 are deleted selectively in DAergic neurons, offering a new tool to understand the physiopathological role of LRRK2 and the compensating role of LRRK1 in modulating DAergic cell function.

      Weaknesses:

      The model has no construct validity since loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD. The evidence of a Parkinsonian phenotype in these cDKO mice is limited and should be considered preliminary.

      We thank the reviewer for commenting on the usefulness of this new PD mouse model.

      The reviewer did not include a reference citation for the statement "loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD." It is possible that the reviewer was referring to a human population study (Whiffin et al., Nat Med 2020, 26: 869-877), entitled "The effect of LRRK2 loss-of-function variants in humans." In this study, the authors analyzed 141,456 individuals sequenced in the Genome Aggregation Database, 49,960 exome-sequenced individuals from the UK Biobank, and more than 4 million participants in the 23andMe genotyped dataset, and they looked for human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants). The reported findings were interesting, and the authors were careful in stating their conclusions. However, this is not a linkage study of large pedigrees carrying a single, clear-cut loss-of-function mutation (e.g. large deletions of most exons and coding sequences). Therefore, the experimental evidence is not compelling enough to conclude whether loss-of-function mutations in LRRK2 cause PD or do not cause PD.

      The current report is an unbiased genetic study in an effort to reveal the normal physiological role of LRRK in dopaminergic neurons. It was not intended to produce Parkinsonian phenotypes in LRRK cDKO mice, which would be a biased effort. However, the unequivocal discovery of the cell intrinsic role of LRRK in the protection of DA neurons from age-dependent degeneration and apoptotic cell death should be considered seriously, while we contemplate the disease mechanism and how LRRK2 mutations may cause DA neuron loss and PD.

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues investigated the impact of LRRK1 and LRRK2 deletion, specifically in dopaminergic neurons, using a novel cDKO mouse model. They observed a significant reduction in DAergic neurons in the substantia nigra in their conditional LRRK1 and LRRK2 KO mice and a corresponding increase in markers of apoptosis and gliosis. This work set out to address a longstanding question within the field around the role and importance of LRRK1 and LRRK2 in DAergic neurons and suggests that the loss of both proteins triggers some neurodegeneration and glial activation.

      The studies included in this work are carefully performed and clearly communicated, but additional studies are needed to strengthen further the authors' claims around the consequences of LRRK2 deletion in DAergic neurons.

      1) In Figures 2E and F, the authors assess the protein levels of LRRK1 and LRRK2 in their cDKO mouse model to confirm the deletion of both proteins. They observe a mild loss of LRRK1 and LRRK2 signals in the ventral midbrain compared to wild-type animals. While this is not surprising given other cell types that still express LRRK1 and LRRK2 would be present in their dissected ventral midbrain samples, it does not sufficiently confirm that LRRK1 and LRRK2 are not expressed in DAergic neurons. Additional data is needed to more directly demonstrate that LRRK1 and LRRK2 protein levels are reduced in DAergic neurons, including analysis of LRRK1 and LRRK2 protein levels via immunohistochemistry or FACS-based analysis of TH+ neurons.

      We thank the reviewer for highlighting this incredibly important but often overlooked issue. We agree that the data in Figure 2E, F alone would be inadequate to validate DA neuron-specific LRRK cDKO mice.

      Cell type-specific conditional knockouts are a mosaic with KO cells mixed with other cell types expressing the gene normally. DA neuron-specific cDKO is particularly challenging, as DA neurons are a subset of cells embedded in the ventral midbrain. Rather than using immunostaining, which relies upon specific, good LRRK1 and LRRK2 antibodies for IHC, or FACS sorting of TH+ neurons followed by Western blotting (few cells, mixed cell populations, etc.), we chose a clean genetic approach by generating germline mutant mice carrying the deleted LRRK1 and LRRK2 alleles in all cells from the floxed LRRK1 and LRRK2 alleles. This approach permits characterization of these deletion mutations in germline mutant mice using molecular approaches that yield unambiguous results.

      We crossed CMV-Cre deleter mice with floxed LRRK1 and LRRK2 mice to generate respective germline LRRK1 KO and LRRK2 KO mice, in which all cells carry the LRRK1 or LRRK2 deleted alleles that are identical to those in DA neurons of cDKO mice. We then performed Northern, extensive RT-PCR followed by sequencing, and Western analyses to show the absence of the full length LRRK1 and LRRK2 mRNA (Figure 1G, H, Figure 1-figure supplement 8 and 10), the expected truncation of LRRK1 and LRRK2 mRNA (Figure 1-figure supplement 9 and 11), and the absence of LRRK1 and LRRK2 proteins (Figure 1I). These analyses together demonstrate that in the presence of Cre, either CMV-Cre expressed in all cells or DAT-Cre expressed selectively in DA neurons, the floxed LRRK1 and LRRK2 exons are deleted, resulting in null alleles. We further demonstrated the specificity of DAT-Cre-mediated recombination (deletion) by crossing DAT-Cre mice with a GFP reporter, showing that 99% TH+ DA neurons in the SNpc are also GFP+ (Figure 2A, B), indicating that DAT-Cre-mediated recombination of the floxed alleles occurs in essentially all TH+ DA neurons in the SNpc.

      2) The authors observed a significant but modest effect of LRRK1 and LRRK2 deletion on the number of TH+ neurons in the substantia nigra (12-15% loss at 20-24 months of age). It is unclear whether this extent of neuron loss is functionally relevant. To strengthen the impact of these data, additional studies are warranted to determine whether this translates into any PD-relevant deficits in the mice, including motor deficits or alterations in alpha-synuclein accumulation/aggregation.

      Yes, the reduction of DA neurons in the SNpc of cDKO mice at the age of 20-24 months is modest. At 15 months of age, the number of TH+ DA neurons in the SNpc is similar between LRRK cDKO mice (10,000 ± 141) and littermate controls (10,077 ± 310, p > 0.9999). At 20 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,948 ± 273) is significantly reduced (-12.7%), compared to control mice (10,244 ± 220, F1,46 = 16.59, p = 0.0002, two-way ANOVA with Bonferroni’s post hoc multiple comparisons, p = 0.0041). By 24 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,188 ± 452) relative to controls (9,675 ± 232, p = 0.0010) is further reduced (-15.4%).
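
      The percentage reductions quoted above follow directly from the reported group means. The arithmetic can be sketched as below, assuming only the mean counts given in this response. The function name `percent_change` and the dictionary layout are illustrative choices, not part of the authors' analysis, and the ± values and the ANOVA are not reproduced here.

```python
def percent_change(cdko_mean, control_mean):
    """Relative change of the cDKO mean versus the control mean, in percent."""
    return (cdko_mean - control_mean) / control_mean * 100.0

# Mean TH+ neuron counts in the SNpc as quoted in the response:
# age in months -> (LRRK cDKO mean, littermate control mean)
counts = {15: (10000, 10077), 20: (8948, 10244), 24: (8188, 9675)}

# Round to one decimal place, matching the precision used in the text.
changes = {age: round(percent_change(cdko, ctrl), 1)
           for age, (cdko, ctrl) in counts.items()}

for age, change in changes.items():
    print(f"{age} months: {change:+.1f}%")
```

      A negative value indicates fewer TH+ neurons in cDKO mice than in controls; the 20- and 24-month values correspond to the 12.7% and 15.4% reductions stated in the text.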

      Similar results were obtained by an independent quantification by another investigator, also conducted in a genotype-blind manner, using the fractionator and optical dissector method, by which TH+ cells were quantified in 25% of the areas. These results are included as Figure 3-figure supplement 1 in the revised manuscript. Because of the more limited sampling, the quantification data are more variable, compared to quantification of TH+ cells in all areas of the SNpc, shown in Figure 3. With both methods, we quantified TH+ cells in every 10th section encompassing the entire SNpc (3D structure), as sampling using every 5th or every 10th section yielded similar results.

      We also performed behavioral analysis of LRRK cDKO mice and littermate controls at the ages of 10 and 25 months using the beam walk test (10 mm and 20 mm beam) and the pole test, which are sensitive to impairment of motor coordination. We found that LRRK cDKO mice at 10 months of age showed significantly more hindlimb errors (p = 0.0005, unpaired two-tailed Student’s t-test) and longer traversal time (p = 0.0075) in the 10mm beam walk test, compared to control mice, though their performance is similar in the 20 mm beam walk (hindlimb slips: p = 0.0733, traversal time: p = 0.9796) and in the pole test. At 22 months of age, the performance of LRRK cDKO mice and littermate controls is more variable and worse, compared to the younger mice, and is not significantly different between the genotypic groups. These results are now included as Figure 9 of the revised manuscript.

      3) The authors demonstrate that, unlike in the germline LRRK DKO mice, they do not observe any alterations in electron-dense vacuoles via EM. Given their data showing increased apoptosis and gliosis, it remains unclear how the loss of LRRK proteins leads to DAergic neuronal cell loss. Mechanistic studies would be insightful to understand better potential explanations for how the loss of LRRK1 and LRRK2 may impair cellular survival, and additional text should be added to the discussion to discuss potential hypotheses for how this might occur.

      We agree that this phenotypic difference between germline DKO and DA neuron-specific cDKO mice is intriguing, suggesting a non-cell autonomous contribution of LRRK in age-dependent accumulation of autophagic and lysosomal vacuoles in SNpc neurons of germline LRRK DKO mice. We will discuss the phenotypic difference further in the revised manuscript. We are generating microglial specific LRRK cDKO mice to investigate the role of LRRK in microglia and whether microglia contribute in a cell extrinsic manner to the regulation of the autophagy-lysosomal pathway in DA neurons.

      4) The authors discuss the potential implications of the neuronal cell loss observed in cDKO mice for LRRK1 and LRRK2 for therapeutic approaches targeting LRRK2 and suggest this argues that LRRK2 variants may exert their effects through a loss-of-protein function. However, all of the data generated in this work focus on a mouse in which both LRRK1 and LRRK2 have been deleted, and it is therefore difficult to make any definitive conclusions about the consequences of specifically targeting LRRK2. The authors note potential redundancy between the two LRRK proteins, and they should soften some of their conclusions in the discussion section around implications for the effects of LRRK2 variants. Human subjects that carry LRRK2 loss-of-function alleles do not have an increased risk for developing PD, which argues against the authors' conclusions that LRRK2 variants associated with PD are loss-of-function. Additional text should be included in their discussion to better address these nuances and caution should be used in terms of extrapolating their data to effects observed with PD-linked variants in LRRK2.

      We will modify the discussion accordingly in the revised manuscript.

    1. Author Response

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. Since we used suite2p for our data analysis, and have records of the fast-z correction applied by the microscope, we can supply these as quantifications of movement corrections that were applied across our sample of mice. We hope to supply this information as a supplement in the revised manuscript.

      Currently, we have chosen to show that the corrected, post-suite2p registration movement artifacts are very close to zero. We will revise the manuscript with clear descriptions of methods that we have found important, such as fully tightening all mounting devices, utilizing the air table properly, implanting the cranial window with proper, even pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We will revise the manuscript so that it is clear that we made use of two imaging configurations for the 2-photon mesoscope data and the benefits and limitations of these two preparations. The dorsal mount and the side mount each have their advantages and disadvantages, but together form a powerful tool for imaging much of the dorsal and lateral cortex in awake, behaving mice.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: We will modify the manuscript so that a more detailed comparison with other published techniques is included. The preparation by Esmaeili et al. 2021 has some similarities, but also differences, from our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We will compare these preparations more accurately in the revised manuscript.

      If you compare the imageable extent of our cranial window for mesoscale 2-photon imaging to that of their through-the-skull widefield preparation, which is a bit of an “apples to oranges” comparison, then you are likely correct that their field of view is larger than ours, if you are referring to our 10 mm radius-bend glass. However, use of our 9 mm radius bend glass (i.e. a tighter bend) allows us to image additional ventral auditory areas. We could show an example of this, perhaps, although we did not make as much use of this alternative window in the large FOV experiments, because the increased curvature of the glass relative to the 10 mm radius bend window prevents imaging of the entire preparation in a single 2-photon z-plane. With the 9 mm radius bend glass we mostly imaged in the multiple, small FOV configuration (see Fig. S2).

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ Response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations with medians near 4500; See Fig. S2e) we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons. We could provide additional documentation of these issues, but we would like to point out that, in our case, we were not trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, will be discussed more thoroughly in the revised manuscript.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparison across mice indicates that side and dorsal mount mice show similar behavioral variability.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      In principle, it would be nearly possible to image the side mount preparation in our optical configuration without rotating the mouse, by instead rotating the objective 20 degrees to the right. However, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees of objective rotation), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-soid analysis is based on a single mouse, and the validation data are derived from the training data set. The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: In this methods paper, we have chosen to supply proof-of-principle examples, without a complete analysis of animal-to-animal variance. The dataset for this paper contains both neural and behavioral data for 91 sessions across 18 mice from both dorsal and side mount preparations. The complete analysis of this dataset exceeds the capacity of the present study. We will include more individual examples in the revised version, along with data showing the amount of between-session and across-mouse variance. We will include in the revised manuscript a comparison of the stability of B-SOiD measures across sessions, as a demonstration of what may be expected with this method.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We are revising the manuscript to give more detail without reducing readability, so as to increase clarity of presentation. Since this is a methods paper, we are modifying the manuscript to include more details and clear explanations so that the reader may replicate our methods and results.
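
      As an illustration of the principle the reviewer mentions, the following is a minimal sketch of rigid motion correction by phase correlation. It is not Suite2p's implementation, which includes further steps; the function names `phase_correlation_shift` and `register_rigid` are hypothetical, and the sketch assumes whole-pixel translations with circular wrap-around at the image edges.

```python
import numpy as np


def phase_correlation_shift(reference, frame, eps=1e-9):
    """Estimate the integer (dy, dx) translation that maps `reference` onto `frame`.

    Hypothetical helper for illustration only; assumes a pure translation.
    """
    f_ref = np.fft.fft2(reference)
    f_frame = np.fft.fft2(frame)
    # The cross-power spectrum carries the translation as a phase ramp.
    cross_power = f_frame * np.conj(f_ref)
    # Discard the magnitude so that only the phase information remains.
    cross_power /= np.abs(cross_power) + eps
    # The inverse transform of a pure phase ramp is a sharp peak at the shift.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    shift = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return tuple(int(v) for v in shift)


def register_rigid(reference, frame):
    """Undo the estimated translation by circularly shifting the frame back."""
    dy, dx = phase_correlation_shift(reference, frame)
    return np.roll(frame, shift=(-dy, -dx), axis=(0, 1)), (dy, dx)
```

      Non-rigid correction can be thought of as applying the same estimate separately to sub-blocks of each frame and interpolating the resulting shifts; that step is omitted from this sketch.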

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We will re-write the motivation behind the study to clarify the general problems that are being addressed. As the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details will necessarily deal specifically with this system. We will briefly compare the methods and results from our Thorlabs system to those of other systems, based on what we are able to glean from the literature on their strengths and weaknesses.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community as a whole. The quality of the data is high, but it is not clear whether this is available to the community, some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We intend to make the data set used in making our main figures available to the public, perhaps using FigShare, so that they may check the validity of the methods and analysis. We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with a second in-depth analysis paper that is currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We will include a workflow diagram in the revised manuscript for the methods, data collection, and analysis.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in lines 463.

      Authors’ Response: For the claim stated in lines 449-453, “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks,” we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For line 463, “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”, we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107). We will also include this reference in the paper. Thank you for identifying these citation omissions.

      No stats for the results shown in Figure 6e, it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: There are two statistical comparisons that we feel may be useful to add to the single session data displayed in this figure, in order to address the point that you raise. The first would allow us to assess whether for each Rastermap group, the distribution of neuron densities across CCF areas differs from a null, uniform distribution. The second would allow us to examine differences between Rastermap groups associated with different qualitative behaviors in order to know with which patterns of neural activity they are reliably associated.

      For the first comparison, we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0, or, in this case, the population densities of each Rastermap group for each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (e.g., a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons (or ~6% density) per CCF area). Our current figure legend states that the maximum of the scale bar look-up value (reds) for each group ranges from ~8% to 32%. So indeed, adding these significances would be informative in this case.
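
      One way such a comparison against a uniform null could be set up is sketched below. This is an assumption-laden illustration rather than the test that will appear in the manuscript: the function name `uniform_null_test`, the chi-square-style statistic, and the Monte Carlo p-value are all choices made here, and the counts in the usage note are invented.

```python
import numpy as np


def uniform_null_test(counts, n_sim=10000, seed=0):
    """Compare per-area neuron counts for one group to an even spread.

    `counts` holds the number of neurons of one Rastermap group in each
    CCF area (hypothetical input). Returns the expected count per area
    under the null, the observed statistic, and a Monte Carlo p-value.
    """
    counts = np.asarray(counts, dtype=float)
    n_total = int(counts.sum())
    n_areas = counts.size
    # Null: neurons in the group are spread evenly over all areas.
    expected = n_total / n_areas
    observed_stat = np.sum((counts - expected) ** 2 / expected)
    # Simulate the statistic under the null by drawing multinomial counts.
    rng = np.random.default_rng(seed)
    sims = rng.multinomial(n_total, np.full(n_areas, 1.0 / n_areas), size=n_sim)
    sim_stats = np.sum((sims - expected) ** 2 / expected, axis=1)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(sim_stats >= observed_stat)) / (n_sim + 1)
    return expected, observed_stat, p_value
```

      For a group of 500 neurons over 30 areas, the expected count per area under this null is 500 / 30, or roughly 17, matching the example above. A per-area version of the test would additionally require correction for multiple comparisons.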

      For the second comparison, we could compare the density of neurons for each CCF area across Rastermap groups for this session. For example, it may be the case that the density of neurons in primary and secondary visual areas belonging to Rastermap groups that predominate during the “walk” behavior is higher than in the Rastermap group that predominates during the “whisk” behavior, or that the density of neurons in the “whisk” and “twitch” Rastermap groups in primary and secondary motor areas is higher than in the Rastermap groups that are active during the “walk” and “oscillate” behaviors.

      Such a comparison should in fact be robust to Rastermap group variability across sessions and mice, as long as the same qualitative behaviors recur. However, our current qualitative methods for discretization of the Rastermap groups likely limit our ability to extend such an analysis accurately across our entire dataset. We are pursuing more rigorous analysis methods in this vein for our second, results-oriented paper.

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up this methods paper with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors. Here, we prefer to present examples of the types of results that can be expected using our methods, and to show how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion and will be included in the revised manuscript, so that readers can more readily follow our workflow, data collection, and analysis.

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data to be expected using our methods. We hope to provide a more complete analysis of data obtained using our methodology in the near future in a second manuscript.

      However, we may be amenable to including preliminary linear model fit results, as supplementary material, for the two example sessions highlighted in this paper (i.e. the one dorsal mount session in Fig. 4, and the one side mount session shown in Figs. 5 and 6).
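
      For readers unfamiliar with the kind of model the reviewer has in mind, a minimal sketch follows. It assumes ridge regression as the linear model, uses hypothetical function names (`fit_ridge`, `explained_variance`), and runs on synthetic data standing in for behavioral regressors and neural activity; it is not an analysis of the data in this paper.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with one weight vector per neuron.

    X: (time, regressors) behavioral variables, e.g. pupil or movement.
    Y: (time, neurons) activity traces. Both are hypothetical inputs.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n_regressors = X.shape[1]
    # Solve (X'X + alpha * I) W = X'Y for all neurons at once.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_regressors), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def explained_variance(X, Y, W, b):
    """Fraction of each neuron's variance captured by the prediction."""
    residual = Y - (X @ W + b)
    return 1.0 - np.mean(residual ** 2, axis=0) / Y.var(axis=0)


# Synthetic stand-in data: 4 regressors driving 50 neurons plus noise.
rng = np.random.default_rng(0)
n_time, n_regressors, n_neurons = 2000, 4, 50
X = rng.standard_normal((n_time, n_regressors))
true_W = rng.standard_normal((n_regressors, n_neurons))
Y = X @ true_W + 0.5 * rng.standard_normal((n_time, n_neurons))

# Fit on the first part of the session and evaluate on held-out time points.
train, held_out = slice(0, 1500), slice(1500, None)
W, b = fit_ridge(X[train], Y[train], alpha=1.0)
r2 = explained_variance(X[held_out], Y[held_out], W, b)
```

      Evaluating on held-out time points, as done here, guards against overstating how much activity the regressors explain. Real recordings would also call for time-lagged regressors and contiguous cross-validation folds, which this sketch leaves out.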

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated -they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli, while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We will correct these statements and incorporate these, and other relevant, references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, speed, etc., and we will highlight a few of these without an extensive literature review.

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy and undefined z-resolution. By comparison, we image at 5 micrometer resolution in our large FOV configuration, and the point-spread function of the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation, with a z-resolution of 4.25 micrometers. A coarser resolution increases the likelihood that activity data from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than standard 2-photon galvo-galvo scanning, or even the galvo-resonant scanning that the Thorlabs mesoscope uses). That said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we will include images and data demonstrating the average changes in the window preparation, as well as the degree of variability and a range of outcome scenarios that we observed over the prolonged time periods of our study. We will also include methodological details that we found were useful for facilitating long term use of these preparations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper performed a functional analysis of the poorly characterized pseudo-phosphatase Styxl2, one of the targets of the Jak/Stat pathway in muscle cells. The authors propose that Styxl2 is essential for de novo sarcomere assembly by regulating autophagic degradation of non-muscle myosin IIs (NM IIs). Although a previous study by Fero et al. (2014) has already reported that Styxl2 is essential for the integrity of sarcomeres, this study provides new mechanistic insights into the phenomenon. In vivo studies in this manuscript are compelling; however, I feel the contribution of autophagy in the degradation of NM IIs is still unclear.

      Major concerns:

      1) The contribution of autophagy in the degradation of Myh9 is still unclear to this reviewer.

      It has been reported that autophagy is dispensable for sarcomere assembly in mice (Cell Metab, 2009, PMID: 19945408). In Fig. 7A, the authors showed that overexpressed Styxl2 downregulated the amount of ectopically expressed Myh9 in an ATG5-dependent manner in C2C12 cells; however, the experiment is far from a physiological condition. Therefore, the authors should test ATG5 knockdown and the genetic interaction between Styxl2 and ATG5 in vivo. That is, 1) loss of ATG5 on sarcomere assembly in zebrafish, and 2) the genetic interaction between Styxl2 and ATG5; co-injection of Styxl2 mRNA and ATG5-MO into the zebrafish embryos.

      Our response: In fact, the reference cited by the reviewer (Cell Metab, 2009, PMID: 19945408) clearly indicated that autophagy is required for sarcomere assembly. Moreover, another paper using the fish extraocular muscle regeneration model (Autophagy, 2014, PMID: 27467399) also showed that the sarcomere structure was disrupted in the regenerated muscles when autophagy was inhibited by chloroquine. In addition, other references (Nature medicine, 2007, PMID: 17450150; Autophagy, 2010, PMID: 20431347) showed that loss of Atg5 in mouse cardiac muscles led to disorganized sarcomere structure. We also performed the Atg5 knockdown experiments as suggested by the reviewer. However, the sarcomere structure defects were not as obvious as those caused by Styxl2 knockdown (see Author response image 1 below). In fact, it was reported that Atg5 knockdown may not be a desirable strategy to disrupt autophagy, as it was found that “--- only a small amount of Atg5 is needed for autophagy, knockdown of Atg5 to levels low enough to block autophagy might be difficult to achieve, --” (Nature medicine, 2007, PMID: 17450150). Due to the ineffectiveness of the Atg5 MO in our assays, we did not perform the second experiment suggested by the reviewer. Moreover, as Styxl2 is not a key component of the autophagy machinery, it is unlikely that overexpression of Styxl2 alone can rescue the autophagy defects caused by Atg5 knockdown.

      Author response image 1.

      The fish zygotes were injected with Atg5 or Ctrl MO. At 48 hpf, the fish were stained with an anti-Actinin antibody. Some fast muscle fibers were disrupted when Atg5 was knocked down. In the fraction at the bottom of each image, the numerator is the number of embryos showing a normal Actinin staining pattern and the denominator is the total number of embryos examined. Scale bar, 10 µm.

      2) As referenced, Yamamoto et al. reported that Myh9 is degraded by autophagy. Mechanistically, Nek9 acts as an autophagic adaptor that bridges Atg8 and Myh9 through interactions with both. Inconsistent with the model, the authors mentioned on page 12, lines 365-367, "A recent report showed that Myh9 could also undergo Nek9-mediated selective autophagy (Yamamoto et al., 2021), suggesting that Myh9 is ubiquitinated". I think it is not yet explored whether autophagic degradation of Myh9 requires its ubiquitination. Moreover, I cannot judge whether Myh9 is ubiquitinated in a Styxl2-dependent manner from the data in Fig. 7C. The authors should test whether Nek9 is required for Myh9 degradation in muscles. If Nek9 plays a role in the Myh9 degradation, it would be better to remove Fig. 7C.

      Our response: Indeed, as pointed out by the reviewer, it has not been explored whether Myh9 is ubiquitinated or not. However, it has been well established that some proteins undergoing autophagic degradation are ubiquitinated and are linked to Atg8/LC3 via p62 and NBR1 (Mol Cell, 2009, PMID: 19250911; J Biol Chem, 2007, PMID: 17580304). To improve the data quality, we repeated the Myh9 ubiquitination experiment in cells with or without Styxl2 by using a slightly different strategy: as shown in the revised Figure 7C, we first co-transfected HEK 293T cells with HA-Myh9, Myc-ubiquitin, and Flag-Styxl2. We then immunoprecipitated Myc-tagged ubiquitin from the whole cell lysates and blotted for HA-Myh9. We detected an obvious increase in ubiquitin-conjugated HA-Myh9 (revised Figure 7C). As suggested by the reviewer, we also tested whether knockdown of Nek9 affects the degradation of Myh9. We failed to detect an obvious effect caused by Nek9 knockdown (see Author response image 2 below). One possible explanation for this negative result is that Nek9 itself is a negative regulator of selective autophagy (J Biol Chem, 2020, PMID: 31857374). By knocking it down, the functions of the autophagy machinery are expected to be enhanced instead of being impaired. This may explain why we failed to detect an effect on Myh9 degradation simply by knocking down Nek9. To further elucidate whether Nek9 is involved in Myh9 degradation in myoblasts, we may need to use a dominant-negative mutant of Nek9 missing the LC3-binding motif, as shown by Yamamoto (Nat Commun, 2021, PMID: 34078910). This will be addressed in our future study.

      Author response image 2.

      C2C12 cells were transfected with negative control siRNA (NC), siNek9#2 or siNek9#3. 18 h later, the cells were transfected with plasmids HA-Myh9 and Flag-Styxl2 or Flag-Stk24. After another 24 h, the cells were harvested for RT-qPCR (left panel) or western blot (right panel).

      3) In Fig. 5F, the protein level of Styxl2 and Myh10 should be checked because the efficiency of Myh10-MO was not shown anywhere in this manuscript.

      Our response: As suggested by the reviewer, a Western blot showing the protein levels of Myh10 is now included in Figure 5-figure supplement 1B.

      Reviewer #2 (Public Review):

      The authors investigated the role of the Jak1-Stat1 signaling pathway in myogenic differentiation by screening the transcriptional targets of Jak1-Stat1 and identified Styxl2, a pseudophosphatase, as one of them. Styxl2 expression was induced in differentiating muscles. The authors used a zebrafish knockdown model and conditional knockout mouse models to show that Styxl2 is required for de novo sarcomere assembly but is dispensable for the maintenance of existing sarcomeres. Styxl2 interacts with the non-muscle myosin IIs, Myh9 and Myh10, and promotes the replacement of these non-muscle myosin IIs by muscle myosin IIs through inducing autophagic degradation of Myh9 and Myh10. This function is independent of its phosphatase domain.

      A previous study using zebrafish found that Styxl2 (previously known as DUSP27) is expressed during embryonic muscle development and is crucial for sarcomere assembly, but its mechanism remains unknown. This paper provides important information on how Styxl2 mediates the replacement of non-muscle myosin with muscle myosin during differentiation. This study may also explain why autophagy deficiency in muscles and the heart causes sarcomere assembly defects in previous mouse models.

      Reviewer #3 (Public Review):

      Wu and colleagues are characterising the function of Styxl2 during muscle development, a pseudo-phosphatase that was already described to have some function in sarcomere morphogenesis or maintenance (Fero et al. 2014). The authors verify a role for Styxl2 in sarcomere assembly/maintenance using zebrafish embryonic muscles by morpholino knockdown and by a conditional Styxl2 allele in mice (knocked-out in satellite cells with Pax7 Cre).

      Experiments using a tamoxifen inducible Cre suggest that Styxl2 is dispensable for sarcomere maintenance and only needed for sarcomere assembly.

      BioID experiments with Styxl2 in C2C12 myoblasts suggest binding of nonmuscle myosins (NMs) to Styxl2. Interestingly, both NMs are downregulated when muscles differentiate after birth or during regeneration in mice. This down-regulation is reduced in the Styxl2 mutant mice, suggesting that Styxl2 is required for the degradation of these NMs.

      Impressively, reducing one NM (zMyh10) by double morpholino injection in a Styxl2 morphant zebrafish does improve zebrafish mobility and sarcomere structure. Degradation of Myh9 is also stimulated in cell culture if Styxl2 is co-expressed. Surprisingly, the phosphatase domain is not needed for these degradation and sarcomere structure rescue effects. Inhibitor experiments suggest that Styxl2 does promote the degradation of NMs by promoting the selective autophagy pathway.

      Strengths:

      A major strength of the paper is the combination of various systems, mouse and fish muscles in vivo to test Styxl2 function, and cell culture including a C2C12 muscle cell line to assay protein binding or protein degradation as well as inhibitor studies that can suggest biochemical pathways.

      Weakness:

      The weakness of this manuscript is that the sarcomere phenotypes and also the western blots are not quantified. Hence, we rely on judging the results from a single image or blot. Also, Styxl2's role in sarcomere biology was not entirely novel.

      Few high resolution sarcomere images are shown, myosins have not been stained for.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      4) The position of molecular weight markers should be shown in all Western blot data.

      Our response: As suggested by the reviewer, the molecular weight markers have been added in the Western blot data.

      5) Schematic models of Styxl2deltaN509 and N513 construct would be helpful for the readers.

      Our response: A schematic has been added in Figure 6B (upper panel) to show Styxl2deltaN509 and Styxl2N513.

      6) Several data were described but not shown (data not shown). I think the data need to be included in the main or supplemental figures.

      Our response: As suggested by the reviewer, the raw data are now included in Figure 6-figure supplement 1A and Figure 7-figure supplement 1.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 5E, the authors suggest that the needle touch response was improved by additional knockdown of Myh10. This is a bit confusing because the germline knockout of Myh10 is lethal (line 445). The authors should provide more explanation on this point. Additionally, it would be better to include Myh10-MO in Fig. 5E.

      Our response: In line 445 of our original manuscript, we stated that germline knockout of the mouse Myh10 gene is lethal, based on a published report (Proc Natl Acad Sci USA, 1997, PMID: 9356462). Here, in zebrafish zygotes, we only knocked down zMyh10; thus, we do not expect a lethal phenotype. In addition, other groups who knocked down Myh10 in fish also did not observe a lethal phenotype (Dev Biol, 2015, PMID: 25446029). As to the Myh10-MO control for the experiment in Fig. 5E, we did include it in our experiments. As we did not observe any obvious effects on either motility or sarcomere structures, we did not include that data set in the figure.

      2) It was suggested that Myh9 and Myh10 form a complex (Rao et al. PLoS One 9, e114087, 2014). Thus, the IP experiments do not rule out the possibility that Styxl2 directly interacts with either Myh9 or Myh10 and indirectly with the other.

      Our response: In known myosin-II complexes, different myosin molecules can associate with each other through their tail domains (Bioarchitecture, 2013, PMID: 24002531). Thus, if we use full-length myosin molecules in our co-immunoprecipitation assays, it will be difficult to exclude the possibility raised by the reviewer. However, by using truncated myosin proteins, we showed that the head domain of either Myh9 or Myh10 could interact with Styxl2 in the absence of the tail domain (Figure 4E, F). This result strongly suggests that both Myh9 and Myh10 can independently interact with Styxl2.

      Reviewer #3 (Recommendations For The Authors):

      1) The western blot shown in Figure 3B supporting the induced deletion of Styxl2 should be quantified. Ideally, some other blots, e.g., in Figure 5, too. Please add the age of the mice in Figure 5B to the figure legend.

      Our response: As suggested by the reviewer, we quantified the data in Figures 3B, 3F, 5B, 5D, and 7A, and the quantifications are included in the revised figures. In Figure 5B, we had already indicated the age of the mice (i.e., P1) in the legend.

      2) A quantification of the sarcomere phenotypes in the double knock-down of zMyh10 and Styxl2 compared to Styxl2 single would make the paper significantly stronger. Furthermore, a double morpholino control should be included to rule out any RNAi machinery 'dilution effect'.

      Our response: As suggested by the reviewer, we quantified the sarcomere structures using line scan analysis in ImageJ, and the scan profiles were placed as insets in the upper corner of the immunofluorescence images (revised Figures 5F and 6C). To avoid potential “dilution effects”, in all experiments involving two different MOs, the total amount of MO was kept the same in all samples by including a control MO (e.g., in samples treated with one specific MO, an equal amount of a control MO was also included, while in samples without any specific MO, twice as much control MO was used).

      3) The sarcomere phenotypes in figure 6 should also be better quantified, for example using simple line scans of the alpha-actinin stains and assay periodicity or calculating the autocorrelation coefficients. How about myosin stains?

      Our response: We quantified Figure 6C as suggested by the reviewer. We also performed myosin staining. The results were similar to those obtained with the α-actinin antibody (see revised Figure 6-figure supplement 1B).
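
The periodicity measurement that the reviewer proposes (line scans of the α-actinin stain followed by an autocorrelation) can be sketched as below. This is a minimal illustration, not the authors' ImageJ workflow: the function names, the pixel size, and the synthetic cosine profile with a 2.0 µm spacing are all assumptions chosen only to show the calculation.

```python
import numpy as np

def autocorrelation(profile):
    """Normalized autocorrelation of a 1D intensity profile (value 1 at lag 0)."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                       # remove the constant background
    full = np.correlate(x, x, mode="full") # lags from -(N-1) to +(N-1)
    acf = full[full.size // 2:]            # keep non-negative lags only
    return acf / acf[0]

def dominant_period(profile, pixel_size_um):
    """Return (period in um, peak height) of the first positive local
    maximum of the autocorrelation after lag 0, or (None, 0.0) if none."""
    acf = autocorrelation(profile)
    for lag in range(1, acf.size - 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > 0:
            return lag * pixel_size_um, float(acf[lag])
    return None, 0.0

# Synthetic line scan (hypothetical values): 200 pixels at 0.1 um per pixel,
# with a perfectly periodic signal repeating every 20 pixels (2.0 um).
pixels = np.arange(200)
synthetic_scan = 1.0 + np.cos(2 * np.pi * pixels / 20)
period_um, peak_height = dominant_period(synthetic_scan, pixel_size_um=0.1)
print(period_um, peak_height)
```

In this kind of analysis, the position of the first autocorrelation peak estimates the repeat spacing, and its height gives one number per fiber for how regular the pattern is, so disrupted sarcomeres would be expected to give a lower or absent peak.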

      4) Do the authors see periodic NM patterns in developing mouse muscle fibers, as indicated by the model in Figure 7D? It is unclear if nonmuscle myosin is present in a PERIODIC pattern in early myofibrils. The periodic NM patterns that have been observed have a periodicity of only about 1 µm, fitting the shorter length of the NM bipolar filaments (about 300 nm only, PMID 28114270).

      Our response: The reviewer raised a good point here. Ideally, we should examine developing mouse muscle fibers to prove that NM shows periodic patterns. However, due to the difficulty in catching myocytes undergoing sarcomere assembly, the majority of the studies involving NM in sarcomeres use cultured cardiomyocytes. Using TA muscles from P1 newborn mice, we failed to detect the presence of NM in sarcomeres (see Author response image 3 below). In fact, nearly all the myofibers showed a mature sarcomere pattern without the NM signal. More work is needed in the future to examine developing mouse fibers at different embryonic stages to look for the presence of NM in developing sarcomeres.

      Author response image 3.

      The TA muscles were collected from male and female P1 mice. The muscles were sectioned and co-stained for α-actinin (Actn) and Myh9. The majority of myofibrils are mature and lack the NM II signal. Scale bar, 10 µm.

      5) Recent work suggested that mechanical tension is key to assemble the first long periodic myofibril containing immature sarcomeres. Tension is likely produced by a combination of NM and Mhc in the assembling sarcomeres themselves. This could be included in the introduction or discussion (PMIDs 24631244, 29316444, 29702642, 35920628).

      Our response: We thank the reviewer for pointing us to additional relevant references. We have added them in the Introduction.

      6) I suggest replacing "sarcomeric muscles" with "striated muscles".

      Our response: We revised the term in the manuscript as suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study addresses how protein synthesis in activated lymphocytes keeps up with their rapid division, with important findings that are of significance to cell biologists and immunologists endeavouring to understand the 'economy' of the immune system. The work is supported by solid data but because it proposes non-conventional mechanisms, it requires additional explanation and justification to align with the current understanding in the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors examine the fascinating question of how T lymphocytes regulate proteome expression during the dramatic cell state change that accompanies the transition from the resting quiescent state to the activated, dividing state. Orthogonal, complementary assays for translation (RPM/RTA, metabolic labeling) are combined with polyribosome profiling and quantitative, biochemical determinations of protein and ribosome content to explore this question, primarily in the OT-I T lymphocyte model system. The authors conclude that the ratio of protein levels to ribosomes/protein synthesis capacity is insufficient to support activation-coupled T cell division and cell size expansion. The authors hint at cellular mechanisms to explain this apparent paradox, focusing on protein acquisition strategies, including emperipolesis and entosis, though these remain topic areas for future study.

      The strengths of the paper include the focus on a fundamental biological question - the transcriptional/translational control mechanisms that support the rapid, dramatic cell state change that accompanies lymphocyte activation from the quiescent to activated state, the use of orthogonal approaches to validate the primary findings, and the creative proposal for how this state change is achieved.

      The weakness of the work is that several cellular regulatory processes that could explain the apparent paradox are not explored, though they are accessible for experimental analysis. In the accounting narrative that the authors highlight, a thorough accounting of the cellular process inventory that could support the cell state change should be further explored before committing to the proposal, provocative as it is, that protein acquisition provides a principal mechanism for supporting lymphocyte activation cell state change.

      Appraisal and Discussion:

      1) relating to the points raised above, two recent review articles explore this topic area and highlight important areas of study in RNA biology and translational control that likely contribute to the paradox noted by the authors: Choi et al. 2022, doi.org/10.4110/in.2022.22.e39 ("RNA metabolism in T lymphocytes") and Turner 2023, DOI: 10.1002/bies.202200236 ("Regulation and function of poised mRNAs in lymphocytes"). These should be cited, and the broader areas of RNA biology discussed by these authors integrated into the current manuscript.

      Good suggestion. We have added these references with a short discussion.

      2) The authors cite the Wolf et al. study from the Geiger lab (doi.org/10.1038/s41590-020-07145, ref. 41) though largely to compare determined values for ribosome number. Many other elements of the Wolf paper seem quite relevant, for example, the very high abundance of glycolytic enzymes (and whose mRNAs are quite abundant as well), where (and as others have reported) there is a dramatic activation of glycolytic flux upon T cell activation that is largely independent of transcription and translation, the evidence for "pre-existing, idle ribosomes", the changes in mRNA copy number and protein synthesis rate Spearman correlation that accompanies activation, and that the efficiencies of mRNA translation are heterogeneous. These data suggest that more accounting needs to be done to establish that there is a paradox.

      As one example, what if glycolytic enzyme protein levels in the resting cell are in substantial excess of what's needed to support glycolysis (likely true) and so translational upregulation can be directed to other mRNAs whose products are necessary for function of the activated cell? In this scenario, the dilution of glycolytic enzyme concentration that would come with cell division would not necessarily have a functional consequence. And the idle ribosomes could be recruited to key subsets of mRNAs (transcriptionally or post-transcriptionally upregulated) and with that a substantial remodeling of the proteome (authors ref. 44). The study of Ricciardi et al. 2018 (The translational machinery of human CD4+ T cells is poised for activation and controls the switch from quiescence to metabolic remodeling (doi.org/10.1016/j.cmet.2018.08.009) is consistent with this possibility. That study, and the short reviews noted above, are useful in highlighting the contributions of selective translational remodeling and the signaling pathways that contribute to the cell state change of T cell activation.

      Our study focuses on the central issue of whether measured ribosome translation rates support rapid division. The abundance of glycolytic enzymes, mRNA copy numbers etc., are clearly interesting and critical to cell metabolism, but are irrelevant to measuring the overall translation rate and capacity of T cells.

      From this perspective, an alternative view can be posited, where the quiescent state is biologically poised to support activation, where subsets of proteins and mRNAs are present in far higher levels than that necessary to support basal function of the quiescent lymphocyte. In such a model, the early stages of lymphocyte activation and cell division are supported by this surplus inventory, with transcriptional activation, including ribosomal genes, primarily contributing at later stages of the activation process. An obvious analogy is the developing Drosophila embryo where maternal inheritance supports early-stage development and zygotic transcriptional contributions subsequently assuming primary control (e.g. DOI 10.1002/1873-3468.13183 , DOI: 10.1126/science.abq4835). To pursue that biological logic would require quantifying individual mRNAs and their ribosome loading states, mRNA-specific elongation rates, existing individual protein levels, turnover rates of both mRNAs and proteins, ribosome levels, mean ribosome occupancy state, and how each of these parameters is altered in response to activation. Such accounting could go far to unveil the paradox. This is a considerable undertaking, though, and outside the scope of the current paper.

      The reviewer is essentially proposing RiboSeq analysis of pre- and post-activation T cells, whereby individual mRNAs can be queried for ribosome occupancy, and where translation inhibitors could be used to quantify mRNA-specific transit rates. This is important information but would not provide a more accurate accounting of protein synthesis rates than our much more direct measurement. We note that other labs have begun to work on this exact topic, however – see both PMID: 36002234 and PMID: 32330465.

      Reviewer #2 (Public Review):

      This paper takes a novel look at the protein economy of primary human and mouse T-cells - in both resting and activated state. Their findings in primary human T-cells are that:

      1) A large fraction of ribosomes are stalled in resting cultured primary human lymphocytes, and these stalled ribosomes are likely to be monosomes.

      2) Elongation occurs at similar rates for HeLa cells and lymphocytes, with the active ribosomes in resting lymphocytes translating at a similar rate as fully activated lymphocytes.

      They then turn their attention to mouse OT-1 lymphocytes, looking at translation rates both in vitro and in vivo. Day 1 resting T-cells also show stalling - which curiously wasn't seen on freshly purified cells - I didn't understand these differences.

      This is clarified and discussed starting in the third paragraph of “Protein synthesis in mouse lymphocytes ex vivo” section. Cells cultured ex vivo for 1 day with no activation show signs of stalling, as we observed in isolated human cells. But cells immediately out of an animal show a measurable decay rate since they are obviously synthesizing proteins in vivo and are processed rapidly.

      In vivo, they show that it is possible to monitor accurate translation and measure rates. Perhaps most interestingly they note a paradoxically high ratio of cellular protein to ribosomes insufficient to support their rapid in vivo division, suggesting that the activated lymphocyte proteome in vivo may be generated in an unusual manner.

      This was an interesting and provocative paper. Lots of interesting techniques and throwing down challenges to the community - it manages to address a number of important issues without necessarily providing answers.

      Reviewer #3 (Public Review):

      This manuscript provides a more or less quantitative analysis of protein synthesis in lymphocytes. I have no issue with the data as presented, as I'm sure all measurements have been expertly done. I see no need for additional experimental work, although it would be helpful if the authors could comment on the possibility of measuring the rate of synthesis of a defined protein, say a histone, in cells prior to and after activation. The conclusion the authors leave us with is the idea that the rates of protein synthesis recorded here are incompatible with observed rates of T cell division in vivo. Indeed, in the final paragraph of the discussion, the authors note the mismatch between what they consider a requirement for cell division, and the observed rates of protein synthesis. They then invoke unconventional mechanisms to make up for the shortfall, without -in this reviewer's opinion- discussing in adequate detail the technical limitations of the methodology used.

      Points #1-3 in the Discussion relate to potential pitfalls of our analyses; in point #3 we now add further limitations of RTA based on non-random detection of nascent chains due to bias in either puromycylation or antibody detection of puromycylated nascent chains.

      A key question is the broad interest, novelty, and extension of current knowledge, in comparison with Argüello's (reference 27) 'SunRise' method. It would be helpful for the authors to stake out a clear position as to the similarities and differences with reference 27: what have we learned that is new? The authors could cite reference 27 in the introduction of their manuscript, given the similarity in approach. That said, the findings reported here will generate further discussion.

      We did cite this reference (27) in the section “Flow RPM measures ribosome elongation rate in live cells” giving credit where credit is due. We independently devised the method in 2014, and uniquely, to our knowledge, have applied it in vivo. We now further discuss the importance of our CHX modification to limit dissociation and increase the accuracy of RTA (second and third paragraphs of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”).

      The manuscript would increase in impact if the authors were to clearly define why a particular measurement is important and then show the actual experiment/result. As an example, it would be helpful to explain to the non-expert why the distinction between monosomes, polysomes, and stalled versions of the same is important, and then explain the rationale of the actual experiment: how can these distinctions be made with confidence, and what are confounding variables?

      We believe this is addressed in the section “Resting human lymphocytes have a dominant monosome population”.

      The initial use of human cells, later abandoned in favor of the OT-1 in vitro and in vivo models, requires contextualization. If the goal is to address the relationship between rates of translation and cell division of antigen-activated T cells in vivo, then a lot of the work on the human model and the in vitro experiments becomes more of a distraction, unless properly contextualized. Is there any reason to assume that antigen-specific activation in vivo will impact translation differently than the use of the PMA/ionomycin/IL2 cocktail? The way the work is presented leaves me with the impression that everything that was done is included, regardless of whether it goes to the core of the question(s) of interest.

      Donor PBMCs are clearly the more relevant model for understanding human T cell biology, which is why we started our studies with this model. Had the manuscript strictly described mouse studies, it is likely that we would be criticized for not studying human cells: Catch 22! However, as we state in the manuscript, the human cell model has a variety of technical downsides, including donor heterogeneity. PMA/ionomycin activation is also physiologically questionable, and while we could deliver a defined TCR to redirect their specificity, this is typically done after cells have been activated, since lentiviral delivery is poor in resting lymphocytes. A main point we try to make from this work is that cells derived from human blood donors show signs of ribosomal stalling by the time they are isolated and put into culture. This may limit the usefulness of studying them pre-activation, although based on our mouse data, some level of stalled ribosomes may be a feature as well, priming T cells for their massive expansion. The move to the OT-I system gave us complete control over the system, including in vivo delivery of translation inhibitors.

      It would be helpful if the authors made explicit some of the assumptions that underlie their quantitative comparisons. Likewise, the authors should discuss the limitations of their methods and provide alternative interpretations where possible, even if they consider them less/not plausible, with justification. As they themselves note, improvements in the RPM protocols raised the increase in translating ribosomes upon activation from 10-fold to 15-fold. Who's to say that is the best achievable result? What about the reliability/optimization of the other measurements?

      We expanded discussion of potential pitfalls of the RPM techniques and others in the Discussion section. Regarding RPM per se, we use it as a readout of ribosome time decay, so even if further optimizations can be made, the decay rates we have made should still be accurate. In addition, for our cell accounting measurements in Figure 6, we do not use RPM data and rather calculate based on the assumption that every ribosome is used for protein synthesis at a “maximal” rate of mRNA transit.

      The composition of the set of proteins produced upon activation will differ from cell to cell (CD4, CD8, B, resting vs. dividing). Even if analyses are performed on fixed cells, the ability of the monoclonal anti-puromycin antibody to penetrate the matrix of the various fixed cell types may not be equal for all of them, depending on protein composition, susceptibility to fixation etc. Is it possible for puromycin to occupy the ribosome's A site and terminate translation without forming a covalent bond with the nascent chain? This could affect the staining with anti-puromycin antibodies and also underestimate the number of nascent chains.

      Yes, the method (like every other one) is imperfect. Harringtonine run-off experiments show that RPM staining only detects nascent chains. Note that reference 47 reports that 75% of translation in activated T cells is devoted to synthesizing ~250 housekeeping proteins, which are likely to be highly similar between lymphocyte subsets.

      I believe that the concept of FACS-based quantitation also requires an explanation for the nonexpert. For the FACS plots shown, the difference between the highest and lowest RPM scores for cells that divided and that have a similar CFSE score is at least 10-fold. Does that mean that divided cells can differ by that margin in terms of the number of nascent chains present? If I make the assumption that cells stimulated with PMA/ionomycin/IL2 respond more or less synchronously, why would there be a 10-fold difference in absolute fluorescence intensity (anti-puromycin) for randomly chosen cells with similar CFSE values? While the use of MFI values is standard practice in cytofluorimetry, the authors should devote some comments to such variation at the population level.

      We believe that the referee is referring to Sup Fig. 1B. In this experiment the T cells are polyclonal and represent the full range of naïve to potentially exhausted differentiation states. Comparing Figure 2 (OT-Is) to Figure 3 (endogenous CD4s or CD8s) in our initial in vivo RPM study (reference 22) reveals more spread in the RPM values of polyclonal vs. monoclonal T cells (now clarified in the third paragraph of “Protein synthesis in mouse lymphocytes and innate immune cells in vivo”). Flow cytometry is by far the most accurate method for measuring fluorescence in individual cells. The measured spread is likely to be an accurate measure of the variation of nascent chains in cells in the same division cohort, and it likely reflects the diversity of T cell activation profiles in the blood of healthy donors.

      It is assumed that for cells to complete division, they must have produced a full and complete copy of their proteome and only then divide. What if cells can proceed to divide even when expressing a subset of the proteome of departure (=the threshold set required for initiation of division), only to complete synthesis of the 'missing ' portion once cell division is complete? Would this obviate the requirement for an unusual mechanism of protein acquisition (trogocytosis; other)?

      There must be a steady state level of translation and proteome replenishment, though. If a cell can divide when it affords daughter cells with 90% of its G0 proteome (as an example), that daughter cell would either 1) be 10% smaller, or 2) require extra translation to make up for the missing proteome during its own division cycle. Though T cells do typically shrink slightly after an initial activation, cell size stabilizes over time. Requiring each daughter cell to make more and more missing proteome could be plausible, considering that initial bursts of division do take longer over time, but still, even in vitro activated T cells divide rapidly for weeks without large decreases in their division rates.
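
The steady-state argument above can be made concrete with a small bookkeeping sketch. It is only an illustration under stated assumptions, not a calculation from the manuscript: the function name is hypothetical, proteome content is expressed in units of the starting (G0) proteome, each cycle is assumed to synthesize a fixed fraction of one G0 proteome, and the content is assumed to split equally between the two daughters.

```python
def proteome_per_generation(synth_fraction, n_divisions, start=1.0):
    """Track per-cell proteome content (in units of the G0 proteome).

    Assumptions (illustrative only): every cycle adds `synth_fraction`
    of one G0 proteome by translation, then the cell divides and each
    daughter inherits exactly half of the total.
    """
    content = start
    history = [content]
    for _ in range(n_divisions):
        content = (content + synth_fraction) / 2.0
        history.append(content)
    return history

# If only 90% of a proteome is synthesized per cycle, daughters shrink
# toward 90% of the starting content instead of holding their size.
shortfall = proteome_per_generation(0.9, 50)
# If a full proteome is synthesized per cycle, content stays constant.
balanced = proteome_per_generation(1.0, 50)
print(shortfall[1], shortfall[-1], balanced[-1])
```

Under these assumptions the per-cell content converges to the fraction synthesized per cycle, which is one way to read the two options in the response: a cell that divides after making 90% of its proteome either settles at a smaller size or has to synthesize the difference in a later cycle.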

      Translation is estimated to proceed at a rate of ~6 amino acids per second, but surely there is variability in this number attributable to inaccuracies of the methods used, in addition to biological variability. Were these so-called standard values determined for a range of different tissues? It stands to reason that there might be variation depending on the availability of initiation/elongation factors, NTPs, aminoacyl tRNAs etc. What is the margin of error in calculating chain elongation rates based on the results shown here?

      We refer to all relevant studies we know of, including new in vivo estimates of elongation rates (reference 40).

      Reviewer #1 (Recommendations For The Authors):

      A "limitations of study" section would be a helpful way to detail potential contributing mechanisms that were not explored in the current study.

      We have expanded the methodological limitations in the Discussion section.

      Major:

      1) Broaden the scope of biological models that could explain the paradox.

      In the Discussion, we suggest that T cells acquire some fraction of their proteome through external sources and highlight some examples of this occurring.

      Minor:

      1) Include Mr markers for Fig. 2C.

      Done.

      2) Though commonly used interchangeably, historically the term protein synthesis was the consequence of mRNA translation. In other words, proteins are not translated.

      Good point! We have changed the text accordingly.

      3) Include more meaningful X-axis legend in polysome gradient panels i.e., Fig. S2, e.g., fraction number.

      In most experiments, fractions were not collected. Rather, the x-axis refers to time that the sample took to be queried by the detector.

      4) Figure 3A does not report polysome profiles as described in the text, pg. 5, though this is reported in Fig S2D.

      The figure callouts were correct but confusing. We now refer to each result separately to clarify.

      5) In Fig 5A, SDS-PAGE/anti-Puro blots would be more convincing and contain more information. The dot-blot is difficult to interpret.

      Disagree. To quantitate total anti-puromycin signal a dot blot is far better than immunoblotting, which is compromised by unequal transfer of different protein species.

      6) It's not clear why a degree of monosome translation is necessarily surprising (pg. 7).

      It’s surprising since for many decades it was believed that translation by monosomes is a tiny fraction of translation. But separately, with this particular mode of activation, activated T cells displayed a preponderance of monosomes during their burst of division. When the activation method was improved, polysomes dominated. But monosome translation clearly supported T cell division during activation without cognate peptide, which was interesting.

      Reviewer #2 (Recommendations For The Authors):

      1) One concern is the dose of puromycin used. My understanding is that puromycin acts as a chain termination inhibitor but is being used here predominantly as a label for nascent polypeptide chains. My concern, therefore, is the dose being used, here 50 µg/mL, which seems high. I would be concerned that at this dose it would act as a translational inhibitor rather than just labelling nascent chains, and would therefore result in a lower signal/background ratio than expected. In human cell lines 0.1 µg/mL is optimal and doses published (in cell lines) range between 1 and 10 µg/mL, so it will be interesting to understand why this high dose was used.

      Do they have a dose-response curve? Is this high dose necessary because these are primary T cells? Can the authors show that 50 µg/mL of puromycin is optimal for studying protein translation in primary human T cells? A titration curve will help answer this question and could be included in Suppl Figure 1. This experiment is critical as the authors use a higher dose than previous studies (commonly between 1 and 10 µg/mL).

      The reviewer is referencing puromycin concentrations typically used in the selection of cells – for the RPM assay, puromycin is used at saturating doses to label the maximal number of nascent chains stalled by CHX or EME pretreatment.

      2) None of the figures show statistical significance.

      Statistics on relevant comparisons are now indicated on figures and in legends.

      3) The authors mention: "We performed RPM on cells labelled with CFSE to track cell division by dye dilution (Supplemental Figure 1B). On day 2, activated cells exhibited multiple populations, with nearly all divided cells showing a high RPM signal.". However, on day 2 it is hard to see any dividing cells in the dot plot included in the supplemental figure. Dividing cells only appear on day 5? Their statements make the subsequent paragraphs also difficult to follow.

      We modified the text to clarify this data – there is likely activation-induced cell death occurring which is why there are relatively few CFSE-low cells at this timepoint, and they do exhibit a fairly wide range of RPM staining. The main point is that by day 5, nearly all divided cells exhibit high RPM.

      4) "Many divided cells exhibited near baseline RPM signals, however, consistent with their return to the resting state. Interestingly, although non-activated cells did not divide, ~50% demonstrated increased RPM staining.". Again, it is hard to see the ~50% of cells with increased RPM the authors refer to in the provided supplemental figure.

      This is from quantification of the flow data and is described more fully later when we discuss ribosome stalling.

      5) The authors say "Thus, we cannot attribute the persistence of flow RPM staining in translation initiation inhibitor-treated cells to incomplete inhibition of protein synthesis." It is unclear what this refers to, as in the previous paragraph they also say: "Initiation inhibitors, however, clearly discriminated between day 1 resting and activated cells. RPM signal was diminished by up to 80-90% on day 5 post-activation." This is all somewhat confusing. It would be helpful to have this clarified and, in the text, to refer more liberally to specific figures.

      Figure 1B shows that RPM is maintained at fairly high levels during treatment with EME or CHX (in contrast to the initiation inhibitors HAR/PA). To rule out that the drugs were simply not active, tritiated leucine labeling was conducted to confirm that incorporation of the radiolabeled amino acid dropped to near-baseline (Figure 1C). Therefore, we can conclude that the drugs are indeed working as intended, but EME/CHX does not decrease RPM signal to the same extent that they prevent leucine incorporation.

      6) Page 5 Fig 3A - I don't understand the difference between freshly isolated OT-1 cells - which don't stall and day 1 OT-1 cells which do. Why are freshly isolated cells not behaving like the naïve cells- isn't this what they would predict? Also - I accept that there is a move from monosome to polysome population between day 1 and 2 - the effect isn't huge - it would be helpful/interesting to know what has happened by day 5 - is the effect much more significant?

      Freshly isolated cells are harvested from animals and immediately queried, whereas day 1 cells are cultured for 24h in the absence of any activation. Presumably, the ex vivo culture without any activation causes the mouse T cell ribosomes to stall, just as we observed in cells obtained from human donors that took hours to collect and bring to the bench. The appearance of polysomes is really related to how the activation of the cells is done… refer to Figure 5B to see how significant the polysome buildup can be!

      7) Fig S3C - I don't understand how they reach the conclusion from this figure that: '~15-fold increase in translating ribosomes in activated OT-I T cells in vivo (Supplemental Figure 3C) as compared to the 10-fold increase we previously reported using the original protocol.' It would very much help the reader if these calculations could be better explained.

      These are simply quantifications of the RPM staining done in Supplemental Figure 3C compared to experiments done in the absence of the CHX-modified method.

      8) Page 7 - They conclude that the Tan paper has superior lymphocyte activation - but presumably this depends on the signal as to whether there is more activation and how this affects the shift from monosome to polysome, i.e., maybe a stronger activation signal affects the distribution more - perhaps their method is the more physiological? Is their conclusion fair - that 'These findings indicate that monosomes make a major contribution to translation in resting T cells but are likely to make a minor contribution in fully activated cells.'?

      Yes, we believe that their published method would be more physiological with the use of the natural OT-I peptide. We conclude that although monosome translation is present (as others have published), there are relatively few monosomes in fully activated T cells. Therefore, the monosome contribution to overall translation in activated T cells appears to be minor.

      9) Contrary to observations in vitro, ribosomes are not stalled in naïve mouse T cells in vivo, as we show via RTA analysis of non-activated T cells. - yes - this seems somewhat surprising - what is the explanation?

      We presume this is due to the stress/non-native environment that ex vivo cultured cells are subjected to.

      10) Whilst I understand the point that the authors are trying to make in Figure 1D about resting T cells having high background RPM staining due to stalled ribosomes, it is intriguing that there is almost no difference (no statistical significance provided) after 2 or 5 days of activation. Isn't this finding contrary to the one provided in Figure 1A and Suppl Figure 1B?

      Figure 1A shows the difference between the no-activation and activation conditions. Figure 1D is predominantly meant to show that the increases in RPM in activated cells at day 1 and day 5 are not as different as one might predict. The reason, as we describe in further experiments, is likely that cells exhibiting ribosomal stalling can incorporate puromycin, damping the “fold change” we calculate (unlike what we observe in metabolic labeling experiments in the same figure panel). Statistics are now displayed on the graphs in Figure 1D for further clarification.
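
      As a rough sketch of this damping effect, the arithmetic can be written out as follows. All signal values are hypothetical and chosen only for illustration; they are not measurements from the study. The function name and its parameters are likewise assumptions, and the model simply assumes that stalled ribosomes in resting cells add puromycin signal to the denominator of the fold change.

```python
def rpm_fold_change(activated_signal, resting_translating, resting_stalled=0.0):
    """Fold change of RPM signal in activated over resting cells.

    resting_stalled is the (hypothetical) puromycin signal contributed by
    stalled ribosomes in resting cells; it inflates the baseline and so
    lowers the apparent fold change.
    """
    baseline = resting_translating + resting_stalled
    return activated_signal / baseline


# Hypothetical arbitrary units, for illustration only.
without_stalling = rpm_fold_change(150.0, 10.0)           # baseline = 10
with_stalling = rpm_fold_change(150.0, 10.0, 40.0)        # baseline = 50

print(without_stalling)  # 15.0
print(with_stalling)     # 3.0
```

      Under these assumed numbers, the same activated signal reads as a 15-fold increase when only actively translating ribosomes contribute to the baseline, but as a 3-fold increase once stalled ribosomes are counted.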

      11) "Including EME with HAR prevented decay of the RPM signal, as predicted, since EME blocks elongation while enabling (even enhancing) puromycylation (refs. 21, 26)." I find this very confusing. I understand that emetine blocks protein elongation whilst enabling puromycylation, but why does it block the effect of the protein initiation inhibitor harringtonine? Do they compete with each other?

      When ribosomes are frozen with emetine, they cannot transit mRNA and “fall off”. Therefore, the inclusion of EME in these experiments is a control to ensure that we are looking at true transit and runoff of ribosomes with harringtonine treatment (explanation in the second paragraph of the “Flow RPM measures ribosome elongation rates in live cells” section).

      12) Can the authors explain why the RPM signal of activated OT-I cells (PMA/Iono) increases 20-fold compared to resting cells, but there is only a ~2-fold increase in signal in human cells? The authors previously mentioned: "We noted that the RPM signal in activated cells was only 2- to 5-fold higher than in non-activated cells. This increase is modest compared to the ~15-fold activation-induced increase in protein synthesis in original studies (refs. 10, 11). To examine this discrepancy, we incubated cells for 15 min with harringtonine (HAR) or pactamycin (PA) to block translation initiation or emetine (EME) or cycloheximide (CHX) to block elongation." Would the authors have followed the same path if they had started the paper with OT-I cells?

      Human cells are not as well activated as OT-I in our study. The last question is beyond the scope of our reasoning as empirical evidence-based scientists, but we have applied for funding from the HG Wells Foundation for a time machine to answer this question.

      13) Authors should include representative raw data of the flow cytometries used to perform the "Ribosome Transit Assay" (RTA) in Figures 2 and 3 as supplemental data.

      Done; now included in Supplemental Figures 1 and 3.

      14) It would be interesting to compare RPM in T cells activated with a more physiological stimulus, such as beads anti-CD3 anti-CD28 vs PMA/Iono. Particularly after showing that peptide-specific stimulation (with SIINFEKL) is more effective than PMA/Iono in activating OT-I cells and inducing polysome formation (Figures 5B and Suppl Figure 4A).

      We tried plate-bound anti-CD3 and anti-CD28 early in these studies, and they didn’t induce as much early activation.

      15) Can the authors include the gating strategy to call "activated OT-I cells" to the cells shown in Suppl Figure 3c?

      A new Supplemental Figure 3D has been added showing the exact gating strategy for the OT-I cell RTA assays described in Supplemental Figure 3C and elsewhere.

      16) In Figure 6B, the authors mention an increase in the volume of the cells based on the assumption of spherical morphology but then show an increase in diameter. It would be more consistent to show both parameters in the same graph.

      The graph was changed to show volume calculations instead of diameter for clarity. The two are linked, since volume scales with the cube of the radius.
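
      For readers who want the conversion spelled out, a minimal sketch follows, assuming a perfectly spherical cell as stated in the reviewer's comment. The function names and the example diameters are hypothetical and are not values from Figure 6B.

```python
import math


def sphere_volume(diameter_um):
    """Volume (cubic micrometres) of a sphere with the given diameter (micrometres)."""
    radius = diameter_um / 2.0
    return (4.0 / 3.0) * math.pi * radius ** 3


def volume_fold_change(diameter_before_um, diameter_after_um):
    """Fold change in volume implied by a change in diameter.

    The constant (4/3)*pi and the factor of 2 between diameter and radius
    cancel, so only the cubed ratio of the diameters remains.
    """
    return (diameter_after_um / diameter_before_um) ** 3


# Hypothetical diameters: a 1.5-fold increase in diameter...
print(volume_fold_change(7.0, 10.5))  # 3.375, i.e. a ~3.4-fold increase in volume
```

      A modest-looking change in diameter therefore corresponds to a much larger change in volume, which is why reporting volume directly is the more transparent choice.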

      17) The paper's main conclusion (i.e., that the ratio of proteins to ribosomes in T cells activated in vivo does not support their doubling time) is exciting. They conclude this after measuring cell volume, protein abundance, and ribosomes per cell. As no changes in cell volume and protein abundance between T cells activated in vitro vs in vivo were observed (Figures 6B and 6C), the difference is exclusively attributable to a reduced number of ribosomes per cell in T cells activated in vivo (Figure 6F). Critically, the measurement of ribosomes per cell in T cells activated in vivo (Figure 6F, "ex vivo day 2") includes only two data points. It is hard to understand how the authors calculated this figure's means and standard deviations as it is not described in the figure legend. From the dispersion observed for "day 1" and "day 2" in vitro-activated T cells, it seems that the variability of the assay to measure ribosome content could explain part of the phenotype. Additionally, there are several missing data points in Figure 6H. As this figure is just a transformation of Figures 6D and 6G, it isn't easy to understand why. Can I suggest that they include more data points for Figures 6F, G, and H in the 'ex vivo day 2' category, as the two data points shown with little variability are out of keeping with the rest of the data, and may be skewing their data?

      Figure 6F does not have the same number of data points as other panels because it required measurement of both protein content and ribosome number. Since the ribosome quantification method described here was developed later than some of our earlier protein measurements, not all experiments had both sets of data to properly calculate the proteins per ribosome. All data that had both values are included, though.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      If an increase in cell diameter is recorded upon activation, why not also provide the value for the increase in volume?

      Done

      Regarding the writing, the erratic punctuation/hyphenation - or lack thereof - doesn't improve readability. One example: "....consistent with the idea that the flow RPM signal in day 1 resting lymphocytes...." Perhaps better: "... consistent with the idea that the RPM signal, obtained by flow cytometry for lymphocytes analyzed on day 1 and maintained in the absence of any activating agent,..." I understand that this can make for longer sentences, but I object to the use of 'flow' as shorthand for 'flow cytometry', and to the use of day 1 as an adverb or adjective. That works as lab jargon, it's less effective in a written text. The abbreviation 'DRiPs' is not defined. Words like 'notably', and 'surprisingly' can be eliminated.

      This work would benefit from the inclusion of a section describing 'Limitations of the study'.

      This is now expanded in the Discussion, as described above.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D-mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice. The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies. However, the study presents a potential for further dissection of relevant pathophysiological parameters using additional techniques, to explain certain seemingly associative results, and allow for a more effective translation.

      Several results in the study suggest multiple factors and pathways influencing the phenotype seen, which remain unexplored. The inferences of this study also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. While this does not undermine the importance of this elegant study, it is essential to emphasize a holistic picture while interpreting the results.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling and reduces IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In the revised manuscript, the authors have addressed a significant number of concerns raised. The restructuring and incorporation of a number of discussion points have improved the readability. Moreover, the authors have also incorporated some more figures to address certain questions raised.

      However, the authors could reconsider a few more points which would improve the readability of the manuscript.

      For example:

      1) While it is appreciated that the authors have provided the schematic of the study design for the VDAART trial, a visualization for the RNA-seq analysis may be helpful.

      We have created a visualization of the workflow for the RNA-seq analysis as part of Figure 1 – figure supplement 1C.

      2) Quantification of images would not require any additional experiments, yet can reinforce the results with objectivity.

      We appreciate this comment. We chose to display histology images to allow a glimpse at the inflammatory condition in the lung tissue. For histological quantification, lung tissue should have been harvested and analyzed in a systematic and randomized way as well as in sufficient animal numbers to allow statistical analyses. This has not been done for these mouse models since the focus was on analyzing cytokine production by lung tissue CD4+ T cells as the driver of inflammation.

      3) The authors have not addressed the discrepancy of the sample sizes in the experiments. Some dot plots still don't match the legends, and there is a wide variation in the numbers chosen for different experiments and different groups in the same experiments.

      We appreciate the thorough screening of our manuscript and apologize for this oversight. We corrected the errors in the respective figure legends.

      The in vivo experiments comprise studies performed in (A) VDR-KO mice and (B) WT mice fed with vit-D supplemented chow.

      Sample size calculations for the mouse models of allergic airway inflammation based on BAL cell numbers revealed a minimum of n=8 per group for correct statistical analysis. In both experimental settings, the respective mouse lines were bred in the mouse facilities of MGH (A) and BWH (B). Depending on the litter sizes, additional mice were added to the HDM group, since greater variability was expected in this group than in the saline group.

      Intracellular CD4+ cytokine staining was performed for all mice, however some stainings failed and could not be reliably interpreted and were therefore excluded.

      Reviewer #2 (Recommendations For The Authors):

      The authors have largely replied to the reviewer comments, amended some noted typos & figure legend issues, as well as discussed the reviewers' concerns in text and in their rebuttal.

      The data presented are novel and of significant interest, conceptually moving this field forward, but in this reviewer's opinion reflect one pathway, of likely several, linked to protective effects of vitamin D on airway disease. This reviewer recommends a further slight editing of the text to present this broader scenario.

      i) Treg cells are highly dependent on IL-2 (both Foxp3+ and IL-10+ cells, not always the same population), constitutively express the IL-2R, and there is already a significant literature regarding vitamin D and IL-10/Treg in control of immune-mediated conditions. A simple statement acknowledging this and that there is likely more than one mechanism by which vitamin D may regulate allergic airway disease (directly or indirectly) would be appreciated - this in no way detracts from the novelty and contribution of the current findings.

      We thank the reviewer for this suggestion. We have added the following statement to the manuscript (lines 623-625):

      “Additional pathways, including the induction of IL-10 production by CD4+ T cells as well as a direct induction of Foxp3+ T reg cells could have further contributed to the observed protective effect of vitamin D supplementation (PMID: 21047796; 22529297).”

      ii) More comprehensive referencing of earlier papers proposing effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses in mouse (and human) models

      (e.g. Taher, Y. A., van Esch, B. C. A. M., Hofman, G. A., Henricks, P. A. J. & van Oosterhout, A. J. M. 1alpha,25-dihydroxyvitamin D3 potentiates the beneficial effects of allergen immunotherapy in a mouse model of allergic asthma: role for IL-10 and TGF-beta. J. Immunol. 180, 5211-21 (2008); Vassiliou JE et al, 2014. Vitamin D deficiency induces Th2 skewing and eosinophilia in neonatal allergic airways disease. Allergy. DOI: 10.1111/all.12465).

      We have included the reference in the discussion section of our manuscript in lines 617-619:

      “Similar findings regarding the effects of vitamin D in controlling Treg/IL-10 and dampening Th2 responses have been reported, e.g., in (PMID: 18390702) and in offspring of mice that had been subjected to vitamin D deficiency in the third trimester of their pregnancy (PMID: 24943330).”

    2. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      Strengths:

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice.

      The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies.

      Weaknesses:

      The core message of the study relies on the role of vitamin D and its receptor in suppressing the Th2 response. However, there is scope for further dissection of relevant pathophysiological parameters in the in vivo experiments, which would enable stronger translation to allergic airway diseases like Asthma.

      To a large extent, the authors have been successful in validating their results, although a few inferences could be reinforced with additional techniques, or emphasised in the discussion section (possibly utilising the ideas and speculative section offered by the journal).

      The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasised discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      In no way do the above considerations undermine the importance of this elegant study which justifies trials for vitamin D supplementation and its effects on Asthma. The work possesses tremendous potential.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling and reduces IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below. In addition, we have revisited the Introduction and Discussion, added additional subsection headings, and provided additional schematics to make the general flow of the paper more accessible to a wider audience.

      Reviewer #1 (Recommendations For The Authors):

      There are certain aspects of the manuscript which could be revisited in order to provide more clarity to the reader. Some of these are:

      1. In vivo experiments: The major inference and its impact are derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both BALB/c and C57BL/6 mice for the experiments could also be incorporated in the manuscript.

      2. Certain sections, especially those connecting VDR, Ikzf1/3 and the IL2/STAT axis, seem associative. This is indicated by Figure 5H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings; alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      3. Role of other cells:

      a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC-BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      b. In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      c. While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2, 6, 9, and 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      4. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      5. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabelled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially and consistent labelling of the gates and axis would ease interpretation. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial. A restructuring of the results, and corrections, could assist the reader. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections, would do justice to the interesting approach and findings. The clearances and approvals for the study also need to be incorporated into the manuscript. If possible, the incorporation of a schematic showing the proposed pathway for VDR-induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      Reviewer #2 (Recommendations For The Authors):

      A few specific points: A number of immune concepts are studied without reference to the broader literature and the data presented on occasion counter these earlier findings. Examples of this include:

      • Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings (in this specific experimental model), but the experimental context - the problem is that this is not discussed.

      • Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage-specific markers or cytokine - their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g. Th2 genes e.g. IL-13/GATA3 with these other markers.

      • Are human Th2 enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      • The Discussion might comment on some of these wider issues.

      • Minor typos throughout, including in figure legends

      Reviewer #1

      1. The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant.

      We agree that asthma has many sub-phenotypes and endotypes and that the Th2 response may not be predominant in all of them, but we focus here on the origins of the disease in the first few years of life and the genetic and molecular mechanisms associated with disease onset, where the Th2 response is important.

      2. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasized discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      We appreciate this comment from the reviewers allowing us to review vitamin D toxicity in more detail. Given the length of this review we did not include this in the manuscript discussion but provide it here.

      Vitamin D supplementation in humans is debated due to the possibility of intoxication from overdose. Vitamin D intoxication is a rare medical condition associated with hypercalcemia, hyperphosphatemia, and suppressed parathyroid hormone level and is typically seen in patients who are receiving very high doses of vitamin D, ranging from 50,000 to 1 million IU/d for several months to years 1,2. Intoxication observed at lower doses might be attributable to rare genetic disorders 1. By far the bigger problem in humans is vitamin D deficiency; this is especially true in pregnant women, where dosage requirements are high due to the needs of the fetus. It is estimated that virtually all pregnant women are vitamin D insufficient or deficient 3. VDAART has shown that vitamin D in a dose of 4400 IU given to pregnant women can prevent asthma in their offspring. There were no adverse side effects in the mother or the infant from this dose 4.

      In rodents, a few studies have reported vitamin D intoxication with very high vitamin D doses 5 (PMID: 23405058; 50,000 IU/kg for 120 d led to toxicity in females). In contrast, there are several studies using 2-2.5 times higher doses of vitamin D than we use here that do not report adverse events in mouse models of disease 6,7. Our doses of vitamin D are identical to those used in VDAART and are lower than those used in any of these other rodent studies. In addition, while we did not specifically assess signs of vitamin D intoxication, we can exclude any impact on animal well-being, health, reproduction, and behavior throughout the study.

      1. The major inference and its impact are derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups.

      We have added the following sentence to the discussion: “Additional cytokine measurements in the mice as well as measurement of airway resistance would have added to the pathophysiological data linking IKZF3 expression to TH2 response.”

      1. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both Balb/c and C57 Bl6 mice for the experiments could also be incorporated in the manuscript.

      We agree with the reviewers that ex vivo results may require in vivo confirmation. We have added a sentence explaining the rationale for use of both Balb/c and C57BL/6 mice in the results section “Vitamin D suppresses the activation of the IL-2/Stat5 pathway and cytokine production in Th2 cells”: “To ensure that the above findings were not restricted to the C57BL/6 mouse strain, the inverse experiment was performed in Balb/c mice. This mouse strain is commonly used for type 2 driven inflammation.”

      1. Certain sections, especially those connecting VDR, Ikzf1/3 and IL2/STAT axis seem associative. This is indicated by Figure 5 H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects.

      We appreciate this comment. The RNA-Seq results showed an over representation of the IL-2/STAT5 pathway in Vit-D deficient Th2 cells compared to those under Vitamin D supplementation. We further show the induction of IKZF3 expression with calcitriol stimulation. High IKZF3 expression is known to suppress IL-2 expression. Lack of IKZF3 diminishes the suppressive activity of calcitriol on IL-2 expression. However, as pointed out by the reviewer, Figure 5 H implicates additional pathways regulated by calcitriol for the suppression of IL-2 and we note that in the text.

      1. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings, alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      We have added the following sentences to the discussion: “We have focused here on establishing the relationship between VDR binding and IKZF3 activation or repression and subsequent ORMDL3 and Il2 activation. Additional use of ChIP or co-IP to establish STAT induction and activation would have been of potential value.”

      1. Role of other cells: a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      We appreciate this comment and would like to clarify. Neutrophil numbers were assessed in the presented in vivo models and showed no differences in neutrophil number due to genotype or vitamin D diet. We added the graphs to the supplement in Figure 3 - figure supplement 1A and Figure 5 - figure supplement 1B and refer to the figures in the main text. All in vivo data were analyzed by Mixed-effect ANOVA analysis or Two-way ANOVA test with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure). To keep the plots clear, we incorporated only the statistic for the groups of interest.
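
      The response names Holm-Šidák post-hoc analysis but does not say which software was used or how many comparisons were adjusted. As an illustration only, the step-down Holm-Šidák adjustment can be sketched as follows; the function name `holm_sidak` and the raw p-values are hypothetical and are not taken from the study.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak correction over the (m - rank) hypotheses still under test.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (not study data).
raw = [0.01, 0.04, 0.03]
print(holm_sidak(raw))
```

      The adjustment only controls for multiple comparisons; the ANOVA itself (factors: genotype and exposure) would still have to be fitted separately.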

      1. b) In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      We apologize for any confusion. Figure 3 supplementary Figure 1 shows cytokine positive CD4+ T cells isolated from saline and HDM exposed mouse lungs. These data were analyzed with a Mixed-effect ANOVA analysis or Two-way ANOVA test with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure) and were not found significant. Figure 3 supplementary Figure 2 shows IL-13 levels in the system of in vitro polarization of naïve CD4+ T cells into Th2 cells. The difference between this result and the findings in Figure 3H is the in vivo setting in which additional factors such as IL-4 can aggravate the immune response.

      1. c) While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2, 6, 9, and 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      This is an excellent comment. Our preliminary results confirm that three asthma susceptibility loci: 2q12.1 (IL1RL1), 6p21.32 (HLA-DQA1/B1/A2/B2) and 22q12.3 (IL2RB) each have VDR and IKZF3 binding sites either in enhancers predicted by GeneHancer to target these genes or within these genes themselves. In particular, we found (i) VDR binding sites within IL18RAP and in the enhancer region GH02J102301 targeting IL1RL1, and IKZF3 binding sites within IL1RL1; (ii) VDR binding sites in the enhancer regions GH06J032940 and GH06J031813 targeting HLA-DQA2, and IKZF3 binding sites within HLA-DQA1; (iii) VDR and IKZF3 binding sites within IL2RB. In contrast, the region 9p24.1 (IL33) has no documented VDR or IKZF3 binding sites within IL33 or in the promoter regions targeting IL33. Investigating these additional genetic loci further, using the integrative approach taken here with 17q12-21, is beyond the scope of this current manuscript but based on these preliminary results, would be a worthwhile scientific endeavor.

      1. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      We agree that quantification of histology and confocal images could provide an overview of VDR expression in the lungs. Given the knowledge on VDR expression in a variety of cell types, including structural cells in the lungs and the focus of this manuscript on CD4+ T cells, we focused on determining VDR expression in CD4+ T cells isolated from saline and HDM exposed lungs in the mouse models studied (Figure 2 C; Fig. 2- figure supplement 1 B & C, Figure 3 C; Figure 5 - figure supplement 1) as well as in vitro (Figure 2 - figure supplement 2; Figure 5 - figure supplement 2).

      1. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabeled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially and consistent labelling of the gates and axis would ease interpretation.

      We appreciate this comment and have corrected the mislabeled figures and tried to improve the flow.

      1. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial.

      We appreciate this comment and have checked the sample sizes. Each of these experiments compared two groups and these two groups were compared statistically. We corrected the sample size for Figure 5 Supplementary Figure 2 C in the manuscript.

      1. A restructuring of the results and corrections could assist the reader.

      We have restructured both the results and the discussion, incorporating the changes noted here in the response to the reviewers, to make the flow of the manuscript easier to read.

      1. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections would do justice to the interesting approach and findings.

      We have now added the below schematic to Figure 1-figure supplement 1C to summarize the analyses conducted on the VDAART data.

      Author response image 1.

      1. The clearances and approvals for the study also need to be incorporated into the manuscript.

      These were in the checklist and have been moved to the main text of the manuscript.

      1. If possible, the incorporation of a schematic showing the proposed pathway for VDR induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      We have a figure for this (below) that we have incorporated into the manuscript as Figure 5 - figure supplement 3:

      Author response image 2.

      Cartoon Summarizing Vitamin D molecular genetics at 17q12-21

      Reviewer #2

      1. A few specific points: A number of immune concepts are studied without reference to the broader literature and the data presented on occasion counter these earlier findings. Examples of this include:

      a. Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings (in this specific experimental model), but the experimental context - the problem is that this is not discussed.

      We thank the reviewer for this comment. We have now included a sentence in the discussion section mentioning the contradictory results. It reads as follows:

      “We acknowledge that the impact of vitamin D on Th2 biology is conflicting in the literature. While several groups report Th2 promoting activity, we, and others, show inhibition of type 2 cytokine production 8–11. These discrepancies could be due to the model system studied, e.g., PBMC and purified CD4+ T cells, or the dose of vitamin D or the mouse strain.”

      b. Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage specific markers or cytokine – their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g., Th2 genes e.g., IL-13/GATA3 with these other markers.

      We appreciate this comment. The in vitro culture system used for Th2 cell differentiation has been well described in the literature. As shown in Figure 3 - figure supplement 2; Figure 4 E and Figure 5 - figure supplement 2 D & E the lineage specific IL-13 cytokine levels are detectable at high levels.

      c. Are human Th2 cells enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      We appreciate this comment and are curious to explore this in future research. The VDAART trial is a double-blinded multicenter trial in which an immediate processing of the blood samples and an enrichment of different immune cell populations was not feasible. Other publicly available data sets report gene expression derived from mixed and peripheral (blood) cells and not local (lung) tissues. Published in vitro studies on human Th2 cells do not report VDR expression in comparison to other Th subsets, which would allow the assessment of enrichment.

      1. The Discussion might comment on some of these wider issues.

      We have rewritten the discussion to incorporate many of the issues raised in this review.

      1. Minor typos throughout, including in figure legends.

      We have edited all of the figure legends.

      References

      1. Holick, M. F. Vitamin D Is Not as Toxic as Was Once Thought: A Historical and an Up-to-Date Perspective. Mayo Clinic proceedings 90, 561–564; 10.1016/j.mayocp.2015.03.015 (2015).

      2. Hossein-nezhad, A. & Holick, M. F. Vitamin D for health: a global perspective. Mayo Clinic proceedings 88, 720–755; 10.1016/j.mayocp.2013.05.011 (2013).

      3. Hollis, B. W. & Wagner, C. L. New insights into the vitamin D requirements during pregnancy. Bone research 5, 17030; 10.1038/boneres.2017.30 (2017).

      4. Litonjua, A. A. et al. Effect of Prenatal Supplementation With Vitamin D on Asthma or Recurrent Wheezing in Offspring by Age 3 Years: The VDAART Randomized Clinical Trial. JAMA 315, 362–370; 10.1001/jama.2015.18589 (2016).

      5. Gianforcaro, A., Solomon, J. A. & Hamadeh, M. J. Vitamin D(3) at 50x AI attenuates the decline in paw grip endurance, but not disease outcomes, in the G93A mouse model of ALS, and is toxic in females. PloS one 8, e30243; 10.1371/journal.pone.0030243 (2013).

      6. Landel, V., Millet, P., Baranger, K., Loriod, B. & Féron, F. Vitamin D interacts with Esr1 and Igf1 to regulate molecular pathways relevant to Alzheimer's disease. Molecular neurodegeneration 11, 22; 10.1186/s13024-016-0087-2 (2016).

      7. Agrawal, T., Gupta, G. K. & Agrawal, D. K. Vitamin D supplementation reduces airway hyperresponsiveness and allergic airway inflammation in a murine model. Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology 43, 672–683; 10.1111/cea.12102 (2013).

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defense related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants were also measured.

      The paper addresses some interesting points, however some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these. AU values do not allow for conclusions regarding total quantities, and while I understand that this is not the main focus of this paper, it raises a lot of uncertainty for readers (for example, the references cited show that TMTT has been found to accumulate at similar levels of caryophyllene, however the AU values reported are an order of magnitude higher for TMTT. Again, without actual quantification this is meaningless, but for readers it is confusing).

      With regards to the correlation analyses shown in figure 6, the results presented in many of the correlation plots are not actually informative. While there is a trend, I do not think that this is an appropriate way to show the data, as there are clearly other relationships at play. The comparison between plants under continuous light and normal light:dark conditions is interesting.

      This paper addresses a very interesting idea and I look forward to seeing further work that builds on these ideas.

      As mentioned in our previous response, we have added the quantification of GLVs in order to increase the comparability of our work to other studies.

      Regarding the comment about TMTT (only measured as internal pools), the purpose of the inclusion of these internal pool data, was simply to determine whether terpenes were accumulating in leaf tissue during the night when emissions are hindered (likely due to closed stomata). The data clearly show that internal terpene pools do not accumulate above daytime levels during darkness – this is further supported by gene expression data that show downregulation of terpene synthase genes during darkness. While quantification would certainly increase the ability to compare internal pools, it would not change the interpretation of our results. Also note that absolute quantification is challenging for compounds such as TMTT, which are not readily available.

      Regarding the comment on Figure 6, while we agree there may be interesting patterns beyond linear relationships, as stated in our previous response, the purpose of our analysis was to determine if the higher terpene burst in receiver plants on the second day may be explained by sender plants emitting more GLVs on the second day. Figure 6 shows that this is not the case. Further analyses would not provide additional significant insights into the hypothesis that we tested here.

      We thank the reviewer for their overall positive outlook on our paper and for the constructive comments.

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light. Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control was now performed for the revision in which the herbivory treatment was started in the evening hours and lights were left on. This experiment revealed that the burst of terpenoid emission indeed shifted somewhat. Circadian and diurnal processes must thus interact.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after onset of the volatile exposure and thus parallel with the burst of terpene release.

      An additional experiment exposing plants to the green leaf volatile (GLV) (Z)-3-hexenyl acetate revealed that plants can be primed by this GLV, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. All data are now well discussed.

      Overall, this study provides intriguing insights in the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      We thank the reviewer for their positive outlook on our paper and for their constructive comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The authors did a great job with the revision. The additional experiments strengthened their conclusions. Thanks also for performing the suggested test for potential differences in induction capacity at different times of day, the new data are very interesting.

      Thank you very much.

      Line 49-52: The newly added sentence could be clarified in wording.

      We will clarify the sentence.

      Line 254-255: The newly added sentence needs to be corrected. This is no full sentence and it is not clear what the authors wanted to say here.

      We will clarify this sentence.

      Figure 6: In those instances, in which the correlation is not significant, the line should not be shown.

      We will remove the lines when correlations are not significant.

      The names of chemical compounds and terpene synthases should be written in lower case letters (see legend Fig 6, e.g. hexenal, not Hexenal; legend fig. 2: terpene synthase, not Terpene synthase)

      In the last round of revisions, I commented on Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading. ... Your response was "We have deleted the sentence about community dynamics ..." which, however, in fact was not done! Please change!

      Apologies for that, we will delete mention of community dynamics in that sentence (for real).


      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study examines the effects of herbivory-induced maize volatiles on neighboring plants and their responses over time. Measurements of volatile compound classes and gene expression in receiver plants exposed to these volatiles led to the conclusion that the delayed emission of certain terpenes in receiver plants after the onset of light may be a result of stress memory, highlighting the role of priming and induction in plant defenses triggered by herbivore-induced plant volatiles (HIPVs). Most experimental data are compelling but additional experiments and accurate quantifications of the compounds would be required to confirm some of the main claims.

      Response: We thank the editors for their overall positive feedback on our MS. We have added additional experiments to quantify green leaf volatile emissions in both sender plants and synthetic dispensers (Reviewer 1) and address the importance of the precise time of day plants are induced (Reviewer 2). These additions strengthen the main conclusions of our study.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of the manuscript "High-resolution kinetics of herbivore-induced plant volatile transfer reveal tightly clocked responses in neighboring plants" assessed the effects of herbivory-induced maize volatiles on receiver plants over a period of time in order to assess the dynamics of the responses of receiver plants. Different volatile compound classes were measured over a period of time using PTR-ToF-MS and GC-MS, under both natural light:dark conditions, and continuous light. They also measured gene expression of related genes as well as defence-related phytohormones. The effects of a secondary exposure to GLVs on primed receiver plants were also measured.

      The paper addresses some interesting points, however, some questions arise regarding some of the methods employed. Firstly, I am wondering why VOCs (as measured by GC-MS) were not quantified. While I understand that quantification is time-consuming and requires more work, it allows for comparisons to be made between lines of the same species, as well as across other literature on the subject. As experiments with VOC dispensers were also used in this experiment, I find it even more baffling that the authors didn't confirm the concentration of the emission from the plants they used to make sure they matched. The references cited justifying the concentration used (saying it was within the range of GLVs emitted by their plants) to prepare the dispenser were for either a different variety of maize (delprim versus B73) or arabidopsis. Simply relying on the area under the curve and presenting results using arbitrary units is not enough for analyses like these.

      Response: We thank the reviewer for their comment. We have now quantified the emissions of both dispensers and maize seedlings infested with three fourth-instar Spodoptera exigua larvae. Averaged across 1 h, HAC dispensers emitted roughly 2x higher molar concentrations than total GLV molar concentrations emitted by plants infested by three caterpillars. Of note, GLV emissions induced by caterpillars vary over time, and can be more than 2-fold higher than the average during times of strong active feeding (Supplemental Fig 4). Thus, the release rate of the dispensers is well within the plant’s physiological range.

      Note that the references cited were included to support the claim of the biological activity of all three GLVs rather than to justify concentration of our dispensers. We have rephrased this sentence to reflect this (see L330-333).

      With regards to the correlation analyses shown in Figure 6, the results presented in many of the correlation plots are not actually informative. By blindly reporting the correlation coefficient important trends are being ignored, as there are clearly either bimodal relationships (e.g. upper left panel, HAC/TMTT, HAC/MNT) or even stranger relationships (e.g. upper left panel, IND/SQT, IND/MNT) that are not being well explained by a correlation plot. It is not appropriate to discuss the correlation factors presented here and to draw such strong conclusions on emission kinetics. The comparison between plants under continuous light and normal light:dark conditions is interesting, but I think there are better ways to examine these relationships, for example, multivariate analysis might reveal some patterns.

      Response: We thank the reviewer for their comment. With our analysis we aimed at testing specifically whether the high release of known bioactive volatiles (GLVs and indole) by sender plants on the second day can explain the higher terpene emissions in the receiver plants. We explicitly mention this in the text (L176-L186). Indeed, under normal light conditions (light and dark phase), there are clear positive correlations between the GLV release of sender plants and the terpene release of receiver plants over time (see also Fig 1 and Fig 5). However, under continuous light conditions, GLV emissions in sender plants no longer correlate with terpene emissions in receiver plants (also apparent by comparison of Fig 4 and Fig 5). This shows that temporal variation in GLV emissions is insufficient to explain the delayed terpene burst. This is the relevant conclusion we draw from this analysis. As presented, we find the data to provide strong evidence that the delayed burst in receiver plant terpene emissions cannot be solely explained by higher availability of active signals on the second day. The priming experiment in Figure 7 then provides a direct additional test for this concept. While more complex analyses could indeed reveal additional patterns, these would not be particularly informative for the question at hand.
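
      The response does not state which correlation coefficient underlies Figure 6. As an illustration only, and assuming a Pearson coefficient, the comparison of sender GLV release with receiver terpene release over matched time points can be sketched as follows; the function name `pearson_r` and the two short series are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical emission values at matched time points (arbitrary units).
sender_glv = [1.0, 2.5, 4.0, 3.0, 0.5]
receiver_terpene = [0.8, 2.0, 3.9, 2.7, 0.6]
print(round(pearson_r(sender_glv, receiver_terpene), 3))
```

      A coefficient of this kind captures only linear association, which is the limitation the reviewer raises for the bimodal panels.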

      In Figure 2, the elevated concentrations of beta-caryophyllene found in the control plants at 8h and 16.75h measurement timepoints are curious. Is this something that is commonly seen in B73?

      Response: We thank the reviewer for this comment. A small number of untreated plants indeed accumulated β-caryophyllene at night, which is likely the result of biological variability between samples. Our plants were soil-grown, and it is for instance possible that variation in soil biota may account for this variability. Alternatively, some plants may have been slightly stressed during handling. Note that this variability does not affect any of the conclusions in our manuscript.

      While there can be discrepancies between emissions and compounds actually present within leaf tissue, it is a little bit odd that such high levels of b-caryophyllene were found at these timepoints, however, this is not reflected in the PTR-ToF-MS measurements of sesquiterpenes. It would be beneficial to include an overview of the mechanism of production and storage of sesquiterpenes in maize leaves, which would clarify why high amounts were found only in the GC-MS analysis and not the PTR-ToF-MS analysis, which is a more sensitive analytical tool. It is possible that the amounts of b-caryophyllene present in the leaf are actually extremely low, however as the values are not given as a concentration but rather arbitrary units, it is not possible to tell. I would include a line explaining what is seen with b-caryophyllene.

      Response: Thank you for this comment. It is important to note that accumulation in maize leaves can differ substantially from emission, especially at night when stomata are closed. This has been observed before in maize leaves (Seidl-Adams et al., 2015). As the reviewer suspects, earlier work indeed found that β-caryophyllene is a minor sesquiterpene compared to β-farnesene and α-bergamotene in B73 (Block et al., 2018). The PTR-ToF-MS does not discriminate between terpenes with the same m/z and thus measures total sesquiterpene emissions. Given that sesquiterpene emissions are strongly regulated by stomatal aperture and that overall sesquiterpene accumulation in control plants is low, it is not surprising that we measure only minor amounts of sesquiterpene emissions in general, and in control plants in particular. We have now added text to the manuscript to explain these aspects (L116-L122).

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Seidl-Adams I, Richter A, Boomer KB, Yoshinaga N, Degenhardt J, Tumlinson JH. Emission of herbivore elicitor-induced sesquiterpenes is regulated by stomatal aperture in maize (Zea mays) seedlings. Plant Cell Environ. 38, 23-34 (2015).

      Additionally, it seems like the amounts of TMTT within the leaf are extraordinarily high (judging only by the au values given for scale), far higher than one would expect from maize.

      Response: We are unsure about the reviewer’s interpretation here. The AU values do not allow for conclusions regarding total quantities. An earlier study found that TMTT in induced B73 plants accumulates to similar amounts as β-caryophyllene (Block et al., 2018); thus, it is not surprising to detect significant TMTT pools in induced maize leaves. It is important to note that the aim of the experiment here was to test the hypothesis that plants may be hyperaccumulating volatiles when the stomata are closed at night, which could potentially explain the delayed terpene burst on the second day. We do not observe such a hyperaccumulation, thus ruling this out as the primary factor responsible for the observed phenomenon. This is further supported by the continuous light experiments, where the delayed burst in terpene emission is not hindered by the lack of a dark phase.

      Block, A.K., Hunter, C.T., Rering, C. et al. Contrasting insect attraction and herbivore-induced plant volatile production in maize. Planta 248, 105–116 (2018).

      Reviewer #2 (Public Review):

      The exact dynamics of responses to volatiles from herbivore-attacked neighbouring plants have been little studied so far. Also, we still lack evidence of whether herbivore-induced plant volatiles (HIPVs) induce or prime plant defences of neighbours. The authors investigated the volatile emission patterns of receiver plants that respond to the volatile emission of neighbouring sender plants which are fed upon by herbivorous caterpillars. They applied a very elegant approach (more rigorous than the current state-of-the-art) to monitor temporal response patterns of neighbouring plants to HIPVs by measuring volatile emissions of senders and receivers, senders only and receivers only. Different terpenoids were produced within 2 h of such exposure in receiver plants, but not during the dark phase. Once the light turned on again, large amounts of terpenoids were released from the receiver plants. This may indicate a delayed terpene burst, but terpenoids may also be induced by the sudden change in light. A potential caveat exists with respect to the exact timing and the day-night cycle. The timing may be critical, i.e. at which time-point after onset of light herbivores were placed on the plants and how long the terpene emission lasted before the light was turned off. If the rhythm or a potential internal clock matters, then this information should also be highly relevant. Moreover, light on/off is a rather arbitrary treatment that is practical for experiments in the laboratory but which is not a very realistic setting. Particularly with regard to terpene emission, the sudden turning on of light instead of a smooth and continuous change to lighter conditions may trigger emission responses that are not found in nature.

      Response: We thank the reviewer for their comment. Although we did not explicitly mention it in the initial draft of the MS, we employed 15 min transition periods between the light and dark phases, with a light intensity of 60 µmol m⁻² s⁻¹ (compared to 300 µmol m⁻² s⁻¹ at full light), to achieve a more gradual transition. We have now included this information in the manuscript (L291-L292).

      As one contrasting control, the authors also studied the time-delay in volatile emission when plants were just kept under continuous light (just for the experiment or continuously?). Here they also found a delayed terpenoid production, but this seemed to be lower compared to the plants exposed to the day-night-cycle. Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: This is a very interesting point raised by the reviewer. We have now conducted an additional experiment under continuous light in which we started the herbivory treatment just before the start of the dark phase (ca. 20:00). We found a similar pattern: a distinct delay in the highest burst. Interestingly, however, the burst was shifted from 12-18 hr to 10-12 hr (Supplemental Fig 1). This burst aligned reasonably well with the point at which lights would normally be turned on again. In light of this, and as the herbivore additions typically started ca. 5 hr after the onset of light following a dark period (Figures 1-7), we wanted to rule out the possibility that the lack of a burst on the first day was simply due to a difference in induction capacity depending on how soon after the onset of light plants became exposed to GLVs. We therefore designed an additional experiment to examine whether exposure to GLVs immediately after the lights come on induces higher terpene emissions than exposure ca. 5 hr after the lights come on (Supplemental Fig 2). Interestingly, emissions across the terpenes were similar, regardless of how long after the onset of light plants were exposed to GLVs. This suggests that the delayed burst is not due to the fact that, on the second day, plants are exposed to GLVs immediately after the lights come on, whereas on the first day they are only exposed 5 hr after the lights come on. Both continuous light experiments (normal timing and shifted timing) show bursts that occur slightly earlier than those we observe under normal day:night light conditions (L159-L166 and L207-L211), suggesting an interaction between circadian and diurnal processes. For instance, it is possible that plants would start producing volatiles slightly earlier than the onset of the day, but that light and stomatal opening limit the exact timing of the burst under normal light:dark transitions. The additional data provide further evidence for the delayed burst as a timed response in maize plants.

      Additionally, we have added an explanation to the continuous light figure legends stating that plants were grown under normal conditions and that lights were only left on following treatment.

      Interestingly, internal terpene pools of one of the leaves tested here remained more comparable between night and day, indicating that their pools stay higher in plants exposed to HIPVs. In contrast, terpene synthases were only induced during the light-phase, not in the dark-phase. Moreover, jasmonates were only significantly induced 22 h after the onset of the volatile exposure and thus parallel with the burst of terpene release. An additional experiment exposing plants to the green leaf volatile (glv) (Z)-3-hexenyl acetate revealed that plants can be primed by this glv, leading to a stronger terpene burst. The results are discussed with nice logic and considering potential ecological consequences. Some data are not discussed, e.g. the jasmonate and gene induction pattern.

      Response: Thanks for this comment. We have added a sentence regarding the jasmonate data, which, in addition to providing a further layer of evidence for the observed delay, suggest that other JA-dependent defenses in maize may follow similar temporal patterns (L254-L257).

      Overall, this study provides intriguing insights into the potential interplay between priming and induction, which may co-occur, enhancing (indirect and direct) plant defence. Follow-up studies are suggested that may provide additional evidence.

      Reviewer #1 (Recommendations For The Authors):

      Could the authors please explain why they chose not to calculate concentrations for VOCs? Perhaps it is that B73 is a very unique variety in that it contains very high levels of TMTT, even in control plants? This should be clarified by the authors.

      Response: We address this comment in the public review portion.

      For the legend within Figure 2, I would move it to be in the upper left or right corners of the figure. It is not easy to see in its current position.

      Response: We have moved the figure legend based on the reviewer’s recommendation.

      Figures depicting PTR-ToF-MS data: add m/z values to either the figures themselves and/or the legends.

      Response: We have added m/z values to the legends and added molecular formulas of protonated compounds to each panel.

      Overall, here are some other suggestions: I am slightly wary of the term "clocked response". I'm not sure this is the correct fit for what you are trying to convey. I think "regulated" is a better term than "clocked". I understand that it is likely a stylistic choice to use this word, however, I advise reconsidering for the sake of clarity of the results.

      Response: Thank you. We find ‘clocked’ to be an appropriate term, as it highlights the temporal aspect of the burst, and have thus left the title as is.

      Have another look at the references as some are not in the correct format (i.e., species not in italics).

      Response: We have checked and corrected the references.

      Reviewer #2 (Recommendations For The Authors):

      Line 23: consequences on community dynamics are not investigated here, so this is a bit misleading.

      Last sentence of the abstract: It would be nice to read the answer to this long-standing question here.

      Response: We have deleted the sentence about community dynamics and provided a more concrete final sentence (L38-L40).

      Lines 48-50: The example does not fit so well with the first sentence and is not entirely clear (relation to temporal dynamics; similar to what?).

      Response: We have reworded the sentence for clarity (L49-L52)

      Line 56: "volatiles" should be plural.

      Response: Changed (L58)

      Line 58: "to be produced" rather than "to produce"

      Response: This seems to be a stylistic choice, and we have left it as is.

      End of abstract: Did you have any hypotheses? These should be stated here.

      Response: The listing of hypotheses is also a stylistic choice, which is in some cases required by journals, but not eLife. As such we have not included a discrete list of hypotheses and instead describe what we aimed to investigate and what we found.

      Line 93: "This response disappeared at night." Does this mean: "No volatiles were emitted during night"? Or was this a gradual disappearance? How many hours after the onset of light did the herbivore treatment start and how many hours after the first emission of volatiles was the light turned off?

      Response: We have added when herbivory began (L92-L93) and changed the text to ‘as soon as light was restored’ (L97-L98).

      Line 93: "as soon as the night was over" means practically rather "as soon as the light was switched on".

      Response: See above

      Line 91: "small induction" - do you mean "low amounts of xxx"?

      Response: We mean a small induction. Terpene emission is relatively low (hence small), but still induced relative to controls.

      Line 91: which mono- and sesquiterpenes were monitored?

      Response: These are PTR-ToF-MS data, and thus we cannot identify individual sesquiterpenes and monoterpenes (as all compounds within each class have the same mass). We therefore group them generally.
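
      The point about identical masses can be illustrated with a short calculation. The sketch below is not part of the authors' analysis; it assumes that compounds are detected as singly protonated [M+H]+ ions and uses standard monoisotopic masses. The function name and the compound table are hypothetical, chosen for illustration. The three sesquiterpenes named in the response above share the formula C15H24, so they yield the same m/z.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503}
PROTON_MASS = 1.00727647  # mass of H+ (a hydrogen atom minus one electron)


def protonated_mz(formula):
    """Return the m/z of the singly charged [M+H]+ ion for an elemental formula.

    `formula` maps element symbols to atom counts, e.g. {"C": 15, "H": 24}.
    Assumption for this sketch: charge state 1, so m/z equals the ion mass.
    """
    neutral_mass = sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())
    return neutral_mass + PROTON_MASS


# Illustrative table: three sesquiterpenes mentioned in the response, all C15H24.
SESQUITERPENES = {
    "beta-caryophyllene": {"C": 15, "H": 24},
    "beta-farnesene": {"C": 15, "H": 24},
    "alpha-bergamotene": {"C": 15, "H": 24},
}

for name, formula in SESQUITERPENES.items():
    print(f"{name}: m/z {protonated_mz(formula):.4f}")
```

      Because the isomers differ only in structure and not in elemental composition, a mass-based measurement alone returns a single summed signal for the whole class, which is why the emissions are reported as total sesquiterpenes.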

      Figure 1: What exactly is the "control"? And what does the vertical hatched line in the beginning represent?

      Response: We have defined the control and added a sentence describing the vertical hatched line.

      "Black points represent the same but with undamaged sender plants" - what is "the same" here? I find that a bit confusing!

      Response: We have rephrased

      Line 104: how do you define an "overaccumulation"?

      Response: We have added ‘above daytime levels’ to clarify what we mean by overaccumulation (L106).

      Why was the oldest developing leaf chosen? Is this the largest one when plants are two weeks old? How many leaves do they have then? Is this the leaf with the highest biomass?

      Response: We chose this leaf as it is the largest and is also highly responsive to HIPVs. We have added a sentence explaining this (with a reference) to the methods section (L369-L370).

      Line 107: "started increasing after 3 hours" - they may already have started before. The following description also sounds like the dynamics were investigated here. However, instead the authors measured samples at four distinct time-points and cannot say whether something "began" or "remained" etc. The wording should be changed to a more appropriate description, describing the differences at a given time-point.

      Response: We changed the wording to ‘were marginally induced after 3 hr’ (L110).

      Line 113: What do you mean by "delete BELOW NIGHTTIME levels"?

      Response: The word we used was ‘deplete’; we have changed it to ‘drop’ (L116).

      Line 114: "the expression of terpene synthases" add "in the receiver plants exposed to HIPVs."

      Response: Added

      Figure 2ff: The situation of receiver plants exposed to control plant volatiles is not explained in the method section and also not depicted in the Suppl. Fig. 1. Here, the sender plants seem to always have been induced (if the red star-like structure should resemble an induction - a legend may be helpful here).

      Response: We have changed this to ‘connected to undamaged sender plants’. We additionally added a sentence describing the controls to the methods section (L300).

      Line 140: This treatment is not described in the methods section. Were the plants only kept under constant conditions for the 2 experimental days? Compared to the induction shown in Fig. 1, the amount of released volatiles seems less here.

      Response: We have added an explanation of this to the figure legends, stating that plants were grown under normal conditions and that lights were only left on following treatment.

      Another helpful control would be to start the herbivory treatment in the evening hours and leave the light on. If then again plants only release volatiles after a 17 h delay, the response is indeed independent of the diurnal clock of the plant.

      Response: See public review comment. We have added this experiment and discuss it accordingly in the MS (L159-L166 and L207-L211).

      Line 157: Check sentence/grammar

      Response: Checked and modified

      Figure 5: I suggest using a different colour for volatiles released from the sender plants, not again the green also used in the other figures for the receiver plants. This would help the reader to quickly see which plants are in focus in each figure.

      Response: We have changed the color of the figures for clarity

      Figure 6 legend: check grammar in several sentences (use of singular vs. plural)

      Response: We have made the tense uniform

      The diurnal rhythm of jasmonates (and potentially also terpene synthases?) is not considered in the discussion.

      Response: See above, and we have added a sentence to the discussion mentioning the jasmonates (L254-L257)

      Line 230-231: check grammar. Given the complexity, the response pattern may not be so predictable.

      Response: We do not understand this comment, but have checked the grammar throughout the manuscript.

      Line 235: I like the discussion on potential ecological consequences.

      While some interpretation for each experiment is already given in the results section, not all results are discussed in the discussion section. For example, the jasmonate data are not discussed. This should be added.

      Response: See above

      Line 266: To get an idea about the plant size: How many leaves do the plants have in that stage?

      Response: Added a sentence describing the size (L287-L288)

      Line 321: change to "as in the greenhouse"

      Response: Changed

      Line 334: How were the terpenoids identified and, in particular, quantified?

      Response: Added (L379-L380)

      Line 354: Maybe rather change to: "Plant treatments and tissue collection for phytohormone sampling were identical as described above for terpene and gene expression analysis."

      Response: Changed

      Line 357: add "material" or "leaf tissue" after "flash frozen"

      Response: Added

      Line 359: What was the source of the isotopically labelled phytohormones?

      Response: Added (L400-L403)

      Line 360: The phytohormones are "analyzed" using UPLC. The "quantification" is then done afterward. Please correct.

      Response: Corrected (L404)

      Overall: a great approach and a wonderful idea!

      Thanks

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript investigates the role of membrane contact sites (MCSs) and sphingolipid metabolism in regulating vacuolar morphology in the yeast Saccharomyces cerevisiae. The authors show that tricalbin (1-3) deletion leads to vacuolar fragmentation and the accumulation of the sphingolipid phytosphingosine (PHS). They propose that PHS triggers vacuole division through MCSs and the nuclear-vacuolar junction (NVJ). The study presents some solid data and proposes potential mechanisms underlying vacuolar fragmentation driven by this pathway. However, there are some concerns regarding the strength and interpretation of their lipid data, and the robustness of some conclusions. The manuscript would benefit from addressing these concerns and providing more conclusive evidence to support the proposed conclusions. Overall, the study provides valuable insights into the connection between MCSs, lipid metabolism, and vacuole dynamics, but further clarification will be highly valuable to strengthen the conclusions.

      We thank Reviewer #1 for the thoughtful and positive feedback. Nevertheless, concerns were raised regarding the strength and interpretation of the lipid data, as well as the robustness of specific conclusions. We acknowledge the importance of addressing these concerns and of providing more conclusive evidence to support our proposed conclusions. We have responded in the "Recommendations to Authors" section and hope that our research has been further strengthened.

      Reviewer #2 (Public Review):

      This manuscript investigates the mechanism behind the accumulation of phytosphingosine (PHS) and its role in triggering vacuole fission. The study proposes that membrane contact sites (MCSs) are involved in two steps of this process. First, tricalbin-tethered MCSs between the endoplasmic reticulum (ER) and the plasma membrane (PM) or Golgi modulate the intracellular amount of PHS. Second, the accumulated PHS induces vacuole fission, most likely via the nuclear-vacuolar junction (NVJ). The authors suggest that MCSs regulate vacuole morphology through sphingolipid metabolism.

      While some of the results in the manuscript are interesting the overall logic is hard to follow. In my assessment of the manuscript, my primary concern lies in its broad conclusions which, in my opinion, exceed the available data and raise doubts. Here are some instances where this comes into play for this manuscript:

      We greatly appreciate the careful insights into our research from Reviewer #2. We have sincerely addressed the points one by one in the following.

      Major points for revision

      1) The rationale to start investigating a vacuolar fission phenotype in the beginning is very weak. It is basically based on a negative genetic interaction with NVJ1. Based on this vacuolar fragmentation is quantified. The binning for the quantifications is already problematic as, in my experience, WT cells often harbor one to three vacuoles. How are quantifications looking when 1-3 vacuoles are counted as "normal" and more than 3 vacuoles as "fragmented"? The observed changes seem to be relatively small and the various combinations of TCB mutants do not yield a clear picture.

      The number of vacuoles at steady state could be influenced by various environmental factors, including the composition of the medium (manufacturer supplying the reagent and local water hardness) and the background of the strain. Possibly for these reasons, our observations differ from the experience of Reviewer #2. Indeed, we observed that WT cells always have one vacuole in YPD medium. In SD medium (Fig S3B only), by contrast, WT cells have mainly one or two vacuoles per cell. In both cases, we observed that some of the mutants showed a different phenotype from the WT and that those differences are supported by Student’s t-test and two-way ANOVA analysis.

      2) The analysis of the structural requirements of the Tcb3 protein is interesting but does not seem to add any additional value to this study. While it was used to quantify the mild vacuolar fragmentation phenotype it does not reoccur in any following analysis. Is the tcb3Δ sufficient to yield the lipid phenotype that is later proposed to cause the vacuolar fragmentation phenotype?

      We do not know whether tcb3Δ alone is sufficient to increase PHS, as we have not examined it. Nevertheless, as another approach, we analyzed the difference in IPC level between the tcb1Δ2Δ3Δ triple deletion and the tcb3Δ single deletion in a sec18 mutant background and showed that the reduction of IPC synthesis is similar between tcb1Δ2Δ3Δ and tcb3Δ alone (unpublished). This result suggests that, of all the tricalbins (Tcb1, Tcb2 and Tcb3), Tcb3 plays a central role. In addition, the IPC synthesis reduction phenotype was small in tcb1Δ alone and tcb2Δ alone, but a strong phenotype appeared in the tcb1Δtcb2Δ combined deletion (as strong as in tcb3Δ alone). The relationship between Tcb1, Tcb2 and Tcb3 indicated by these results is also consistent with the results of the structural analysis in this study. We have shown by immunoprecipitation analysis that Tcb3 physically interacts with Tcb1 and Tcb2 (unpublished). In the future, we plan to investigate the relationship between Tcb proteins in more detail, along with the details of the interactions between Tcb1, Tcb2, and Tcb3.

      3) The quantified lipid data also has several problems. i) The quantified effects are very small. The relative change in lipid levels does not allow any conclusion regarding the phenotypes. What is the change in absolute PHS in the cell. This would be important to know for judging the proposed effects. ii) It seems as if the lipid data is contradictory to the previous study from the lab regarding the role of tricalbins in ceramide transfer. Previously it was shown that ceramides remain unchanged and IPC levels were reduced. This was the rationale for proposing the tricalbins as ceramide transfer proteins between the ER and the mid-Golgi. What could be an explanation for this discrepancy? Does the measurement of PHS after labelling the cells with DHS just reflect differences in the activity of the Sur2 hydroxylase or does it reflect different steady state levels.

      i) As Reviewer #2 pointed out, it is a slight change, but we cannot say that it is not sufficient. We have shown that PHS increases in the range of 10-30% depending on the concentration of NaCl that induces vacuole division (this result is related to the answers to the following questions by Reviewer #3 and to the additional data in the new version). This observation supports the possibility that a small increase in PHS levels may have an effect on vacuole fragmentation. We did not analyze total PHS levels by methods such as liquid chromatography-mass spectrometry or ninhydrin staining of TLC-separated total lipids. The reason for this is that radiolabeling of sphingolipids using the precursor [3H]DHS provides higher sensitivity and makes it easier to detect differences. Moreover, using [3H]DHS labeling, we only measure PHS that is synthesized in the ER and that does not originate from degradation of complex sphingolipids or dephosphorylation of PHS-1P in other organelles.

      ii) In our previous study (Ikeda et al. iScience. 2020), we separated the lipids labeled with [3H]DHS into ceramides and acylceramides. There was no significant change in ceramide levels, but acylceramides increased in tcb1Δ2Δ3Δ. Since we did not separate these lipids in the present study, the data show the total amount of both ceramide and acylceramide. We apologize that the term in Figure 3A was wrong. We have corrected it. Also, we have used [3H]DHS to detect IPC levels, which differs from the previous analysis, which used [3H]inositol. This means the lipid amounts detected are completely different. Since the amount of inositol incorporated into cells varies from cell to cell, the amount loaded on the TLC plate is adjusted so that the total amount (signal intensity) of radioactively labeled lipids is almost the same. In contrast, for DHS labeling, the amount of DHS attached to the cell membrane is almost the same between cells, so we load the total amount onto the TLC plate without adjustment. In addition, the reduction in IPC levels due to Tcb depletion that we previously reported was seen only in sec12 or sec18 mutant backgrounds, and no reduction in IPC levels was observed in tcb1Δ2Δ3Δ by [3H]inositol labeling (Ikeda et al. iScience. 2020). Therefore, we cannot simply compare the current results with the previous report, due to the difference in experimental methods.

      The labeling time for [3H]DHS is 3 hours, and we are not measuring steady-state amounts, but rather analyzing metabolic reactions. Since [3H]DHS is converted to PHS by Sur2 hydroxylase in the cell, the possibility that differences in PHS amounts reflect differences in Sur2 hydroxylase activity cannot be ruled out. However, this possibility is highly unlikely since we have previously observed that the distribution of ceramide subclasses is hardly affected by tcb1Δtcb2Δtcb3Δ (Ikeda et al. iScience 2020). We have added to the discussion that the possibility of differences in Sur2 hydroxylase activity cannot be excluded.

      4) Determining the vacuole fragmentation phenotype of a lag1Δlac1Δ double mutant does not allow the conclusion that elevated PHS levels are responsible for the observed phenotype. This just shows that lag1Δlac1Δ cells have fragmented vacuoles. Can the observed phenotype be rescued by treating the cells with myriocin? What is the growth rate of a LAG1 LAC1 double deletion as this strain has been previously reported to be very sick. Similarly, what is the growth phenotype of the various LCB3 LCB4 and LCB5 deletions and its combinations.

      As Reviewer #2 pointed out, the vacuolar fragmentation in lag1Δlac1Δ does not by itself support the conclusion that increased PHS levels are the cause. Since this mutant strain has decreased levels of ceramide and its subsequent products IPC/MIPC, in addition to increased levels of the ceramide precursors LCB or LCB-1P, we have changed the manuscript as follows. As noted in the following comment by Reviewer #2, myriocin treatment has been reported to induce vacuolar fragmentation, so we do not believe that experiments on recovery by myriocin treatment will lead to the expected results.

      ・ Previous Version: We first tested whether increased levels of PHS cause vacuolar fragmentation. Loss of ceramide synthases could cause an increase in PHS levels. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that ceramide precursors, LCBs or LCB-1P, can induce vacuolar fragmentation.

      ・Current Version: We first evaluated whether the increases in certain lipids are the cause of vacuolar fragmentation in tcb1Δ2Δ3Δ. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that the increases in ceramide and subsequent products IPC/MIPC are not the cause of vacuolar fragmentation, but rather its precursors LCBs or LCB-1P.

      As Reviewer #2 pointed out, the lag1Δlac1Δ double mutant is very slow growing, as shown below (Author response image 1). We also examined the growth phenotype of LCB3, LCB4, and LCB5 deletion strains and found that these strains grew like the wild-type strain, with no significant differences in growth (Author response image 1).

      Author response image 1.

      Cells (FKY5687, FKY5688, FKY36, FKY37, FKY33, FKY38) were adjusted to OD600 = 1.0, and fivefold serial dilutions were then spotted on YPD plates and incubated at 25 °C for 3 days.
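
      As a rough guide to reading such a spot assay, the cell density in each spot follows from the dilution factor. The sketch below is only illustrative: the number of spots per strain and the choice to spot the undiluted OD600 = 1.0 suspension first are assumptions, not details given in the legend, and the function name is hypothetical.

```python
def serial_dilution(start_od, fold, n_spots):
    """Return the nominal OD600 of each spot in a serial dilution series.

    start_od: OD600 of the first (least diluted) spot
    fold:     dilution factor applied between consecutive spots
    n_spots:  number of spots per strain (assumed here, not stated in the legend)
    """
    return [start_od / fold**i for i in range(n_spots)]


# Fivefold series starting from OD600 = 1.0; five spots is an assumed value.
for i, od in enumerate(serial_dilution(1.0, 5, 5)):
    print(f"spot {i + 1}: OD600 ~ {od:.4f}")
```

      With a fivefold factor, each spot carries one fifth of the cells of the previous one, so a strain whose growth disappears one spot earlier than the wild type tolerates roughly fivefold less dilution under these assumptions.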

      5) The model in Figure 3 E proposes that treatment with PHS accumulates PHS in the endoplasmic reticulum. How do the authors know where exogenously added PHS ends up in the cell? It would also be important to determine the steady state levels of sphingolipids after treatment with PHS. Or in other words, how much PHS is taken up by the cells when 40 µM PHS is added?

      It has been found that the addition of PHS effectively suppresses the Gas1 trafficking (Gaigg et al. J Biol Chem. 2006) and endocytosis (Zanolari et al. EMBO J. 2000) phenotypes in lcb1-100 mutants. This suppression depends on Lcb3, which is localized to the ER. Thus, we know that PHS added from outside the cell reaches the ER and is functional.

      We also agree that it is important to measure the amount of PHS taken up into the cells. However, this is extremely difficult to do for the following reasons. The majority of PHS added to the medium remains attached to the surface layer of the cells. If we measured the lipids in the cells by MS, we would detect lipids present on both the outside and the inside of the plasma membrane. This means we would need to separate the outside from the inside of the cell membrane to determine the exact amount of LCB that has been taken up by the cells. Regretfully, this separation is currently technically difficult.

      6) Previous studies have observed that myriocin treatment itself results in vacuolar fragmentation (e.g. Hepowit et al. bioRxiv 2022, Fröhlich et al. eLife 2015). Why do both depletion and accumulation of PHS lead to vacuolar fragmentation?

      It is exactly as Reviewer #2 said. Consistent with previous results with myriocin treatment, we also observed vacuolar fragmentation in the lcb1-100 mutant strain. We have therefore added these papers to the references for further discussion. Our discussion is as follows.

      "Previous studies have observed that myriocin treatment results in vacuolar fragmentation (Hepowit et al. bioRxiv 2022; Now published in J Cell Sci. 2023, Fröhlich et al. eLife 2015). Myriocin treatment itself causes not only the depletion of PHS but also of complex sphingolipids such as IPC. This suggests that normal sphingolipid metabolism is important for vacuolar morphology. The reason for this is unclear, but perhaps there is some mechanism by which sphingolipid depletion affects, for example, the recruitment of proteins required for vacuolar membrane fusion. In contrast, our new findings show that both PHS increase and depletion cause vacuole fragmentation. Taken together, there may be multiple mechanisms controlling vacuole morphology and lipid homeostasis by responding to both increasing and decreasing level of PHS."

      7) The experiments regarding the NVJ genes are not conclusive. While the authors mention that a NVJ1/2/3 MDM1 mutant was shown to result in a complete loss of the NVJ the observed effects cannot be simply correlated. It is also not clear why PHS would be transported towards the vacuole. In the cited study (Girik et al.) the authors show PHS transport from the vacuole towards the ER. Here the authors claim that PHS is transported via the NVJ towards the vacuole. Also, the origin of the rationale of this study is the negative genetic interaction of tcb1/2/3Δ with nvj1Δ. This interaction appears to result in a strong growth defect according to the Developmental Cell paper. What are the phenotypes of the mutants used here? Does the additional deletion of NVJ genes or MDM1 results in stronger growth phenotypes?

      We sincerely appreciate the concerns raised about our research. As Reviewer #2 pointed out, we have not shown evidence in this study to support that PHS is transported directly from the ER to the vacuole, so it remains unclear whether PHS is transported to the vacuole and what the physiological relevance would be. Girik et al. showed that the NVJ resident protein Mdm1 is important for PHS transport between the vacuole and the ER. Given that the experimental method applied tracks PHS released in the vacuole, indeed only transport of PHS from the vacuole to the ER was verified. However, assuming that Mdm1 transports PHS along its concentration gradient, we consider that under normal conditions PHS is transported from the ER (as the organelle of PHS synthesis) to the vacuole. We clarified this interpretation by adding the following sentences to the manuscript at line 313:

      “The study applied an experimental method that tracks LCBs released in the vacuole and showed that Mdm1p is necessary for LCBs leakage into the ER. However, assuming that Mdm1p transports LCBs along its concentration gradient we consider that under normal conditions, LCBs is transported from the ER (as the organelle of PHS synthesis) to the vacuole.”

      The negative genetic interaction between tcb1/2/3Δ and nvj1Δ is consistent with this model, but under our culture conditions we did not observe a negative interaction between the genes encoding the TCB3 and NVJ junction proteins (Author response image 2). We do not know if this is due to strain background, culture conditions, or whether the deletions of TCB1 and TCB2 are also required for the negative interaction. We would like to analyze this in detail in the future.

      Author response image 2.

      Cells (FKY3868, FKY5560, FKY6187, FKY6189, FKY6190, FKY6188, FKY6409) were adjusted to OD600 = 1.0, and fivefold serial dilutions were then spotted on YPD plates and incubated at 25 °C for 3 days.

      Our results in this study show that deletion of the NVJ component gene partially suppresses vacuolar fission upon the addition of PHS. To clarify these facts, we have changed the sentences in Results and Discussion of our manuscript as follows. We hope that this change will avoid over-interpretation.

      ・Previous: To test the role of NVJ-mediated “transport” for PHS-induced vacuolar fragmentation,

      ・Current: To test the role of NVJ-mediated “membrane contact” for PHS-induced vacuolar fragmentation,

      ・Previous: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “non-vesicular transport of PHS” at the NVJ.

      ・Current: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “contact between ER and vacuole” at the NVJ.

      ・Previous: Because both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is suggested that transport of PHS into vacuoles via the NVJ is involved in triggering vacuolar fragmentation.

      ・Current: Based on the fact that both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is possible that the trigger for vacuolar fragmentation is NVJ-mediated transport of PHS into the vacuole.

      8) As a consequence of the above points, several results are over-interpreted in the discussion. Most important, it is not clear that indeed the accumulation of PHS causes the observed phenotypes.

      We thank Reviewer #2 for the suggestion. In particular, whether PHS accumulation really causes vacuolar fragmentation could only be verified with an in vitro assay system. This is an important issue to be resolved in the future.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigated the effects of deletion of the ER-plasma membrane/Golgi tethering proteins tricalbins (Tcb1-3) on vacuolar morphology to demonstrate the role of membrane contact sites (MCSs) in regulating vacuolar morphology in Saccharomyces cerevisiae. Their data show that tricalbin deletion causes vacuolar fragmentation possibly in parallel with TORC1 pathway. In addition, their data reveal that levels of various lipids including ceramides, long-chain base (LCB)-1P and phytosphingosine (PHS) are increased in tricalbin-deleted cells. The authors find that exogenously added PHS can induce vacuole fragmentation and by performing analyses of genes involved in sphingolipid metabolism, they conclude that vacuolar fragmentation in tricalbin-deleted cells is due to the accumulated PHS in these cells. Importantly, exogenous PHS- or tricalbin deletion-induced vacuole fragmentation was suppressed by loss of the nucleus vacuole junction (NVJ), suggesting the possibility that PHS transported from the ER to vacuoles via the NVJ triggers vacuole fission.

      This work provides valuable insights into the relationship between MCS-mediated sphingolipid metabolism and vacuole morphology. The conclusions of this paper are mostly supported by their results, but there is concern about physiological roles of tricalbins and PHS in regulating vacuole morphology under known vacuole fission-inducing conditions. That is, in this paper it is not addressed whether the functions of tricalbins and PHS levels are controlled in response to osmotic shock, nutrient status, or ER stress.

      We appreciate the comment, and we consider it an important point. To answer this, we have performed additional experiments. Please refer to the following section, "Recommendations For The Authors" for more details. These results and discussions also have been added to the revised Manuscript. We believe this upgrade makes our findings more comprehensive.

      There is another weakness in their claim that the transmembrane domain of Tcb3 contributes to the formation of the tricalbin complex which is sufficient for tethering ER to the plasma membrane and the Golgi complex. Their claim is based only on the structural simulation, but not on biochemical experiments such as co-immunoprecipitation and pull-down.

      We appreciate your valuable suggestion and would like to attempt to improve upon it in the future.

      Author response to Recommendations:

      The following is the authors' response to the Recommendations For The Authors. We have now incorporated the changes recommended by Reviewers to improve the interpretations and clarity of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors provide additional experimental data to fully support their claims or revise the writing of their manuscript to be more precise in their conclusions. In particular, I have suggestions/questions:

      Fig. 1A: display the results as in 1B (that is, different colors for different number of vacuoles, and the x axes showing the different conditions, in this case WT vs tcb1∆2∆3∆).

      In response to the suggestion of Reviewer #1, we have changed the display of results.

      Fig. S1B: the FM4-64 pattern looks different in the KO strain as compared to those shown in Fig. 1A. Is there a reason for that? Also, no positive control of cps1p not in the vacuole lumen is shown.

      Our apologies, this was probably due to the poor resolution of the images. We have made other observations and changed the Figure along with the positive control.

      Line 172: the last condition in Fig. 2B (vi), should be compared to the tcb1∆tcb2∆ condition (shown in fig 1).

      In response to the suggestion of Reviewer #1, we have changed the manuscript as follows: We found that cells expressing Tcb3(TM)-GBP and lacking Tcb1p and Tcb2p (Fig 2B (vi)) are even more fragmented than tcb1Δ2Δ in Fig 1B and are fragmented to a similar degree as tcb3Δ (Fig 1B and Fig 2B (ii)).

      Fig 2E: the model shown here can be tested, is there binding (similar to kin recognition mechanism of some Golgi proteins) between the different Tcb TMDs?

      As Reviewer #1 mentioned, we have confirmed by co-immunoprecipitation that Tcb3 binds to both Tcb1 and Tcb2 (unpublished). Furthermore, we will test if the binding can be observed with TMD alone in the future.

      Fig 3A: you measured an increase in PHS that is metabolized from DHS (which is what you label). Are there other routes to produce PHS independently of DHS? I mean, how is the increase reporting on the total levels of this lipid?

      PHS synthesized by Sur2 is converted to PHS-1P and phytoceramide. Conversely, PHS is regenerated by degradation of PHS-1P via Lcb3 and Ysr3, and by degradation of phytoceramides via Ypc1 (Vilaça, Rita et al. Biochim Biophys Acta Mol Basis Dis. 2017. Fig1). Our analysis shows that these degradation substrates are not decreasing but rather accumulating in the tcb1Δ2Δ3Δ strain, suggesting that the degradation pathways are not what raises the PHS level. Therefore, the increase in detected PHS is most likely due to a bottleneck in metabolic processes downstream of PHS. Possible causes of the lipid metabolism disruption in Tcb-deletion cells are covered in the Discussion. In brief: (1) reduced activity of the PtdIns4P phosphatase Sac1, due to MCS deficiency between the ER and the PM; (2) impaired nonvesicular ceramide transport from the ER to the Golgi; and (3) low efficiency of PHS export by Rsb1, due to insufficient PHS diffusion between the ER and the PM.

      Line 248: did the authors test if the NVJ MCS is unperturbed in the triple Tcb KO?

      This is an exciting question. We are very interested in considering whether Tcb deficiency affects NVJ formation in terms of lipid transport. We would like to conduct further analysis in this regard in our future studies.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest carefully evaluating the findings in this manuscript. Right now the connection between elevated PHS levels and vacuolar fragmentation are not really supported by the data. One of the major issues in the field of yeast sphingolipid biology is that quantification of the lipid levels is difficult and labor- and cost-intensive. But I think that it is very important to directly connect phenotypes with the lipid levels.

      Minor points:

      • In figure 1 c and d WT controls of the different treatments are lacking.

      As reviewer #2 had pointed out, we have added data for the WT controls.

      • The tcb1Δ mutant appears to be sensitive in pH 5.0 media while the triple tricalbins mutant grows fine. Is that a known phenotype?

      We have performed this assay on SD plates. Then, to check whether this phenotype of tcb1Δ was specific or general, we re-analyzed the same strain in YPD medium. In YPD medium, tcb1Δ strain grew normally, while the control, vma3Δ, was still pH sensitive. Therefore, the growth of this tcb1Δ strain is dependent on the nutrient conditions of the medium but does not appear to be pH sensitive. This new data was inserted as part of Supplementary Figure 1.

      • Line 305. There is an "of" in the sentence that needs to be deleted.

      As pointed out by Reviewer #2, we have corrected the sentence.

      Reviewer #3 (Recommendations For The Authors):

      In supplementary Fig 2, the authors show the involvement of the NVJ in hyperosmotic shock-induced vacuole fission, but the involvement of tricalbins and PHS in this process is not tested. Does osmotic shock affect the level or distribution of tricalbins and PHS? They will be able to test whether overexpression of tricalbins inhibits hyperosmotic shock-induced vacuole fission or not. Also, they will be able to perform the similar experiments upon ER stress-induced vacuole fission.

      We thank Reviewer #3 for suggesting that it is important to test the involvement of PHS in hyperosmotic shock- or ER stress-induced vacuole fission. We have shown in a previous report that treatment with tunicamycin, an ER stress inducer, increased the PHS level by about 20% (Yabuki et al. Genetics. 2019. Fig4). In addition, this time we tested the effect of hyperosmolarity on PHS levels. Analysis of PHS under hyperosmotic shock conditions (0.2 M NaCl), in which vacuolar fragments were observed, showed an increase in PHS of about 10%. Furthermore, when the NaCl concentration was increased to 0.8 M, PHS levels increased by up to 30%. In other words, PHS increases by tens of percent, depending on the concentration of NaCl that induces vacuole division. This observation supports the possibility that a small increase in PHS levels may affect vacuole fragmentation. Moreover, NaCl-induced vacuolar fragmentation, like that caused by PHS treatment, was also suppressed when PHS was exported from the cell by Rsb1 overexpression.

      These new data are now inserted, commented and discussed in the manuscript as Figure 5. We hope that these results will provide further insight into the more general aspects of PHS involvement in the vacuole fission process.

      Minor points:

      1) It is unclear for me whether endogenous Tcb3 is deleted in cells expressing Tcb3-GBP (FKY3903-3905 and FKY4754). They should clearly mention that these cells do not express endogenous Tcb3 in the manuscript.

      We apologize that our description was not clear. In this strain, the endogenous TCB3 gene is tagged with GBP, so the original Tcb3 has been replaced by the tagged version. We have changed the description in our manuscript.

      2) The strength of the effect of PHS on vacuole morphology looks different in respective WT cells in Fig 3C, 4B, and S2B. Is this due to the different yeast strains they used?

      Yes, we used BY4742 background for the strain in Figure 3C, SEY6210 background in Figure 4B, and HR background in Figure S2B. As a matter of fact, we observed that the strength of the PHS effect varies depending on their background. Strain numbers are now given in the legend so that the cells used for each data can be referenced in the strain list.

      3) p.3, line 44: the "SNARE" complex (instead of "protease")?

      Thank you for pointing out the incorrect wording. We have corrected this sentence.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Strengths:

      The major strength of this paper is the series of laser cutting experiments supporting that asters position via pushing forces acting both on the boundary (see below for a relevant comment) and between asters. The combination of imaging, data analysis and mathematical modeling is also powerful.

      Author Response: We thank the Reviewer for the positive comments, especially in recognising the power of our quantitative approaches.

      Weaknesses:

      This paper has weaknesses, mainly in the presentation but also in the quality of the data which do not always support the conclusions satisfactorily (this might in part be a presentation issue).

      Author Response: We address these concerns below.

      My overall suggestion for the authors is to explain better the motivation and interpretation of their experiments and also to remove some of the observations which seem to be there because they could be done rather than because they add to the main message of the paper, which I find straightforward, valuable and supported by the data in Figure 4.

      Author Response: We have extended the motivation of the study in the Introduction, and at the beginning of appropriate Results sections. We better motivate the force potential and especially the key results from Figure 4. We outline specific changes below.

      In Figure 2, it is difficult for me to understand what is being tracked. I believe that the authors track the yolk granules (visible as large green blobs) and not lipid droplets. There is some confusion between the text, legends and methods so I could not tell. If the authors are tracking yolk granules as a proxy for hydrodynamics flows it seems appropriate to cite previous papers that have used and verified these methods. More notably, this figure is somewhat disconnected with the rest of the paper. I find the analysis interesting in principle but would urge the authors to propose some interpretation of the experiments in the context of their big-picture message. At this point, I cannot understand what the Figure adds.

      Author Response: Indeed, we track the yolk droplets that move around the aster. In the extraction protocol, we likely get a mixture of lipid droplets and yolk granules; this is due to the extraction procedure involving shear forces within the pipette. We are not certain about the exact nature of these droplets, but they are likely to a large extent yolk. We have clarified the terminology in the text, the legend and methods section. In this figure, we now show that the droplets do not move towards the aster center as the hydrodynamic pulling model would suggest. Instead, they appear to passively respond to a repulsive force that results in them streaming around the aster. We have added additional panels to the figure that illustrate the directionality of yolk granule movements (lines 159-164). We agree with the Reviewer that the context could have been clarified. The role of fluid flows in biological systems is, as the Reviewer highlights, well studied. We have added additional contextualisation in the text (lines 140-146). We also motivate the figure more clearly, as it provides evidence that the asters generate forces over 20µm scale (lines 159-164). This is highly relevant for one of the paper’s main conclusions – that the Drosophila blastocyst asters generate pushing forces that enable regular packing.

      In Figure 3, it is not surprising that the aster-aster interactions are different from interactions with the boundary which is likely more rigid. It is also hard to understand why the force and thus velocity should scale as microtubule length. This Figure should be better conceptualized. I think that it becomes clear at the end of the paper that the authors are trying to derive an effective potential to use in a mathematical model in Figure 5 to test their hypotheses. I think that should be told from the start, so a reader understands why these experiments are being shown.

      Author Response: We don’t claim that the force scales with microtubule length on a single microtubule. However, at larger distances from the aster, the microtubule density decreases, and hence the effective force decreases.

      The Reviewer is correct that we use these results to motivate our effective potential. We have brought this motivation forward in the manuscript to guide the reader (lines 169-171) and included a further note at the end of the section (lines 216-218).

      The experiments in Figure 4 are very nice in supporting a pushing model. However, it would help if the authors could speculate what the single aster is pushing against in this experiment. The experiments reported in Figure 1 seemed to suggest that the aster mainly pushed against the boundary. In the experiments in Figure 4 do the individual asters touch the boundary on both sides? I think that readers need more information on what the extract looks like for those experiments.

      Author Response: We now include an additional panel B in Figure 4 that shows an example of an explant during aster ablation. The distance between asters is typically less than the distance to the explant boundary. Boundary effects likely play a small role in the aster-aster separation, in terms of potentially determining the axis of separation. However, the separation of asters occurs along a straight line for a substantial period (>1 min) of separation; if boundary effects were more dominant, we may expect to see curving of the aster-aster separation trajectories as they also receive feedback from the boundary.

      Figure 4F could use some statistics. I doubt that the acceleration in the pink curves would be significant. I believe that the deceleration is and that is probably the most crucial result. Since the authors present only 3 aster pairs it is important to be sure that these conclusions are solid.

      Author Response: We agree with the Reviewer. These experiments are challenging to do, as they require carefully controlled conditions. In two out of three experiments we see a significant increase in acceleration in the pink curves. Of course, this interpretation must be caveated, as our number of experiments is low. These details are now provided in the revision (lines 263-267).

      Reviewer 2

      Strengths:

      This study reveals a unique aster positioning mechanics in the syncytial embryo explant, which leads to an understanding of the mechanism underlying the positioning of multiple asters associated with nuclei in the embryo. The use of explants enabled accurate measurement of aster motility and, therefore, the construction of a quantitative model. This is a notable achievement.

      Author Response: We thank the Reviewer for their review, and in highlighting how our quantitative model is a clear step forward in our understanding of aster dynamics.

      Weaknesses:

      The main conclusion that aster repulsion predominates in this system has already been drawn by the same authors in their recent study (de-Carvalho et al., Development, 2022). As the present work provides additional support to the previous study using a different experimental system, the authors should emphasize that the present manuscript adds to it (but the conceptual novelty is limited).

      Author Response: While this study is related to the previous work, there are major differences. First, here we quantitatively assess aster dynamics within a “clean” system. Such accurate measurements are not possible in vivo currently. Further, experiments like laser ablation are much better defined within the explant system. We now recognise the previous work more clearly in the Introduction and in lines 291-293 and 299-300. Combined with the different perspectives provided in these papers on the problem of aster positioning in syncytia, we believe these papers provide new and well-supported insights.

      The molecular mechanisms underlying aster repulsion remain unexplored since the authors were unable to identify specific factor(s) responsible for aster repulsion in the explant.

      Author Response: Given that the nature of the aster dynamics was not previously characterised, our work presents a major step forward. We show compelling evidence that an effective pushing force potential plays a role in aster interactions. With this critical knowledge, we can now explore the potential molecular mechanisms – but such information lies beyond the current manuscript scope. This is particularly challenging due to the lack of specific microtubule drug inhibitors in Drosophila. We highlight related issues in the Discussion: paragraph starting on line 340 and lines 367-370.

      Specific suggestions:

      Microtubules should be visualized more clearly (either in live or fixed samples). This is particularly important in Figure 4E and Video 4 (laser ablation experiment to create asymmetric asters).

      Author Response: This is similar to Reviewer 1's final comment above. These experiments are very challenging and being able to see the microtubules with sufficient clarity is not straightforward. Given our controls and previous experience, we are confident we are ablating the microtubules.

      Minor points:

      1) The authors explain the roles of microtubule asters in several model systems in the first paragraph of the introduction part. Please specify the species and/or cell types in each description.

      Author Response: We have provided this information as suggested.

      2) In lines 164 and 172, the citing figure numbers should be modified to Supplementary Fig. 1A and 1B, respectively.

      Author Response: We thank the Reviewer for spotting this error. It has now been corrected.

      3) The authors showed in the previous study that the boundary in the explant does not have an intact cell cortex and f-actin compartments (de-Carvalho et al., Development, 2022). This important information should also be described in the current manuscript. It is also valuable to mention whether the pulling force mechanism operates in embryos where the intact cell cortex is present.

      Author Response: This is an interesting point. We have added a sentence in the Discussion with this information, together with additional text (lines 324-327).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It is somewhat speculative that the structure represents the EIIa-bound regulatory state. There's a strong enough case that it should be analyzed in the discussion, but I don't think it is firmly established. Therefore, the title of the paper should be changed.

      Our answer: Thank you for the comment. We have changed the title to “Mobile barrier mechanisms for Na+-coupled symport in an MFS sugar transporter”

      Reading through the manuscript, it was challenging to distinguish what is new in the current manuscript and what has been done previously. There were a lot of parts where it was hard for me to identify the main point of the current study among all the details of previous studies. It would also benefit from shortening. For example:

      -Page 6: Nb725 binding has already been characterized extensively in the very nice JBC paper earlier this year. It's important to test 725-4 for binding, but since it doesn't change the binding interaction, and probably wouldn't be expected to, the entire section could be written more succinctly. The main point, which is that 725-4 behaves like 725, is lost among all the details

      Our answer: Thanks for this instructive suggestion. We have shortened the description in this section.

      -Page 9-10. I don't understand what summarizing all of the results from the previous D59C studies adds to the current story. It's important because it provides an indication of the substrate binding site, but its mechanism of action does not seem relevant to the current work.

      Our answer: We have shortened the description of the sugar-binding site and moved the previous Fig. 3b to supplementary figure sFig. 11. According to your comment about showing the location of the binding sites, which is also suggested by Reviewer #2, we modified Fig. 3 and added two panels to map the location of the bound Na+ in the inward-facing structure and the bound sugar in the outward-facing structure.

      The sugar-binding site identified in the published structure is critical to construct the mobile barrier mechanism. The sugar-binding residues identified in the published structure provided essential data to support the conclusion that the sugar-binding pocket is broken in the inward-facing structure. Thus, this published structure is mechanistically relevant to the current study.

      -Page 12. Too much summary of the previous outward structure. Since this is already part of the literature, it would be more efficient to reference the previous data when it is important to interpret the new data (or show as a figure).

      Our answer: The introduction of the previous sugar-binding site is important for the detailed comparison between the two states, as discussed above, but we agree with the reviewer and have significantly shortened the paragraph by moving the detailed description into the legend of sFig. 11.

      -Instead of providing the PDB ID in figures of the current structure, just say "current work" or similar. Then it is obvious you are not citing a previous structure.

      Our answer: To clearly distinguish the new data from published results, the citation of the cryoEM structure [PDB ID 8T60] has been completely removed from the main text but kept in sTable 1.

      -An entire panel of Figure 3 is dedicated to ligand binding in a previous outward-facing structure. Showing it in the overlay would be sufficient.

      Our answer: This is the first time we show a structure with a bound Na+. Fig. 3 also illustrates the spatial relationship between the sugar-binding pocket and the cation-binding pocket, since both binding sites have now been determined. As stated above, following the two reviewers’ comments, we have modified the figures, and Fig. 3d is the overlay.

      Please increase the size of the font in all figures. It should be 6-8 point when printed on a standard sheet of paper. Labels in Figure 3, distances in Figure 4, and everything in Figure 5 is hard to see.

      Our answer: Thank you for the comments. We have enlarged the figure size and label font in all figures.

      Figure 2: would be helpful to show Figure S8 in the main text, orienting the reader to the approximate location of substrate binding. What is known about the EIIA-Glc binding interface? Has anyone probed this by mutagenesis? Where are these residues on the overall structure, and are they somewhere other than the nanobody interface?

      Our answer: Thank you for this comment. We have added a panel in Figure 3c to orient readers to the substrate location in MelB. sFig. 8 actually focuses on the details of Nb interactions with MelB. Our current data strongly support the notion that the Nb-bound MelBSt structure mimics EIIAGlc-bound MelB, but the latter has not been structurally resolved, so we have toned down our statement on EIIAGlc. There is one study suggesting that the C-terminal tail helix may be involved in EIIAGlc binding, which has been added to the discussion.

      Can Figure 5 be split into 2 figures and simplified?

      Our answer: Thanks for the suggestion. We have split it into Figs. 5b and 6 and also moved the peptide mapping to Fig. 5a.

      What is the difference between cartoon and ribbon rendering?

      Our answer: Ribbon: illustrating the structure; cartoon: highlighting the positions with statistically significant protection or deprotection. The statistically significant changes are implied by the ribbon representation; Sphere: not covered by labeled peptides.

      Can the panels showing the kinetic data be enlarged? I don't think they need to surround the molecule. An array underneath would be fine.

      Our answer: We have enlarged all figures and labels. The placement of selected plots around the model could clearly show the difference in deuterium uptake rates between the transmembrane domain and extra-membrane regions. We will maintain this arrangement.

      Do colors in panel A correspond with colors in panel B?

      Our answer: The color usage differs between the two panels. The two panels have now been separated.

      Do I understand correctly that in the HDX experiments, negative values indicate positions that exchange more quickly in the nanobody-free protein relative to the nanobody-bound protein?

      Our answer: Your understanding is correct.

      I assume some of this is due to the protein changing conformation, but some of it might be due to burial at the nanobody-binding interface. Can those peptides be indicated?

      Our answer: Thank you for this comment. We have marked the peptides carrying the Nb-binding residues on the uptake plots in Fig. 6 and Extended Fig. 1. Only three Nb-binding residues (Asp137, Lys138, and Arg141) are covered by overlapping peptides showing significant changes; they are present in 8 labeled peptides. The others are either not carried by any labeled peptide (Tyr205, Ser206, and Ser207) or show insignificant changes (Pro132 and Thr133).

      A few positions that are buried in the outward-facing state are expected to be solvent-exposed in the inward-facing state; unfortunately, in the inward-facing state they are buried by Nb binding.

      Make figure legends easier to interpret by removing non-essential methods details (like buffer conditions).

      Our answer: We removed the detailed method descriptions in most figure legends. Thank you.

      Check throughout for typos.

      ie page 9 Lue Leu

      Page 9 like likely

      Our answer: We have corrected them. Thank you!

      Reviewer #2 (Recommendations For The Authors):

      I have mostly minor questions/remarks.

      • Why not do the hdx-ms experiments in the presence of sugar? That would give a proper distinction between two conformational states, instead of an ensemble of states vs one state.

      Our answer: The MelB conformation induced by sugar also comprises multiple states, most likely outward-facing states and occluded intermediate states. This is also supported by the new finding of an inward state with low sugar affinity. The ideal design would compare one inward and one outward state to understand the inward-outward transition. We have not identified an outward-facing mutant, whereas we can obtain the inward state with the Nb. WT MelBSt with bound Na+ favors the outward-facing state. Although our design is not ideal, we do have one state vs a predominantly outward-facing WT with bound Na+.

      Minor comments:

      • Fig 5 is misleading as the peptide number does not match with the amino acid sequence. I would suggest putting a heat map with coverage on top. Or showing deuterium uptake per peptide. See examples below.

      Our answer: The peptide number is not expected to match the sequence number. We have 155 overlapping peptides that cover the entire amino acid sequence including the 10-His tag, and there are 60 residues with no data because they are not covered by a labeled peptide. The residue positions covered by peptides are indicated approximately by the bars on top. The cylinder length does not correspond to the length of the transmembrane helix; it is for mapping purposes only.

      • Can the authors explain how they found that the Nbs bind to the cytoplasmic side (before obtaining the structure)?

      Our answer: Our in vivo two-hybrid assay between the Nb and MelBSt indicated that they interact on the cytoplasmic surface of MelBSt. This was further confirmed by the melibiose fermentation and transport assays, in which transport activity was completely inhibited when the Nb was coexpressed intracellularly with MelBSt. Thanks for raising this question.

      • The authors use the word "substrate" indifferently for sugar and Na+ binding, which is a bit confusing. Technically, only sugar is the substrate and Na+ is a ligand, or cotransported-ion, that powers the reaction of transport. This might sound like nit-picking but it can lead to misunderstandings (at some point I thought two sugars were transported, and then I was looking for the second Na+ binding site).

      Our answer: We used to call the sugar and Na+ co-substrates, but we agree with this comment.

      We now use “substrate” for the cargo sugar and “coupling cation” for the driving cation.

      • Abstract "only the inner barrier" - the is missing.

      Thanks. We have corrected this.

      • p.3 intro "and identified that the positive cooperativity of cation and melibiose, " something is missing.

      Thanks again. We had omitted “as the core symport mechanism”.

      • P.6 Nb275_4 instead of Nb725_4

      Thank you very much for your careful reading.

      • P.7. Also, affinity affinities

      Thank you very much. We changed it to “; and also, the α-NPG affinity decreased by 21~32-fold for both Nbs”

      • P.8 " contains 417 MelBSt residues (positions 2-210, 219-355, and 364-432). This does not sum up to 417 residues.

      Thanks for your critical reading. We changed 364-432 to 362-432.
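
      The arithmetic behind this point can be checked with a minimal sketch that sums inclusive position ranges. The helper name `count_residues` is hypothetical, and the third segment starting at position 362 is an assumption inferred only from the stated total of 417, not a value quoted from the manuscript.

```python
def count_residues(segments):
    """Count residues in a list of inclusive (start, end) position ranges."""
    return sum(end - start + 1 for start, end in segments)

# Ranges as quoted by the reviewer from the original manuscript.
original = [(2, 210), (219, 355), (364, 432)]
print(count_residues(original))   # 209 + 137 + 69 = 415

# Assumed correction: a third segment starting at 362 is the start
# position that makes the three ranges sum to the stated 417 residues.
candidate = [(2, 210), (219, 355), (362, 432)]
print(count_residues(candidate))  # 209 + 137 + 71 = 417
```

      Each range is inclusive at both ends, which is why one is added to every end-minus-start difference.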

      • p.9 Lue 54

      We have corrected it to Leu54.

      • I find fig.3 hard to read. Can the authors show the Na+ binding pockets and sugar binding pockets within the structure? Especially figure 3b. why are the residues in different colors?

      Our answer: We have moved Fig 3b into sFig. 11. We colored the residues in the previous Fig 3B to match their hosting helices. We have added two panels to show the locations of both the sugar and Na+ in the molecule. Thank you for your comments.

      • Fig4 bcef. Colored circles at the end of the helices. What are they for?

      Our answer: We revised the legend. “The paired helices involved in either barrier formation were highlighted in the same colored circles.”

      • 86% coverage includes the his-tag - it would be good to clarify that.

      Our answer: Yes, it includes the 10-His tag.

      • Fig.7 - anti clockwise cycle of transport is counter-intuitive.

      Our answer: We have rearranged the figure. Our model was originally constructed to explain efflux because of the limited information available at the earlier stage. More data are now available, allowing us to explain influx and active transport.

      • Where are all the uptake plots per peptide for the HDX-MS data?

      Our answer: We have added the time-course raw data and prepared uptake plots for all 71 peptides with statistically significant changes as Extended Fig. 1.

      • P.22 protein was concentrated to 50 mg/mL. Really? That is a lot.

      This is correct. We can even concentrate MelBSt protein to greater than 50 mg/ml.

      • Have the authors looked into the potential role of lipids in regulating the conformational transition? Since the structure was obtained in nanodiscs, have they observed some unexplained densities? The role of lipid-protein interactions in regulating such transitions was observed for several transporters including MFS (Gupta K, et al. The role of interfacial lipids in stabilizing membrane protein oligomers. Nature. 2017 10.1038/nature20820. Martens C, et al. Direct protein-lipid interactions shape the conformational landscape of secondary transporters. Nat Commun. 2018 10.1038/s41467-018-06704-1.). Furthermore, I see the authors have already observed lipid specific functional regulation of MelB (ref: Hariharan, P., et al BMC Biol 16, 85 (2018). https://doi.org/10.1186/s12915-018-0553-0). A few words about this previous work, and even commenting on the absence of lipid-protein interactions in this current work is worthwhile.

      Our answer: Thanks for this very relevant comment. We paid attention to the unmodelled densities. One density is a potential candidate, but it is challenging to model. We have added the sentence “There is no unexplained density that can be clearly modeled by lipids.” to the Methods to address this concern.

      Reviewer #3 (Recommendations For The Authors):

      1) In the following sentence, the authors report high errors for the Kd value. The anti-Fab Nb binding to NabFab was two-fold poorer than Nb725_4 at a Kd value of 0.11 ± 0.16 μM. The figure however indicates that the error value is 0.016 µM. Pls correct.

      Our answer: Thank you. You are correct. The error has been corrected to 0.16 ± 0.02 μM. In this revised manuscript, we present the data in nM units.

      2) Is the stoichiometry of the MelB:Na+ symport clearly known in this transporter. It can be mentioned in the discussion with appropriate references.

      Our answer: Yes, the stoichiometry of unity has been clearly determined, which was included in the second paragraph of the previous version.

      3) In the last section of results, the authors seem to suggest a greater movement within their C-terminal helical bundle compared to N-terminal helices. Is there evidence to suggest an asymmetry in the rocker switch between the two states of the transporter?

      Our answer: Our structural data revealed that the C-terminal bundle is more dynamic than the N-terminal bundle, which hosts the residues for specific binding of galactoside and Na+. The HDX data showed that the most dynamic regions are the C-terminal tail, which is structurally unresolved by either method, the conserved tail helix, and the middle-loop helix. The transmembrane helices are relatively less dynamic, with similar distributions across both transmembrane bundles. Since the most dynamic regions are peripheral elements associated with the C-terminal domain, they might give a wrong impression. With regard to symmetric or asymmetric movement, which will certainly affect the dynamic interactions between the transporter and the lipids, we favor the notion that MelBSt performs a symmetric movement during the rocker switch between inward and outward states, at the least cost for the protein-lipid interactions.

      4) Figure 1. Are the thermograms exothermic or endothermic? clarify

      Our answer: In our thermograms, all positive peaks are exothermic because the TA instrument directly detects the heat released. We clarified this in the Methods and now also stress it in the figure legends to avoid confusion.

      5) Figure 4a,d. Please put in a membrane bilayer and depict cytosolic and extracellular compartments for clarity.

      Thank you. We have added a bilayer and labeled the sidedness in this figure and other related figures.

      6) Fig 7. Melibiose symport cannot be referred to as Melibiose efflux transport in the legend as the latter refers to antiport. Pls rectify.

      Our answer: Influx and efflux are conventionally used to describe the direction of movement of a substrate. The use of symport and antiport indicates the directions of the coupling reaction for the cargo and cation. For the symporter MelB, melibiose efflux means that sugar with the coupled cation moves out, which is driven by the melibiose concentration. During the steady state of melibiose active transport, efflux rate = influx rate.

      7) Page 11 "A common feature of carrier transporters". The authors can use either carriers or transporters. Need not use both simultaneously.

      Sorry for overlooking this. We have deleted carriers. Thank you very much for your time.

      8) Several typos were noticed in this manuscript. some are listed below. pls correct.

      Page 4- last paragraph "Furthermore"

      We have corrected it. Thank you again!

      Page 7 - second para one repharse "affinity reduced by 21~32 fold/units.." pls clarify

      Added 21~32 fold.

      Page 9 - "so it is highly likely that inward-open conformation" pls correct.

      We have corrected to “likely”.

      Fig. S9c - correct the spelling "Distance".

      We have corrected to “Distance”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Major comments:

      1) The authors conclude that the bone growth defects are chondrocyte-specific, highlighting no changes in the IGF pathway. However, other bone cells such as mesenchymal progenitors, osteoblasts, osteocytes, and marrow stromal cells are also lateral plate mesoderm derived and likely have roles in the bone growth phenotypes (a). Additionally, while the size decrease of the proliferative zone was stated, no actual proliferation assays such as BrdU were conducted (b). With the elements being of such small size in the mutants, the defects are likely to be found at the earliest stages of limb development at E11.5-E13.5 and may be due to mesenchymal to chondrocyte transitions or defects in osteoblast lineage development (c). Overall, the skeletal characterization is not rigorous and does not identify even a likely cellular mechanism. Further, a molecular mechanism by which SMN functions in mesenchymal progenitors, chondrocytes, or osteoblast lineage cells has not been assessed (d).

      (a, c) As the reviewer commented, it is important to evaluate whether embryonic development is affected between mesenchymal condensation in the limb bud and formation of the primary ossification center. However, when Hensel et al. evaluated bone growth at P3 in severe SMA mice, the growth defect was small, with a control femur length of 3.5 mm and a mutant femur length of 3.2 mm. It therefore seems that, even when SMN is deficient, endochondral bone formation is not majorly impaired during the embryonic period (Hensel et al., 2020).

      In this study, the SMN2 1-copy mutant with the bone growth defect showed a reduction in SMN protein similar to that of the severe SMA mouse model when SMN protein was quantified. When Hensel et al. performed an in vitro ossification test on primary osteoblasts from another severe SMA mouse model (Taiwanese severe SMA), they found no significant difference compared to controls. In femurs at P3 from severe SMA mice, they found no difference in bone voxel density or bone thickness (Hensel et al., 2020). In our data, bone thickness did not differ (Figure 1 and Figure 1 – figure supplement 2), and BMD was actually greater. Thus, we believe that osteoblast and osteocyte function is not impaired by the absence of SMN. When we examined cortical osteoblasts in our new Figure 1 – figure supplement 2, there was no apparent difference in density.

      Furthermore, it is unlikely that BMSCs contributed to the bone growth we observed up to 2 weeks of age. The Lepr+Cxcl12+ BMSC population, which constitutes 94% ± 4% of CFU-F colonies formed by bone marrow cells (Zhou et al., 2014), is Prrx1-positive, and is known to be capable of osteogenesis in vivo, has only been shown to differentiate into osteoblasts and form new bone in adults over 8 weeks of age. In the Lepr-cre; tdTomato; Col2.3-GFP mouse model, few cells expressing the osteoblast marker Col2.3-GFP are found before 2 months, and only about 3% of femur trabecular and cortical osteocytes express tdTomato at 2 months (Zhou et al., 2014). In the Cxcl12-CreER; tdTomato; Col2.3-GFP mouse model, the researchers did not find tdTomato positivity in osteoblasts and osteocytes even after administration of tamoxifen at P3 and analysis 1 year later (Matsushita et al., 2020).

      We, therefore, concluded that the bone growth abnormalities observed in SMN2 1-copy mutants are due to problems in endochondral ossification caused by chondrocyte defects and not due to other Prrx1-lineage skeletal cells.

      (b) According to the reviewer's suggestion, we evaluated cell proliferation in the new Figure 1J-L by performing immunostaining for the Ki67 proliferation marker in growth plates.

      (d) As the reviewer pointed out, we extended the mechanistic study and found reduced chondrocyte-derived IGF signaling and hypertrophic marker expression in new Figure 2. We evaluated the density of osteoblasts and osteoclasts, which can affect bone mineralization. We highlighted the limited impact of BMSCs on bone growth in the first two weeks of life. In a previous study, SMN-deleted osteoblasts did not show any issues with ossification (Hensel et al., 2020). In fact, osteoblast density in the SMN2 1-copy mutant was not different from the control, indicating that the skeletal abnormalities can largely be attributed to deficiencies in endochondral ossification caused by chondrocytes. Chondrocytes are the local source of IGF, and our mutants exhibit phenotypes similar to those of mouse models with reduced IGF, such as downregulated expression of Igf1 and Igfbp3, downregulated IGF-induced hypertrophic gene expression, and reduced AKT phosphorylation, proliferation, and growth plate zone length. SMN-deleted chondrocytes therefore probably showed these phenotypes because of decreased IGF secretion. We have now added new Figure 2A-C and E.

      2) Is the liver the only organ/tissue that supplied IGF to the chondrocytes or are other lateral plate mesoderm-derived cells potential suppliers? It's not possible to pin SMN deletion in chondrocytes as intrinsic ignoring the other bone cell types that it is depleted from in the Prrx1Cre genetic model.

      Recently, Oichi et al. reported, using in situ hybridization and p-AKT staining, that the local IGF source in the growth plate is chondrocytes (Oichi et al., 2023). When we measured IGF in chondrocytes isolated from articular cartilage, the expression of Igf1 and Igfbp3 was markedly reduced in chondrocytes with SMN deletion compared to controls (New Figure 2E), suggesting that intrinsic SMN expression in chondrocytes plays an important role in the growth plate.

      3) Why is SMN protein being isolated from FAPs to assess levels in the null/SMN2 single copy/double copy mutants when the bone defects are supposed to be a chondrocyte-specific phenotype? This protein expression needs to be confirmed in chondrocytes themselves, and or other Prrx1Cre lineaged skeletal cells.

      Following the reviewer’s suggestion, we attempted to evaluate the protein levels in chondrocytes of the SMN2 1-copy mutant. However, we were unable to obtain sufficient numbers of chondrocytes because mutant chondrocytes proliferated poorly in culture compared to controls. We could obtain only ~10^4 viable cells from 1 SMN2 1-copy mutant mouse. Therefore, our only options for confirming SMN deletion in chondrocytes were DNA and RNA analyses. Because the amount of SMN protein in Prrx1-lineage FAPs correlates with the expression level of full-length SMN mRNA (Figure 2H-J), we expect that SMN protein in chondrocytes would be fully depleted given the poor full-length SMN mRNA expression (Figure 2H).

      4) Figure 2E should have example images of each type of NMJ characterization.

      We revised our figure by adding the example images in new Figure 3E.

      5) What are the overall NMJ numbers in the normal formation period? Are these constant into the juvenile period when the authors say the deterioration occurs?

      We appreciate the reviewer's constructive comments; it would be interesting to test for a difference in the total number of NMJs. However, there is one NMJ in every myofiber, and each muscle has hundreds to thousands of myofibers. The technical difficulty of confocal imaging of an entire muscle, which can be several millimeters across, precludes experiments that count every NMJ. It may be possible to do so by combining clearing and confocal line-scanning techniques. In our analysis, NMJ formation in the mutant appears to be normal. Additionally, the number of myofibers seems to be the same, so there may be no difference in the total NMJ number.

      6) For transplantation experiments the authors sorted YFP or TOMATO+ cells from the Prrx1Cre mice muscles, but refer to them as FAPs. It is known that other cells including tenocyte-like cells, pericytes, and vascular smooth muscle cells are identified by this reporter line. Staining for TOMATO colocalization with PDGFRA would help to clarify this.

      As described in the Methods section ‘Hindlimb fibro-adipogenic progenitors isolation’, the sorted 7AAD–Lin–Vcam–Sca1+ population is what we refer to as FAPs. For FAPs transplantation, we also used YFP+ or TOMATO+ FAPs (7AAD–Lin–Vcam–Sca1+). The ‘FAPs transplantation’ Methods section did not specify the FAPs population in detail; this has been fixed in the revised Methods. Sca1 (Ly6a) is an effective marker for identifying FAPs within Prrx1-lineage cells, as is Pdgfra (Leinroth et al., 2022).

      7) The authors only compare the SMN2 single copy mutant transplantation to contralateral to show rescue, but how does this compare to overall wt morphology?

      According to the reviewer’s constructive comment, we compared them with wild-type morphology (new Figure 7A-D).

      8) The asterisks of TOMATO+ in Figure 6A are confusing. FAPs do not usually clump together to form such large plaques and are normally much thinner tendrils. What is the reason for this?

      As the reviewer states, FAPs have a fibroblast-like morphology with elongated, thin tendrils. The image in Figure 6A shows a Z-slice through the cell body of a FAP, where the nucleus is located, so it appears blunt. We have attached images of Tomato+ FAPs in which the cell bodies appear plaque-like.

      Author response image 1.

      Tomato+ FAPs in muscle

      9) Would transplantation of healthy FAPs after NMJ maturation in SMN mutants still rescue the phenotype? Assessment of this is key for therapy intervention timelines moving forward.

      It would be very interesting to see whether transplantation of healthy FAPs after NMJ maturation improves the phenotype, but this experiment is technically difficult because we found that FAPs do not engraft effectively when injected into naive adult muscle. Transplantation into adult muscle is feasible if accompanied by an injury, but the injury leads to new NMJ formation. Thus, transplantation after NMJ maturation does not seem possible with standard methods. If we discover a method to efficiently rescue SMN in FAPs, or identify a factor that mediates the influence of FAPs on the NMJ, we may be able to conduct this experiment.

      Reference

      Hensel, N., Brickwedde, H., Tsaknakis, K., Grages, A., Braunschweig, L., Lüders, K. A., Lorenz, H. M., Lippross, S., Walter, L. M., Tavassol, F., Lienenklaus, S., Neunaber, C., Claus, P., & Hell, A. K. (2020). Altered bone development with impaired cartilage formation precedes neuromuscular symptoms in spinal muscular atrophy. Human Molecular Genetics, 29(16), 2662–2673. https://doi.org/10.1093/hmg/ddaa145

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayahsi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Matsushita, Y., Nagata, M., Kozloff, K. M., Welch, J. D., Mizuhashi, K., Tokavanich, N., Hallett, S. A., Link, D. C., Nagasawa, T., Ono, W., & Ono, N. (2020). A Wnt-mediated transformation of the bone marrow stromal cell identity orchestrates skeletal regeneration. Nature Communications, 11(1). https://doi.org/10.1038/s41467-019-14029-w

      Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Zhou, B. O., Yue, R., Murphy, M. M., Peyer, J. G., & Morrison, S. J. (2014). Leptin-receptor-expressing mesenchymal stromal cells represent the main source of bone formed by adult bone marrow. Cell Stem Cell, 15(2), 154–168. https://doi.org/10.1016/j.stem.2014.06.008

      Reviewer #2

      Major comments:

      1) Regarding bone deficits - CT analysis of bones should be more comprehensive than Figure 1A shows. How about cross-sections? (a) Are bone phenotypes also age-dependent? (b) PCR was done only for SMA and related proteins (such as IGF). IGF protein in the blood and relevant organs should be studied. Why not include biomarkers of osteoblasts or/and osteoclasts and their regulators? (c)

      (a) We appreciate the reviewer’s constructive comment. We added longitudinal section views in new Figure 1A and a description of trabecular bone volume and the secondary ossification center in the main text.

      (b) Age-dependent evaluation is an important point. By adulthood, the difference between the SMN2 1-copy mutant and the control is much larger, and even at birth there is a slight difference, although not as large as at 2 weeks of age. We focused our phenotyping on bone growth at 2 weeks of age, a time when new bone formation by BMSCs is less influential, when bone growth is primarily driven by endochondral ossification of chondrocytes, and before the defect in the NMJ is primarily manifested.

      (c) As the reviewer comments, it is important to evaluate IGF in tissues other than the liver. However, the liver is most likely the source of systemic IGF: liver-specific deletion of Igf1 combined with knockout of Igfals, which encodes a protein that forms the IGF ternary complex and is predominantly expressed in the liver, resulted in a 90% drop in serum IGF levels and a phenotype of shortened femur length and growth plates in the double KO mice (Yakar et al., 2002).

      The local IGF source in the growth plate is chondrocytes, as confirmed by Igf1 in situ hybridization and p-AKT staining (Oichi et al., 2023). The in situ hybridization data show that bone marrow and bone do not express Igf1 at all; only the perichondrium and chondrocytes in the resting zone express Igf1 mRNA. Therefore, chondrocytes are the only supplier of IGF among LPM-derived cells, and in new Figure 2 we measured IGF pathway expression and AKT phosphorylation in chondrocytes. We have confirmed that the expression of Igf1/Igfbp3 is reduced in chondrocytes with SMN deletion.

      We could not set up an experiment to assess serum IGF levels during the revision period because of the administrative procedures required to purchase new apparatus and the limits of our research funds. However, as stated above, there is no difference in the expression of Igf1 and Igfals in the liver, which accounts for 90% of serum IGF levels. Therefore, we do not anticipate significant differences in serum IGF levels.

      Osteoblasts and osteoclasts were evaluated by section staining because of sampling difficulties for PCR. We assessed the state of osteoblasts and osteoclasts in new Figure 1-figure supplement 2.

      2) What is the relationship between deficits of bone deficits and muscle deficits or even NMJ deficits? Are they inter-related? Is skeletal muscle development also defective in Smn∆MPC mice? Can NMJ deficits result from bone deficits? Or vice versa?

      Unfortunately, the points raised by the reviewer are very difficult to resolve in our study using the Prrx1-cre model. With respect to skeletal muscle development, the myofiber number was not significantly different in our mouse models. A study has shown that inactivating noggin, a BMP antagonist expressed in condensed cartilage and immature chondrocytes, results in severe skeletal defects without affecting the early stages of muscle differentiation (Tylzanowski et al., 2006). Therefore, bone may not have a significant impact on the early development of muscle, but later in postnatal development it may affect motor performance. The relationship between bone and the NMJ has not been studied. The impact of bone defects on motor skill may result in muscle weakness and NMJ problems. In our study, we showed rescue of the NMJ deficit by FAP transplantation and decreased IGF in chondrocytes, a key source of local IGF. This suggests that the function of FAPs in the NMJ and the function of chondrocytes in the bone deficit are the crucial factors, rather than any influence of one deficit on the other.

      3) Regarding the rescue experiment, the interpretation of the data should be careful. Evidently, healthy FAPs (td-Tomato positive) were transplanted into TA muscles of 10 days-old SMN2 1-copy SmnΔMPC mice, and NMJs were looked at P56. The control was contralateral TA that was injected with the vehicle. As described above, the data had huge SEM and were difficult to interpret or believe. The control perhaps was wrong if FAPs act by releasing "chemicals" because FAPs from one leg may go to other muscles via blood. Second, if FAPs act via contact, the data shown did not support this. Two red FAPs were shown in Figure 6, one of which was superimposed with a nerve track to one of the three NMJs. This NMJ however did not show any difference to the other two, which did not support a contact mechanism. These rescue data were not convincing.

      We appreciate the reviewer’s critical comment, but the reviewer appears to have confused the minimum and maximum range bars in the box-and-whisker plots with the SEM error bars used in bar graphs. We apologize for the insufficient description in the figure legends, which we have revised. New Figure 7C, which is a bar graph, has a sufficiently short SEM error bar. In contrast, the box-and-whisker plots in panels B and D depict the minimum and maximum range instead of the SEM, and the groups are significantly different with a p-value of less than 0.001. If FAPs affect the NMJ via a paracrine factor or ECM with a short range of action, they may rescue the NMJ defect in a non-contact-dependent manner, without affecting the contralateral muscle. Also, FAPs are heterogeneous, so if only a certain subpopulation mediates the rescue, the Tomato+ FAP in the figure may not be one of the rescuing cells.
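
      The difference between the two kinds of display can be illustrated with a minimal sketch. The values below are made up for illustration and are not data from this study, and the helper names `sem` and `whisker_range` are hypothetical.

```python
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5

def whisker_range(values):
    """Whisker ends when whiskers are drawn at the minimum and maximum."""
    return min(values), max(values)

# Made-up measurements, for illustration only.
values = [2.0, 4.0, 4.0, 5.0, 10.0]

low, high = whisker_range(values)
print("mean:", statistics.mean(values))
print("SEM:", sem(values))
print("whisker span (max - min):", high - low)
```

      For the same sample, whiskers drawn at the minimum and maximum span the full spread of the observations, whereas the SEM shrinks with the square root of the sample size, so the whiskers look much longer than an SEM error bar would.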

      4) For most experiments, the "n" numbers were too small. 3-5 mice were used for bone characterization. For the NMJ, most experiments were done with 3 mice. It was unclear how many NMJs were looked at. Perhaps due to small n numbers, the SEM values were enormous (for example, in Figure 6).

      As in our response to the previous comment, this impression arises from confusion between box-and-whisker plots and bar graphs; our data were determined to be significant using the appropriate statistical method.

      5) Also for experimental design, some experiments included four genotypes of mice (Fig. 1 J,K) whereas some had only three (Fig.1 A, B, C, D and Fig.3) and others had two (many other figures).

      In the first experiments to confirm the phenotypes, we tested the 2-copy mutant, but it was not significantly different from the wild type, so in subsequent experiments we mainly tested only the 1-copy mutant.

      6) What was the reason why mixed muscles were used for NMJ characterization (TA versus EDL)? Why not pick a type I-fiber muscle and a type II-fiber muscle?

      We appreciate the constructive comment from the reviewer. We first conducted the phenotype analysis on the TA muscle. For electrophysiological recording, the EDL muscle was used because it is technically suited to preparations with the nerve kept intact on the muscle. Additionally, for TEM imaging, the EDL was a suitable muscle for locating NMJ positions before TEM processing. The TA and EDL muscles are adjacent and have similar fiber-type compositions. It would be important to examine muscles with different fiber types, but when we first identified the phenotype, various types of limb muscles showed similar defects, so we focused on specific muscles.

      7) The description of mouse strains was confusing. SMN2 transgenic mice (with different copies) were not described in the methods.

      We apologize for the insufficient description in the Methods section. The mice were obtained by crossing with mice homozygous for the SMN2 allele (SMN2+/+): hemizygous mice carrying only one SMN2 allele are SMN2 1-copy mice (SMN2+/0), and homozygous mice are SMN2 2-copy mice (SMN2+/+). We revised the ‘Animals’ section of the Methods.

      Reference

      Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Tylzanowski, P., Mebis, L., & Luyten, F. P. (2006). The noggin null mouse phenotype is strain dependent and haploinsufficiency leads to skeletal defects. Developmental Dynamics, 235, 1599–1607. https://doi.org/10.1002/dvdy.20782

      Yakar, S., Rosen, C. J., Beamer, W. G., Ackert-Bicknell, C. L., Wu, Y., Liu, J. L., Ooi, G. T., Setser, J., Frystyk, J., Boisclair, Y. R., & LeRoith, D. (2002). Circulating levels of IGF-1 directly regulate bone growth and density. Journal of Clinical Investigation, 110(6), 771–781. https://doi.org/10.1172/JCI0215463

      Reviewer #3

      1) The authors used Prrx1Cre mouse with floxed Smn exon7(Smnf7) mouse carrying multiple (one or two) copies of the human SMN2 gene. Is it expressed both in chondrocytes and mesenchymal progenitors in the limb?

      We appreciate the reviewer's comment. We analyzed the Cre-mediated deletion of Smn in chondrocytes and FAPs using genomic PCR and qRT-PCR, as shown in new Figure 2. The SMN2 allele, which is expressed throughout the body, can rescue the lethality of Smn knockout mice (Monani et al., 2000). Indeed, the short limb length and lethality observed in SMN2 0-copy mutants were mitigated by the presence of one or more copies of SMN2. Therefore, both chondrocytes and FAPs may express SMN2 transcripts from the transgenic SMN2 allele.

      2) Page 10 regarding Fig.2E, please show pretzel-like structure. In Figure 2E, plaque, perforated, open, and branched are shown; however, the pretzel is not shown. The same issue is for the Fig. 3D explanation in the text on page 12.

      We appreciate the reviewer's constructive feedback. We included illustrative figures of all types of NMJ characterization, and the branched type is identical to the pretzel type. Therefore, we have replaced ‘branched’ with ‘pretzel’ in our text and revised Figure 3E by incorporating the example images.

      3) The explanation of the electrophysiology for Fig.4 in the text on pages 12 and 15 (RRP) is not so convincing for the readers. It is advisable to add TEM data for transplantation if it is not technically difficult.

      We appreciate the reviewer's critical feedback. Because we did not measure the RRP directly, we removed the speculation about a possible difference in RRP. Although observing active zones and docked synaptic vesicles by TEM would help quantify the RRP, it is technically difficult to obtain images of sufficient quality to distinguish the active zones with our current TEM imaging technique.

      4) The authors used the word FAP for 7AAD(-)Lin(-)Vcam(-)Sca1(+). It is recommended to show the expression of PDGFR alpha. Furthermore, as the authors stated in the text, mesenchymal progenitors (FAPs) are heterogeneous. Please discuss this point further. Other reports show at least 6 subpopulations using single-cell analyses (Cell Rep. 2022).

      In that report, Ly6a (Sca1), as well as Pdgfra, is a good marker for FAPs (Leinroth et al., 2022). All six subpopulations expressed Ly6a. One of the subpopulations was found to be associated with the NMJ. This population expressed Hsd11b1, Gfra1, and Ret, is located adjacent to the NMJ, and responds to denervation, indicating an increased possibility of interaction with the NMJ organization. In our future studies, we aim to determine which subpopulations are crucial for NMJ maturation by transplanting them into mutants for rescue.

      5) How do authors determine the number of FAP cells for transplantation?

      The FAP transplantation was performed according to our previously reported study (Kim et al., 2022).

      Reference

      Kim, J. H., Kang, J. S., Yoo, K., Jeong, J., Park, I., Park, J. H., Rhee, J., Jeon, S., Jo, Y. W., Hann, S. H., Seo, M., Moon, S., Um, S. J., Seong, R. H., & Kong, Y. Y. (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight, 7(10). https://doi.org/10.1172/jci.insight.158380

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayahsi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Monani, U. R., Sendtner, M., Coovert, D. D., Parsons, D. W., Andreassi, C., Le, T. T., Jablonka, S., Schrank, B., Rossol, W., Prior, T. W., Morris, G. E., & Burghes, A. H. M. (2000). The human centromeric survival motor neuron gene (SMN2) rescues embryonic lethality in Smn(-/-) mice and results in a mouse with spinal muscular atrophy. Human Molecular Genetics, 9(3), 333–339. https://doi.org/10.1093/hmg/9.3.333

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing key molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      Response: We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. To more precisely define aspects of the biology will require another study with different design and methods. Therefore the criticism seems unnecessarily harsh given the limitations of our stated objective.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      Response: Blood neutrophil count was not found to be a predictor for TBM mortality in our previous studies. We agree it could be useful to perform predictive analysis with neutrophil count as suggested by the reviewer. Regarding hub genes versus genes representative of the biological pathways, we cannot know which have better predictive values without performing variable selection on the set of all genes, including both hub genes and pathway-representative genes. This is an additional analysis that we will undertake.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      -Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      -The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis can not be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      Response: As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis can not be used to make causal inferences’: that would require different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model. We agree that validation of the gene signature in an independent sample set will be a crucial next step.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Concerns Public Review:

      1) The framing of 'infinite possible types of conflict' feels like a strawman. While they might be true across stimuli (which may motivate a feature-based account of control), the authors explore the interpolation between two stimuli. Instead, this work provides confirmatory evidence that task difficulty is represented parametrically (e.g., consistent with literatures like n-back, multiple object tracking, and random dot motion). This parametric encoding is standard in feature-based attention, and it's not clear what the cognitive map framing is contributing.

      Suggestion:

      1) 'infinite combinations'. I'm frankly confused by the authors' response. I don't feel like the framing has changed very much, besides a few minor replacements. Previous work in MSIT (e.g., by the author Zhongzheng Fu) has looked at whether conflict levels are represented similarly across conflict types using multivariate analyses. In the paper mentioned by Ritz & Shenhav (2023), the authors looked at whether conflict levels are represented similarly across conflict types using multivariate analyses. It's not clear what this paper contributes theoretically beyond the connections to cognitive maps, which feel like an interpretative framework rather than a testable hypothesis (i.e., these previous papers could have framed their work as cognitive maps).

      Response: We acknowledge the limitations inherent in our experimental design, which prevents us from conducting a strict test of the cognitive space view. In our previous revision, we took steps to soften our conclusions and emphasize these limitations. However, we still believe that our study offers valuable and novel insights into the cognitive space, and the tests we conducted are not merely strawman arguments.

      Specifically, our study aimed to investigate the fundamental principles of the cognitive space view, as we stated in our manuscript that “the representations of different abstract information are organized continuously and the representational geometry in the cognitive space is determined by the similarity among the represented information (Bellmund et al., 2018)”. While previous research has applied multivariate analyses to understand cognitive control representation, no prior studies had directly tested the two key hypotheses associated with cognitive space: (1) that cognitive control representation across conflict types is continuous, and (2) that the similarity among representations of different conflict types is determined by their external similarity.

      Our study makes a unique contribution by directly testing these properties through a parametric manipulation of different conflict types. This approach differs significantly from previous studies in two ways. First, our parametric manipulation involves more than two levels of conflict similarity, enabling us to directly test the two critical hypotheses mentioned above. Unlike studies such as Fu et al. (2022) and others that have treated different conflict types categorically, we introduced a gradient change in conflict similarity. This differentiation allowed us to employ representational similarity analysis (RSA) over the conflict similarity, which goes beyond mere decoding as utilized in prior work (see more explanation below for the difference between Fu et al., 2022 and our study [1]).

      Second, our parametric manipulation of conflict types differs from previous studies that have manipulated task difficulty, and the modulation of multivariate pattern similarity observed in our study could not be attributed to task difficulty. Previous research, including Ritz & Shenhav (2023) (see the explanation below [2]), has primarily shown that task difficulty modulates univariate brain activation. A recent work by Wen & Egner (2023) reported a gradual change in the multivariate pattern of brain activations across a wide range of frontoparietal areas, supporting the reviewer’s idea that “task difficulty is represented parametrically”. However, we do not believe that our results reflect the task difficulty representation. For instance, in our study, the spatial Stroop-only and Simon-only conditions exhibited similar levels of difficulty, as indicated by their relatively comparable congruency effects (Fig. S1). Despite this similarity in difficulty, we found that the representational similarity between these two conditions was the lowest (see revised Fig. S4, the most off-diagonal value). This observation aligns more closely with our hypothesis that these two conditions are most dissimilar in terms of their conflict types.

      [1] Fu et al. (2022) offers important insights into the geometry of cognitive space for conflict processing. They demonstrated that Simon and flanker conflicts could be distinguished by a decoder that leverages the representational geometry within a multidimensional space. However, their model of cognitive space primarily relies on categorical definitions of conflict types (i.e., Simon versus flanker), rather than exploring a parametric manipulation of these conflict types. The categorical manipulations make it difficult to quantify conceptual similarity between conflict types and hence limit the ability to test whether neural representations of conflict capture conceptual similarity. To the best of our knowledge, no previous studies have manipulated the conflict types parametrically. This gap highlights a broader challenge within cognitive science: effectively manipulating and measuring similarity levels for conflicts, as well as other high-level cognitive processes, which are inherently abstract. We therefore believe our parametric manipulation of conflict types, despite its inevitable limitations, is an important contribution to the literature.

      We have incorporated the above statements into our revised manuscript: Methodological implications. Previous studies with mixed conflicts have applied mainly categorical manipulations of conflict types, such as the multi-source interference task (Fu et al., 2022) and color Stroop-Simon task (Liu et al., 2010). The categorical manipulations make it difficult to quantify conceptual similarity between conflict types and hence limit the ability to test whether neural representations of conflict capture conceptual similarity. To the best of our knowledge, no previous studies have manipulated the conflict types parametrically. This gap highlights a broader challenge within cognitive science: effectively manipulating and measuring similarity levels for conflicts, as well as other high-level cognitive processes, which are inherently abstract. The use of an experimental paradigm that permits parametric manipulation of conflict similarity provides a way to systematically investigate the organization of cognitive control, as well as its influence on adaptive behaviors.

      [2] The work by Ritz & Shenhav (2023) indeed applied multivariate analyses, but they did not test the representational similarity across different levels of task difficulty in the way that we investigated different levels of conflict types, nor did they manipulate conflict types as in our study. They first estimated univariate brain activations that were parametrically scaled by task difficulty (e.g., target coherence), yielding one map of parameter estimates (i.e., encoding subspace) for each of the target coherence and distractor congruence. The multivoxel patterns from the above maps were correlated to test whether the target coherence and distractor congruence share a similar neural encoding. It is noteworthy that the encoding of task difficulty in their study is estimated at the univariate level, like the univariate parametric modulation analysis in our study. The representational similarity across target coherence and distractor congruence was a second-order test and did not reflect the similarity across different difficulty levels. However, we have found another study (Wen & Egner, 2023) that directly tested the representational similarity across different levels of task difficulty, and they observed a higher representational similarity between conditions with similar difficulty levels within a wide range of brain regions.

      Reference:

      Wen, T., & Egner, T. (2023). Context-independent scaling of neural responses to task difficulty in the multiple-demand network. Cerebral Cortex, 33(10), 6013-6027. https://doi.org/10.1093/cercor/bhac479

      Fu, Z., Beam, D., Chung, J. M., Reed, C. M., Mamelak, A. N., Adolphs, R., & Rutishauser, U. (2022). The geometry of domain-general performance monitoring in the human medial frontal cortex. Science (New York, N.Y.), 376(6593), eabm9922. https://doi.org/10.1126/science.abm9922

      Ritz, H., & Shenhav, A. (2023). Orthogonal neural encoding of targets and distractors supports multivariate cognitive control. https://doi.org/10.1101/2022.12.01.518771

      Another issue is suggesting mixtures between two types of conflict may be many independent sources of conflict. Again, this feels like the strawman. There's a difference between infinite combinations of stimuli on the one hand, and levels of feature on the other hand. The issue of infinite stimuli is why people have proposed feature-based accounts, which are often parametric, e.g., color, size, orientation, spatial frequency. Mixing two forms of conflict is interesting, but the task limitations (i.e., highly correlated features) prevent an analysis of whether these are truly mixed (or, e.g., reflect variations on just one of the conflict types). Without being able to compare a mixture between types vs levels of only one type, it's not clear what you can draw from these results re: how these are combined (and not clear how it reconciles the debate between general and specific).

      Response: As the reviewer pointed out, a feature (or a parameterization) is an efficient way to encode potentially infinite stimuli. This is the same idea as our hypothesis: different conflict types are represented in a cognitive space akin to concrete features such as a color spectrum. This concept can be illustrated in the figure below.

      Author response image 1.

      We would like to clarify that in our study we have manipulated five levels of conflict types, but they all originated from two fundamental sources: vertical spatial Stroop and horizontal Simon conflicts. We agree that the mixture of these two sources does not inherently generate additional conflict sources. However, this mixture does influence the similarity among different conflict conditions, which provides the variability that is crucial for testing the core hypotheses (i.e., continuity and similarity modulation, see the response above) of the cognitive space view. This clarification is crucial as the reviewer’s impression might have been influenced by our introduction, where we repeatedly emphasized multiple sources of conflicts. Our aim in the introduction was to outline a broader conceptual framework, which might not directly reflect the specific design of our current study. Recognizing the possibility of misinterpretation, we have adjusted our introduction and discussion to place less emphasis on the variety of possible conflict sources. For example, we have removed the expression “The large variety of conflict sources implies that there may be innumerable number of conflict conditions” from the introduction.

      As we have addressed in the previous response, the observed conflict similarity effect could not be attributed merely to task difficulty. Similarly, the mixture of spatial Stroop and Simon conflicts should not be attributed to one conflict source only; doing so would oversimplify it to an issue of task difficulty, as it would imply that our manipulation of conflict types merely represented varying levels of a single conflict, akin to manipulating task difficulty with everything else being equal. Importantly, the mixed conditions differ from variations along a single conflict source in that they also incorporate components of the other conflict source, thereby introducing differences beyond those that would be found among variations of a single conflict source.

      There is additional evidence challenging the single-dimension assumption. In our previous revisions, we compared model fits between the Cognitive-Space model and the Stroop-/Simon-only models, and the results showed that the Cognitive-Space model (BIC = 5377093) outperformed the Stroop-Only (BIC = 5377122) and Simon-Only (BIC = 5377096) models. This suggests that mixed conflicts might not be solely reflective of either Stroop or Simon sources, although we did not include these results due to concerns raised by reviewers about the validity of such comparisons, given the high anticorrelation between the two dimensions. Furthermore, Fu et al. (2022) demonstrated that the mixture of Simon and Flanker conflicts (the sf condition) is represented as the vector sum of the Flanker and Simon dimensions within their space model, indicating a compositional nature. Similarly, our mixed conditions are combinations of Stroop and Simon conflicts, and it is plausible that these mixtures represent a fusion of both Stroop and Simon components, rather than just one. Thus, we disagree that the mixture of conflicts is a strawman.

      In response to this concern, we have included a statement in our limitation section: “Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. This limitation also means we cannot conclusively rule out the possibility of a real unidimensional space driven solely by spatial Stroop or Simon conflicts. However, this appears unlikely, as it would imply that our manipulation of conflict types merely represented varying levels of a single conflict, akin to manipulating task difficulty with everything else being equal. If task difficulty were the primary variable, we would expect to see greater representational similarity between task conditions of similar difficulty, such as the Stroop and Simon conditions, which demonstrate comparable congruency effects (see Fig. S1). Contrary to this, our findings reveal that the Stroop-only and Simon-only conditions exhibit the lowest representational similarity (Fig. S4). Furthermore, Fu et al. (2022) have shown that the representation of mixtures of Simon and Flanker conflicts was compositional, rather than reflecting a single dimension, which also applies to our case.”
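
      The BIC comparison quoted above can be made concrete with a few lines of arithmetic. The sketch below is only an illustration using the three BIC values reported in this response; the dictionary keys are labels chosen here, and nothing about how the models were fitted is assumed. It shows that the Cognitive-Space model has the lowest BIC, by a wide margin over the Stroop-Only model and by a narrow one over the Simon-Only model.

```python
# BIC values quoted in the response above; a lower BIC indicates a better fit.
bic = {
    "Cognitive-Space": 5377093,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
}

# Model with the smallest BIC.
best = min(bic, key=bic.get)

# Difference of each model's BIC from the best model (0 for the best model).
delta_bic = {name: value - bic[best] for name, value in bic.items()}

for name, delta in sorted(delta_bic.items(), key=lambda kv: kv[1]):
    print(f"{name}: delta BIC = {delta}")
```

      The small gap to the Simon-Only model is in line with the caution the authors express about these comparisons.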

      My recommendation would be to dramatically rewrite to reduce the framing of this providing critical evidence in favor of cognitive maps, and being more overt about the limitations of this task. However, the authors are not required to make further revisions in eLife's new model, and it's not clear how my scores would change if they made those revisions (ie the conceptual limitations would remain, the claims would just now match the more limited scope).

      Response: With the above rationales and the adjustments we have made in the manuscripts, we believe that we have thoroughly acknowledged and articulated the limitations of our study. Therefore, we have decided against a complete rewrite of the manuscript.

      Public Review:

      2) The representations within DLPFC appear to treat 100% Stroop and (to a lesser extent) 100% Simon differently than mixed trials. Within mixed trials, the RDMs within this region don't strongly match the predictions of the conflict similarity model. It appears that there may be a more complex relationship encoded in this region.

      Suggestion:

      2) RSMs in the key region of interest. I don't really understand the authors' response here either, e.g., 'It is essential to clarify that our conclusions were based on the significant similarity modulation effect identified in our statistical analysis using the cosine similarity model, where we did not distinguish between the within-Stroop condition and the other four within-conflict conditions (Fig. 7A, now Fig. 8A). This means that the representation of conflict type was not biased by the seemingly disparities in the values shown here'. In Figure 1C, it does look like they are testing this model.

      It seems like a stronger validation would test just the mixture trials (i.e., ignoring Simon-only and stroop-only). However, simon/stroop-only conditions being qualitatively different does beg the question of whether these are being represented parametrically vs categorically.

      Response: We apologize for the confusion caused by our previous response. To clarify, our conclusions have been drawn based on the robust conflict similarity effect.

      The conflict similarity regressor is defined by higher values in the diagonal cells (representing within-conflict similarity) than the off-diagonal cells (indicating between-conflict similarity), as illustrated in Fig. 1C and Fig. 8A (now Fig. 4B). It is important to note that this regressor may not be particularly sensitive to the variations within the diagonal cells. Our previous response aimed to emphasize that the inconsistencies observed along the diagonal do not contradict our core hypothesis regarding the conflict similarity effect.
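
      To illustrate the shape of such a regressor, the following minimal sketch builds a cosine similarity matrix for five conflict types. It rests on assumptions that are not stated in this response: the five types are placed at evenly spaced angles from pure spatial Stroop (0 degrees) to pure Simon (90 degrees), and the labels are placeholders. Under these assumptions the diagonal cells equal 1 and the Stroop-only versus Simon-only cell is the lowest.

```python
import numpy as np

# Placeholder labels and hypothetical angles: five conflict types assumed to be
# evenly spaced from pure spatial Stroop (0 deg) to pure Simon (90 deg).
labels = ["Stroop-only", "mix-1", "mix-2", "mix-3", "Simon-only"]
angles = np.deg2rad(np.linspace(0.0, 90.0, len(labels)))

# Unit vectors in a 2D (Stroop, Simon) plane, one row per conflict type.
vectors = np.column_stack([np.cos(angles), np.sin(angles)])

# Cosine similarity between every pair of conflict types
# (the vectors have unit length, so the dot product is the cosine).
similarity = vectors @ vectors.T

print(labels)
print(np.round(similarity, 2))
```

      In this sketch the off-diagonal values fall off gradually with angular distance, which is the gradient that a categorical (identity-matrix) model would not capture.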

      We recognized that the visualization in Fig. S4, based on the raw RSM (i.e., Pearson correlation), may have been influenced by regressors in our model other than the conflict similarity effect. To reflect pattern similarity with confounding factors controlled for, we have visualized the RSM by including only the fixed effect of the conflict similarity and the residual while excluding all other factors. As shown in the revised Figure S4, the difference between the within-Stroop and other diagonal cells was greatly reduced. Instead, it revealed a clear pattern in which the diagonal values were higher than the off-diagonal values in the incongruent condition, aligning with our hypothesis regarding the conflict similarity modulator. Although some visual distinctions persist within the five diagonal cells (e.g., in the incongruent condition, the Stroop, Simon, and StMSmM conditions appear slightly lower than StHSmL and StLSmM conditions), follow-up one-way ANOVAs among these five diagonal conditions showed no significant differences. This held true for both incongruent and congruent conditions, with Fs < 1. Thus, we conclude that there is no strong evidence supporting the notion that Simon- and spatial Stroop-only conditions are systematically different from other conflict types. As a result, we decided not to exclude these two conflict types from analysis.

      Author response image 2.

      The stronger conflict type similarity effect in incongruent versus congruent conditions. Shown are the summary representational similarity matrices for the right 8C region in incongruent (left) and congruent (right) conditions, respectively. Each cell represents the averaged Pearson correlation (after regressing out all factors except the conflict similarity) of cells with the same conflict type and congruency in the 1400×1400 matrix. Note that the seeming disparities in the values of within-conflict cells (i.e., the diagonal) did not reach significance for either incongruent or congruent trials, Fs < 1.

      Public Review:

      3) To orthogonalize their variables, the authors need to employ a complex linear mixed effects analysis, with a potential influence of implementation details (e.g., high-level interactions and inflated degrees of freedom).

      Suggestion:

      3) The DF for a mixed model should not be the number of observations minus the number of fixed effects. The gold standard is to use the Satterthwaite correction (e.g., in Matlab, fixedEffects(lme,'DFMethod','satterthwaite')), or number of subjects - number of fixed effects (i.e., you want to generalize to new subjects, not just new samples from the same subjects). Honestly, running a 4-way interaction is probably using more degrees of freedom than are appropriate given the number of subjects.

      Response: We concur with the reviewer’s comment that our previous estimation of degrees of freedom (DFs) was inaccurate. Following your suggestion, we have now applied the “Satterthwaite” approach to approximate the DFs for all our linear mixed effect model analyses. This adjustment has led to the correction of both DFs and p values. In the Methods section, we have mentioned this revision.

      “We adjusted the t and p values with the degrees of freedom calculated through the Satterthwaite approximation method (Satterthwaite, 1946). Of note, this approach was applied to all the mixed-effect model analyses in this study.”

      The application of this method has indeed resulted in a reduction of our statistical significance. However, our overall conclusions remained robust. Instead of the highly stringent threshold used in our previous version (Bonferroni corrected p < .0001), we have now adopted a relatively more lenient threshold of Bonferroni correction at p < 0.05, which is commonly employed in the literature. Furthermore, it is worth noting that the follow-up criteria 2 and 3 are inherently second-order analyses. Criterion 2 involves examining the interaction effect (conflict similarity effect difference between incongruent and congruent conditions), and criterion 3 involves individual correlation analyses. Due to their second-order nature, these criteria inherently have lower statistical power compared to criterion 1 (Blake & Gangestad, 2020). We thus have applied a more lenient but still typically acceptable false discovery rate (FDR) correction to criteria 2 and 3. This adjustment helps maintain the rigor of our analysis while considering the inherent differences in statistical power across the various criteria. We have mentioned this revision in our manuscript:

      “We next tested whether these regions were related to cognitive control by comparing the strength of conflict similarity effect between incongruent and congruent conditions (criterion 2) and correlating the strength to behavioral similarity modulation effect (criterion 3). Given these two criteria pertain to second-order analyses (interaction or individual analyses) and thus might have lower statistical power (Blake & Gangestad, 2020), we applied a more lenient threshold using false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) on the above-mentioned regions.”
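
      The difference between the two correction schemes mentioned above can be sketched as follows. This is a generic illustration of Bonferroni and Benjamini-Hochberg adjusted p values, not the authors' analysis code; the function names and the four p values are made up for the example.

```python
import numpy as np

def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (FDR) adjusted p values."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Made-up p values for illustration only; not results from the study.
p_example = [0.01, 0.04, 0.03, 0.005]
p_bonf = bonferroni(p_example)
p_fdr = benjamini_hochberg(p_example)

print("Bonferroni:", np.round(p_bonf, 3))
print("FDR (BH):  ", np.round(p_fdr, 3))
```

      With these made-up values, fewer tests survive a 0.05 threshold under Bonferroni than under FDR, which is the sense in which FDR is the more lenient of the two.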

      With these adjustments, we consistently identified similar brain regions as observed in our previous version. Specifically, we found that only the right 8C region met the three criteria in the conflict similarity analysis. In addition, the regions meeting the criteria for the orientation effect included the FEF and IP2 in left hemisphere, and V1, V2, POS1, and PF in the right hemisphere. We have thoroughly revised the description of our results, updated the figures and tables in both the revised manuscript and supplementary material to accurately reflect these outcomes.

      Reference:

      Blake, K. R., & Gangestad, S. (2020). On Attenuated Interactions, Measurement Error, and Statistical Power: Guidelines for Social and Personality Psychologists. Pers Soc Psychol Bull, 46(12), 1702-1711. https://doi.org/10.1177/0146167220913363

      Minor:

      1. Figure 8 should come much earlier (e.g, incorporated into Figure 1), and there should be consistent terms for 'cognitive map' and 'conflict similarity'.

      Response: We appreciate this suggestion. Considering that Figure 7 (“The cross-subject RSA model and the rationale”) also describes the models, we have merged Figures 7 and 8 and moved the new figure ahead, before we report the RSA results. You can now find it as the new Figure 4; see below. We did not incorporate them into Figure 1 since Figure 1 is already too crowded.

      Author response image 3.

      Fig. 4. Rationale of the cross-subject RSA model and the schematic of key RSMs. (A) The RSM is calculated as the Pearson’s correlation between each pair of conditions across the 35 subjects. For 17 subjects, the stimuli were displayed on the top-left and bottom-right quadrants, and they were asked to respond with left hand to the upward arrow and right hand to the downward arrow. For the other 18 subjects, the stimuli were displayed on the top-right and bottom-left quadrants, and they were asked to respond with left hand to the downward arrow and right hand to the upward arrow. Within each subject, the conflict type and orientation regressors were perfectly covaried. For instance, the same conflict type will always be on the same orientation. To de-correlate conflict type and orientation effects, we conducted the RSA across subjects from different groups. For example, the bottom-right panel highlights the example conditions that are orthogonal to each other on the orientation, response, and Simon distractor, whereas their conflict type, target and spatial Stroop distractor are the same. The dashed boxes show the possible target locations for different conditions. (B) and (C) show the orthogonality between conflict similarity and orientation RSMs. The within-subject RSMs (e.g., Group1-Group1) for conflict similarity and orientation are all the same, but the cross-group correlations (e.g., Group2-Group1) are different. Therefore, we can separate the contribution of these two effects when including them as different regressors in the same linear regression model. (D) and (E) show the two alternative models. Like the cosine model (B), within-group trial pairs resemble between-group trial pairs in these two models. The domain-specific model is an identity matrix. The domain-general model is estimated from the absolute difference of behavioral congruency effect, but scaled to 0 (lowest similarity) – 1 (highest similarity) to aid comparison. The plotted matrices in B-E include only one subject each from Group 1 and Group 2. Numbers 1-5 indicate the conflict type conditions, for spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon, respectively. The thin lines separate four different sub-conditions, i.e., target arrow (up, down) × congruency (incongruent, congruent), within each conflict type.
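
      The basic operation behind an RSM, correlating the activity pattern of every condition with that of every other condition, can be sketched as follows. This is a generic illustration with made-up toy patterns and hypothetical helper names (`pearson`, `build_rsm`), not the authors' analysis pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def build_rsm(patterns):
    """Representational similarity matrix: one row and column per condition."""
    return [[pearson(p, q) for q in patterns] for p in patterns]


# Toy activity patterns (3 conditions x 4 features), for illustration only.
toy_patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]
toy_rsm = build_rsm(toy_patterns)
for row in toy_rsm:
    print([round(v, 3) for v in row])
```

      In a cross-subject design such as the one described in the caption, the rows would index conditions from subjects in both groups, so that the between-group cells carry the information that separates conflict similarity from orientation.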

      In our manuscript, the term “cognitive map/space” was used when explaining the results in a theoretical perspective, whereas the “conflict similarity” was used to describe the regressor within the RSA. These terms serve distinct purposes in our study and cannot be interchangeably substituted. Therefore, we have retained them in their current format. However, we recognize that the initial introduction of the “Cognitive-Space model” may have appeared somewhat abrupt. To address this, we have included a brief explanatory note: “The model described above employs the cosine similarity measure to define conflict similarity and will be referred to as the Cognitive-Space model.”

    2. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you and the reviewers for further providing constructive comments and suggestions on our manuscript. On behalf of all the co-authors, I have enclosed a revised version of the above referenced paper. Below, I have merged similar public reviews and recommendations (if applicable) from each reviewer and provided point-by-point responses.

      Reviewer #1:

      People can perform a wide variety of different tasks, and a long-standing question in cognitive neuroscience is how the properties of different tasks are represented in the brain. The authors develop an interesting task that mixes two different sources of difficulty, and find that the brain appears to represent this mixture on a continuum, in the prefrontal areas involved in resolving task difficulty. While these results are interesting and in several ways compelling, they overlap with previous findings and rely on novel statistical analyses that may require further validation.

      Strengths

      1. The authors present an interesting and novel task for combining the contributions of stimulus-stimulus and stimulus-response conflict. While this mixture has been measured in the multi-source interference task (MSIT), this task provides a more graded mixture between these two sources of difficulty.

      2. The authors do a good job triangulating regions that encode conflict similarity, looking for the conjunction across several different measures of conflict encoding. These conflict measures use several best-practice approaches towards estimating representational similarity.

      3. The authors quantify several salient alternative hypotheses and systematically distinguish their core results from these alternatives.

      4. The question that the authors tackle is important to cognitive control, and they make a solid contribution.

      The authors have addressed several of my concerns. I appreciate the authors implementing best practices in their neuroimaging stats.

      I think that the concerns that remain in my public review reflect the inherent limitations of the current work. The authors have done a good job working with the dataset they've collected.

      Response: We would like to thank the reviewer for the positive evaluation of our manuscript and the constructive comments and suggestions. In response to your suggestions and concerns, we have removed the Stroop/Simon-only and the Stroop+Simon models, revised our conclusion and modified the misleading phrases.

      We have provided detailed responses to your comments below.

      1. The evidence from this previous work for mixtures between different conflict sources makes the framing of 'infinite possible types of conflict' feel like a strawman. The authors cite classic work (e.g., Kornblum et al., 1990) that develops a typology for conflict which is far from infinite. I think few people would argue that every possible source and level of difficulty will have to be learned separately. This work provides confirmatory evidence that task difficulty is represented parametrically (e.g., consistent with the n-back, MOT, and random dot motion literature).

      notes for my public concerns.

      In their response, the authors say:

      'If each combination of the Stroop-Simon combination is regarded as a conflict condition, there would be infinite combinations, and it is our major goal to investigate how these infinite conflict conditions are represented effectively in a space with finite dimensions.'

      I do think that this is a strawman. The paper doesn't make a strong case that this position ('infinite combinations') is widely held in the field. There is previous work (e.g., n-back, multiple object tracking, MSIT, dot motion) that has already shown parametric encoding of task difficulty. This paper provides confirmatory evidence, using an interesting new task, that demands are parametric, but does not provide a major theoretical advance.

      Response: We agree that the previous expression may have seemed somewhat exaggerated. While it is not “infinite”, recent research indeed suggests that cognitive control shows domain-specificity across various “domains”, including conflict types (Egner, 2008), sensory modalities (Yang et al., 2017), task-irrelevant stimuli (Spapé & Hommel, 2008), and task sets (Hazeltine et al., 2011), to name a few.

      These findings collectively support the notion that cognitive control is context-specific (Braem et al., 2014). That is, cognitive control can be tuned and associated with different (and potentially large numbers of) contexts. Recently, Kikumoto and Mayr (2020) demonstrated that combinations of stimulus, rule and response in the same task formed separable, conjunctive representations. They further showed that these conjunctive representations facilitate performance. This is in line with the idea that each stimulus-location combination in the present task may be represented separately in a domain-specific manner. Moreover, domain-general task representation can also become domain-specific with learning, which further increases the number of domain-specific conjunctive representations (Mill & Cole, 2023). In line with the domain-specific account of cognitive control, we referred to the “infinite combinations” in our previous response to emphasize the extreme case of domain-specificity. However, recognizing that the term “infinite” may lead to ambiguity, we have replaced it with phrases such as “a large number of” and “hugely varied” in our revised manuscript.

      We appreciate the reviewer for highlighting the potential connection of our work to existing literature that showed the parametric encoding of task difficulty (e.g., Dagher et al., 1999; Ritz & Shenhav, 2023). For instance, Ritz and Shenhav (2023) parametrically manipulated target difficulty based on consistent ratios of dot color, and found that the difficulty was encoded in the caudal part of dorsal anterior cingulate cortex. Analogously, in our study, the “difficulty” pertains to the behavioral congruency effect that we modulated within the spatial Stroop and Simon dimensions. Notably, we did identify univariate effects in the right dmPFC and IPS associated with the difficulty in the Simon dimension. This parametric effect may lend support to our cognitive space hypothesis, although we exercised caution in interpreting their significance due to the absence of a clear brain-behavioral relevance in these regions. We have added the connection of our work to prior literature in the discussion: “The parametric encoding of conflict also mirrors prior research showing the parametric encoding of task demands (Dagher et al., 1999; Ritz & Shenhav, 2023).”

      However, our analyses extend beyond solely testing the parametric encoding of difficulty. Instead, we focused on the multivariate representation of different conflict types, which we believe is independent of the univariate parametric encoding. Unlike the univariate encoding that relies on the strength within one dimension, the multivariate representation of conflict types incorporates both the spatial Stroop and Simon dimensions. Furthermore, we found that similar difficulty levels did not yield similar conflict representation, as indicated by the low similarity between the spatial Stroop and Simon conditions, despite both showing a similar level of congruency effect (Fig. S1). Additionally, we also observed an interaction between conflict similarity and difficulty (i.e., congruency, Fig. 4B/D), such that the conflict similarity effect was more pronounced when conflict was present. Therefore, we believe that our findings make a contribution to the literature beyond the difficulty effect.

      Reference:

      Egner, T. (2008). Multiple conflict-driven control mechanisms in the human brain. Trends in Cognitive Sciences, 12(10), 374-380. https://doi.org/10.1016/j.tics.2008.07.001

      Yang, G., Nan, W., Zheng, Y., Wu, H., Li, Q., & Liu, X. (2017). Distinct cognitive control mechanisms as revealed by modality-specific conflict adaptation effects. Journal of Experimental Psychology: Human Perception and Performance, 43(4), 807-818. https://doi.org/10.1037/xhp0000351

      Spapé, M. M., & Hommel, B. (2008). He said, she said: Episodic retrieval induces conflict adaptation in an auditory Stroop task. Psychonomic Bulletin & Review, 15(6), 1117-1121. https://doi.org/10.3758/PBR.15.6.1117

      Hazeltine, E., Lightman, E., Schwarb, H., & Schumacher, E. H. (2011). The boundaries of sequential modulations: Evidence for set-level control. Journal of Experimental Psychology: Human Perception and Performance, 37(6), 1898-1914. https://doi.org/10.1037/a0024662

      Braem, S., Abrahamse, E. L., Duthoo, W., & Notebaert, W. (2014). What determines the specificity of conflict adaptation? A review, critical analysis, and proposed synthesis. Frontiers in Psychology, 5, 1134. https://doi.org/10.3389/fpsyg.2014.01134

      Kikumoto A, Mayr U. (2020). Conjunctive representations that integrate stimuli, responses, and rules are critical for action selection. Proceedings of the National Academy of Sciences, 117(19):10603-10608. https://doi.org/10.1073/pnas.1922166117.

      Mill, R. D., & Cole, M. W. (2023). Neural representation dynamics reveal computational principles of cognitive task learning. bioRxiv. https://doi.org/10.1101/2023.06.27.546751

      Dagher, A., Owen, A. M., Boecker, H., & Brooks, D. J. (1999). Mapping the network for planning: a correlational PET activation study with the Tower of London task. Brain, 122 ( Pt 10), 1973-1987. https://doi.org/10.1093/brain/122.10.1973

      Ritz, H., & Shenhav, A. (2023). Orthogonal neural encoding of targets and distractors supports multivariate cognitive control. https://doi.org/10.1101/2022.12.01.518771

      1. (Public Reviews) The degree of Stroop vs Simon conflict is perfectly negatively correlated across conditions. This limits their interpretation of an integrated cognitive space, as they cannot separately measure Stroop and Simon effects. The author's control analyses have limited ability to overcome this task limitation. While these results are consistent with parametric encoding, they cannot adjudicate between combined vs separated representations.

      (Recommendations) I think that it is still an issue that the task's two features (stroop and simon conflict) are perfectly correlated. This fundamentally limits their ability to measure the similarity in these features. The authors provide several control analyses, but I think these are limited.

      Response: We need to acknowledge that the spatial Stroop and Simon components in the five conflict conditions were not “perfectly” correlated, with r = –0.89. This leaves some room for the preliminary model comparison to adjudicate between these models. However, it’s essential to note that conclusions based on these results must be tempered. In line with the reviewer’s observation, we agree that the high correlation between the two conflict sources posed a potential limitation on our ability to independently investigate the contribution of spatial Stroop and Simon conflicts. Therefore, in addition to the limitation we have previously acknowledged, we have now further revised our conclusion and adjusted our expressions accordingly.
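
      The size of this correlation can be illustrated with a short sketch. It assumes, since the geometry is not spelled out in this response, that the five conflict types sit at evenly spaced angles (0°, 22.5°, 45°, 67.5°, 90°) measured from the spatial Stroop axis on a quarter circle, with the Stroop and Simon components given by the cosine and sine of that angle. Under this assumption the correlation between the two components is strongly negative but not exactly –1.

```python
import math

# Assumption (not stated in this response): five conflict types at evenly
# spaced angles from the spatial Stroop axis (0 deg) to the Simon axis (90 deg).
angles_deg = [0.0, 22.5, 45.0, 67.5, 90.0]
stroop_component = [math.cos(math.radians(a)) for a in angles_deg]
simon_component = [math.sin(math.radians(a)) for a in angles_deg]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_components = pearson(stroop_component, simon_component)
print(round(r_components, 2))
```

      The departure from –1 arises because the components trade off along an arc rather than along a straight line, so one is not a linear function of the other.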

      Specifically, we now regard the parametric encoding of cognitive control not as direct evidence of the cognitive space view but as preliminary evidence that led us to propose this hypothesis, which requires further testing. Notably, we have also modified the title from “Conflicts are represented in a cognitive space to reconcile domain-general and domain-specific cognitive control” to “Conflicts are parametrically encoded: initial evidence for a cognitive space view to reconcile the debate of domain-general and domain-specific cognitive control”. Also, we revised the conclusion as: In sum, we showed that the cognitive control can be parametrically encoded in the right dlPFC and guides cognitive control to adjust goal-directed behavior. This finding suggests that different cognitive control states may be encoded in an abstract cognitive space, which reconciles the long-standing debate between the domain-general and domain-specific views of cognitive control and provides a parsimonious and more broadly applicable framework for understanding how our brains efficiently and flexibly represents multiple task settings.

      From Recommendations The authors perform control analyses that test stroop-only and simon-only models. However, these analyses use a totally different similarity metric, that's based on set intersection rather than geometry. This metric had limited justification or explanation, and it's not clear whether these models fit worse because of the similarity metric. Even here, Simon-only model fit better than Stroop+Simon model. The dimensionality analyses may reflect the 1d manipulation by the authors (i.e. perfectly correlated stroop and simon effects).

      Response: The Jaccard measure is the most suitable method we can conceive of for assessing the similarity between two conflicts when establishing the Stroop-only and Simon-only models, achieved by projecting them onto the vertical or horizontal axes, respectively (Author response image 1A). This approach offers two advantages. First, the Jaccard similarity combines both similarity (as reflected by the numerator) and distance (reflected by the difference between denominator and numerator) without bias towards either. Second, the Jaccard similarity in our design is equivalent to the cosine similarity because the denominator in the cosine similarity is identical to the denominator in the Jaccard similarity (both are the radius of the circle, Author response image 1B).

      Author response image 1.

      Definition of Jaccard similarity. (A) Two conflicts (1 and 2) are projected onto the spatial Stroop/Simon axis in the Stroop/Simon-only model, respectively. Letters a-d are the projected vectors from the two conflicts to the two axes. On each axis, the shorter vector is the intersection and the longer vector is the union, so the Jaccard similarity for the Stroop-only and Simon-only models is the ratio of the shorter to the longer projected vector on the corresponding axis. Blue and red colors indicate the conflict conditions. (B) According to the cosine similarity model, the similarity is defined as e/g, where e is the projected vector from conflict 1 to conflict 2, and g is the vector of conflict 1. The Jaccard similarity for this case is defined by e/f, where f is the projected vector from conflict 2 to itself. Because f = g in our design, the Jaccard similarity is equivalent to the cosine similarity.
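
      The argument in this caption can be sketched numerically. The code below is an illustration only: it assumes that each conflict is a vector of equal length (the circle radius) at some angle from the spatial Stroop axis, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def jaccard_1d(length_a, length_b):
    """Jaccard similarity of two non-negative lengths on one axis: shorter / longer."""
    longer = max(length_a, length_b)
    if longer == 0:
        return float("nan")  # undefined when both projections vanish
    return min(length_a, length_b) / longer


def stroop_only_similarity(theta1, theta2, radius=1.0):
    """Panel A: project both conflicts onto the Stroop axis, then take shorter / longer."""
    return jaccard_1d(radius * math.cos(theta1), radius * math.cos(theta2))


def jaccard_between_conflicts(theta1, theta2, radius=1.0):
    """Panel B: project conflict 1 onto conflict 2 (e) and conflict 2 onto itself (f)."""
    e = radius * math.cos(theta1 - theta2)
    f = radius
    return e / f


def cosine_similarity(theta1, theta2):
    """Cosine of the angle between the two conflict vectors."""
    return math.cos(theta1 - theta2)


t1, t2 = math.radians(22.5), math.radians(67.5)
print(jaccard_between_conflicts(t1, t2), cosine_similarity(t1, t2))
print(stroop_only_similarity(t1, t2))
```

      Because both vectors have the same length, the ratio in panel B reduces to the cosine of the angle between them, whereas the single-axis ratio in panel A generally gives a different value.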

      Therefore, we believe that the model comparisons between cosine similarity model and the Stroop/Simon-Only models were equitable. However, we acknowledge the reviewer’s and other reviewers’ concerns about the correlation between spatial Stroop and Simon conflicts, which reduces the space to one dimension (1d) and limits our ability to distinguish between the Stroop-only and Simon-only models, as well as between Stroop+Simon and cosine similarity models. While these distinctions are undoubtedly important for understanding the geometry of the cognitive space, we recognize that they go beyond the major objective of this study, that is, to differentiate the cosine similarity model from domain-general/specific models. Therefore, we have chosen to exclude the Stroop-only, Simon-only and Stroop+Simon models in our revised manuscript.

      Something that raised additional concerns are the RSMs in the key region of interest (Fig S5). The pure stroop task appears to be represented very differently from all of the conditions that include simon conflict.

      Together, I think these limitations reflect the structure of the task and research goals, not the statistical approach (which has been meaningfully improved).

      Response: We appreciate the reviewer for pointing this out. It is essential to clarify that our conclusions were based on the significant similarity modulation effect identified in our statistical analysis using the cosine similarity model, where we did not distinguish between the within-Stroop condition and the other four within-conflict conditions (Fig. 7A, now Fig. 8A). This means that the representation of conflict type was not biased by the seeming disparities in the values shown here. Moreover, to specifically test the differences between the within-Stroop condition and the other within-conflict conditions, we conducted a mixed-effect model analysis only including trial pairs from the same conflict type. In this analysis, the primary predictor was the cross-condition difference (0 for within-Stroop condition and 1 for other within-conflict conditions). The results showed no significant cross-condition difference in either the incongruent (t = 1.22, p = .23) or the congruent (t = 1.06, p = .29) trials. Thus, we believe the evidence for different similarities is inconclusive in our data and decided not to interpret this numerical difference. We have added this note in the revised figure caption for Figure S5.

      Author response image 2.

      Fig. S5. The stronger conflict type similarity effect in incongruent versus congruent conditions. (A) Summary representational similarity matrices for the right 8C region in incongruent (left) and congruent (right) conditions, respectively. Each cell represents the averaged Pearson correlation of cells with the same conflict type and congruency in the 1400×1400 matrix. Note that the seeming disparities in the values of Stroop and other within-conflict cells (i.e., the diagonal) did not reach significance for either incongruent (t = 1.22, p = .23) or congruent (t = 1.06, p = .29) trials. (B) Scatter plot showing the averaged neural similarity (Pearson correlation) as a function of conflict type similarity in both conditions. The values in both A and B are calculated from raw Pearson correlation values, in contrast to the z-scored values in Fig. 4D.

      Minor:

      • In the analysis of similarity_orientation, the df is very large (~14000). Here, and throughout, the df should be reflective of the population of subjects (ie be less than the sample size).

      Response: The large degrees of freedom (df) in our analysis stem from the fact that we utilized a mixed-effect linear model, incorporating all data points (a total of 400×35=14000). In mixed-effect models, the df is determined by subtracting the number of fixed effects (in our case, 7) from the total number of observations. Notably, this is in line with prior studies that have reported the df in this manner (e.g., Iravani et al., 2021; Schmidt & Weissman, 2016; Natraj et al., 2022).

      Reference:

      Iravani B, Schaefer M, Wilson DA, Arshamian A, Lundström JN. The human olfactory bulb processes odor valence representation and cues motor avoidance behavior. Proc Natl Acad Sci U S A. 2021 Oct 19;118(42):e2101209118. https://doi.org/10.1073/pnas.2101209118.

      Schmidt, J.R., Weissman, D.H. Congruency sequence effects and previous response times: conflict adaptation or temporal learning?. Psychological Research 80, 590–607 (2016). https://doi.org/10.1007/s00426-015-0681-x.

      Natraj, N., Silversmith, D. B., Chang, E. F., & Ganguly, K. (2022). Compartmentalized dynamics within a common multi-area mesoscale manifold represent a repertoire of human hand movements. Neuron, 110(1), 154-174. https://doi.org/10.1016/j.neuron.2021.10.002.

      • it would improve the readability if there was more didactic justification for why analyses are done a certain way (eg justifying the jaccard metric). This will help less technically-savvy readers.

      Response: We appreciate the reviewer’s suggestion. However, considering that the Stroop/Simon-only models in our design may not be a valid approach for distinguishing the contributions of the Stroop/Simon components, we have decided not to include the Jaccard metrics in our revised manuscript.

      Besides, to improve the readability, we have moved Figure S4 to the main text (labeled as Figure 7), and added the domain-general/domain-specific schematics in Figure 8.

      Author response image 3.

      Figure 8. Schematic of key RSMs. (A) and (B) show the orthogonality between conflict similarity and orientation RSMs. The within-subject RSMs (e.g., Group1-Group1) for conflict similarity and orientation are all the same, but the cross-group correlations (e.g., Group2-Group1) are different. Therefore, we can separate the contribution of these two effects when including them as different regressors in the same linear regression model. (C) and (D) show the two alternative models. Like the cosine model (A), within-group trial pairs resemble between-group trial pairs in these two models. The domain-specific model is an identity matrix. The domain-general model is estimated from the absolute difference of behavioral congruency effect, but scaled to 0(lowest similarity)-1(highest similarity) to aid comparison. The plotted matrices here include only one subject each from Group 1 and Group 2. Numbers 1-5 indicate the conflict type conditions, for spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon, respectively. The thin lines separate four different sub-conditions, i.e., target arrow (up, down) × congruency (incongruent, congruent), within each conflict type.
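
      The two alternative models described in this caption can be sketched as follows. The congruency effects below are hypothetical numbers used only for illustration, and the conversion from absolute difference to similarity (one minus the difference divided by the largest difference) is an assumed reading of "scaled to 0 (lowest similarity) - 1 (highest similarity)".

```python
def domain_general_similarity(congruency_effects):
    """Similarity from absolute differences in congruency effect, rescaled to [0, 1]."""
    diffs = [[abs(a - b) for b in congruency_effects] for a in congruency_effects]
    max_diff = max(max(row) for row in diffs)
    if max_diff == 0:
        # All conditions share the same congruency effect: maximal similarity everywhere.
        return [[1.0 for _ in row] for row in diffs]
    # Assumed scaling: the largest difference maps to 0, identical effects map to 1.
    return [[1.0 - d / max_diff for d in row] for row in diffs]


def domain_specific_similarity(n_conditions):
    """Identity matrix: each conflict type is similar only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n_conditions)]
            for i in range(n_conditions)]


# Hypothetical group-averaged congruency effects (ms) for the five conflict types.
toy_effects = [60.0, 50.0, 45.0, 55.0, 65.0]
dg_model = domain_general_similarity(toy_effects)
ds_model = domain_specific_similarity(len(toy_effects))
for row in dg_model:
    print([round(v, 2) for v in row])
```

      Under the domain-general model, two conflict types with equal congruency effects are maximally similar regardless of their source, whereas the domain-specific model treats any two different conflict types as unrelated.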

      Reviewer #2:

      This study examines the construct of "cognitive spaces" as they relate to neural coding schemes present in response conflict tasks. The authors use a novel experimental design in which different types of response conflict (spatial Stroop, Simon) are parametrically manipulated. These conflict types are hypothesized to be encoded jointly, within an abstract "cognitive space", in which distances between task conditions depend only on the similarity of conflict types (i.e., where conditions with similar relative proportions of spatial-Stroop versus Simon conflicts are represented with similar activity patterns). Authors contrast such a representational scheme for conflict with several other conceptually distinct schemes, including a domain-general, domain-specific, and two task-specific schemes. The authors conduct a behavioral and fMRI study to test which of these coding schemes is used by prefrontal cortex. Replicating the authors' prior work, this study demonstrates that sequential behavioral adjustments (the congruency sequence effect) are modulated as a function of the similarity between conflict types. In fMRI data, univariate analyses identified activation in left prefrontal and dorsomedial frontal cortex that was modulated by the amount of Stroop or Simon conflict present, and representational similarity analyses (RSA) that identified coding of conflict similarity, as predicted under the cognitive space model, in right lateral prefrontal cortex.

      This study tackles an important question regarding how distinct types of conflict might be encoded in the brain within a computationally efficient representational format. The ideas postulated by the authors are interesting ones and the statistical methods are generally rigorous.

      Response: We would like to express our sincere appreciation for the reviewer’s positive evaluation of our manuscript and the constructive comments and suggestions. In response to your suggestions and concerns, we excluded the StroopOnly, SimonOnly and Stroop+Simon models, and added the schematic of domain-general/specific model RSMs. We have provided detailed responses to your comments below.

      The evidence supporting the authors claims, however, is limited by confounds in the experimental design and by lack of clarity in reporting the testing of alternative hypotheses within the method and results.

      1. Model comparison

      The authors commendably performed a model comparison within their study, in which they formalized alternative hypotheses to their cognitive space hypothesis. We greatly appreciate the motivation for this idea and think that it strengthened the manuscript. Nevertheless, some details of this model comparison were difficult for us to understand, which in turn has limited our understanding of the strength of the findings.

      The text indicates the domain-general model was computed by taking the difference in congruency effects per conflict condition. Does this refer to the "absolute difference" between congruency effects? In the rest of this review, we assume that the absolute difference was indeed used, as using a signed difference would not make sense in this setting. Nevertheless, it may help readers to add this information to the text.

      Response: We apologize for any confusion. The “difference” here indeed refers to the “absolute difference” between congruency effects. We have now clarified this by adding the word “absolute” accordingly.

      "Therefore, we defined the domain-general matrix as the absolute difference in their congruency effects indexed by the group-averaged RT in Experiment 2."

      Regarding the Stroop-Only and Simon-Only models, the motivation for using the Jaccard metric was unclear. From our reading, it seems that all of the other models --- the cognitive space model, the domain-general model, and the domain-specific model --- effectively use a Euclidean distance metric. (Although the cognitive space model is parameterized with cosine similarities, these similarity values are proportional to Euclidean distances because the points all lie on a circle. And, although the domain-general model is parameterized with absolute differences, the absolute difference is equivalent to Euclidean distance in 1D.) Given these considerations, the use of Jaccard seems to differ from the other models, in terms of parameterization, and thus potentially also in terms of underlying assumptions. Could authors help us understand why this distance metric was used instead of Euclidean distance? Additionally, if Jaccard must be used because this metric seems to be non-standard in the use of RSA, it would likely be helpful for many readers to give a little more explanation about how it was calculated.
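
      The relationship between cosine similarity and Euclidean distance for points on a circle, noted in the comment above, can be checked with a short sketch. For unit-length vectors the squared Euclidean distance equals 2 − 2·cos(θ), where θ is the angle between them, so the two measures are monotonically related. The evenly spaced angles used here are an assumption for illustration.

```python
import math

# Assumption: five conflict types at evenly spaced angles on the unit circle.
angles = [math.radians(a) for a in (0.0, 22.5, 45.0, 67.5, 90.0)]
points = [(math.cos(t), math.sin(t)) for t in angles]


def cosine_sim(p, q):
    """Cosine similarity between two 2D vectors."""
    dot = p[0] * q[0] + p[1] * q[1]
    return dot / (math.hypot(p[0], p[1]) * math.hypot(q[0], q[1]))


def squared_euclidean(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# Largest deviation from the identity d^2 = 2 - 2 * cos(theta) over all pairs.
max_gap = max(
    abs(squared_euclidean(p, q) - (2.0 - 2.0 * cosine_sim(p, q)))
    for p in points
    for q in points
)
print(max_gap)
```

      The identity holds only because all points share the same radius; for vectors of unequal length the two measures can rank pairs differently.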

      Response: We believe that the Jaccard similarity measure is consistent with the Cosine similarity measure. The Jaccard similarity is calculated as the intersection divided by the union. To define the similarity of two conflicts in the Stroop-only and Simon-only models, we first project them onto the vertical or horizontal axes, respectively (as shown in Author response image 1A). The Jaccard similarity in our design is equivalent to the cosine similarity because the denominator in the Jaccard similarity is identical to the denominator in the cosine similarity (both are the radius of the circle, Author response image 1B).

      However, it is important to note that a cosine similarity cannot be defined when conflicts are projected onto spatial Stroop or Simon axis simultaneously. Therefore, we used the Jaccard similarity in the previous version of our manuscript.

      Author response image 4.

      Definition of Jaccard similarity. (A) Two conflicts (1 and 2) are projected onto the spatial Stroop/Simon axis in the Stroop/Simon-only model, respectively. Letters a-d are the projected vectors from the two conflicts to the two axes. On each axis, the shorter vector is the intersection and the longer vector is the union, so the Jaccard similarity for the Stroop-only and Simon-only models is the ratio of the shorter to the longer projected vector on the corresponding axis. Blue and red colors indicate the conflict conditions. (B) According to the cosine similarity model, the similarity is defined as e/g, where e is the projected vector from conflict 1 to conflict 2, and g is the vector of conflict 1. The Jaccard similarity for this case is defined by e/f, where f is the projected vector from conflict 2 to itself. Because f = g in our design, the Jaccard similarity is equivalent to the cosine similarity.

      However, we agree with the concern raised by this and other reviewers that the correlation between spatial Stroop and Simon conflicts makes it difficult to distinguish the Stroop+Simon model from the cosine similarity model. While distinguishing them is essential to understand the detailed geometry of the cognitive space, it is beyond our main purpose, which is to distinguish the cosine similarity model from the domain-general/specific models. Therefore, we have chosen to exclude the Stroop-only, Simon-only and Stroop+Simon models from our revised manuscript.

      When considering parameterizing the Stroop-Only and Simon-Only models with Euclidean distances, one concern we had is that the joint inclusion of these models might render the cognitive space model unidentifiable due to collinearity (i.e., the sum of the Stroop-Only and Simon-Only models could be collinear with the cognitive space model). Could the authors determine whether this is the case? This issue seems to be important, as the presence of such collinearity would suggest to us that the design is incapable of discriminating those hypotheses as parameterized.

      Response: We acknowledge that our design does not allow for a complete differentiation between the parallel encoding (StroopOnly+SimonOnly) model and the cognitive space model, given their high correlation (r = 0.85). However, it is important to note that the StroopOnly+SimonOnly model introduces more free parameters, which makes its model fit poorer than that of the cognitive space model.

      Additionally, the cognitive space model also shows high correlations with the StroopOnly and SimonOnly models (both rs = 0.66). It is crucial to emphasize that our study’s primary goal does not involve testing the parallel encoding hypothesis (through the StroopOnly+SimonOnly model). As a result, we have chosen to remove the model comparison results with the StroopOnly, SimonOnly and StroopOnly+SimonOnly models. By contrast, the cognitive space model shows lower correlations with the purely domain-general (r = −0.16) and domain-specific (r = 0.46) models.

      1. Issue of uniquely identifying conflict coding

      We certainly appreciate the efforts that authors have taken to address potential confounders for encoding of conflict in their original submission. We broach this question not because we wish authors to conduct additional control analyses, but because this issue seems to be central to the thesis of the manuscript and we would value reading the authors' thoughts on this issue in the discussion.

      To summarize our concerns, conflict seems to be a difficult variable to isolate within aggregate neural activity, at least relative to other variables typically studied in cognitive control, such as task-set or rule coding. This is because it seems reasonable to expect that many more nuisance factors covary with conflict -- such as univariate activation, level of cortical recruitment, performance measures, arousal --- than in comparison with, for example, a well-designed rule manipulation. Controlling for some of these factors post-hoc through regression is commendable (as authors have done here), but such a method will likely be incomplete and can provide no guarantees on the false positive rate.

      Relatedly, the neural correlates of conflict coding in fMRI and other aggregate measures of neural activity are likely of heterogeneous provenance, potentially including rate coding (Fu et al., 2022), temporal coding (Smith et al., 2019), modulation of coding of other more concrete variables (Ebitz et al., 2020, 10.1101/2020.03.14.991745; see also discussion and reviews of Tang et al., 2016, 10.7554/eLife.12352), or neuromodulatory effects (e.g., Aston-Jones & Cohen, 2005). Some of these origins would seem to be consistent with "explicit" coding of conflict (conflict as a representation), but others would seem to be more consistent with epiphenomenal coding of conflict (i.e., conflict as an emergent process). Again, these concerns could apply to many variables as measured via fMRI, but at the same time, they seem to be more pernicious in the case of conflict. So, if authors consider these issues to be germane, perhaps they could explicitly state in the discussion whether adopting their cognitive space perspective implies a particular stance on these issues, how they interpret their results with respect to these issues, and if relevant, qualify their conclusions with uncertainty on these issues.

      Response: We appreciate the reviewer’s insightful comments regarding the representation and process of conflict.

      First, we agree that conflict is not simply a pure feature like a stimulus but often arises from the interaction (e.g., dimension overlap) between two or more aspects. For example, in the manual Stroop, conflict emerges from the inconsistent semantic information between color naming and word reading. Similarly, other higher-order cognitive processes such as task-set also underlie the relationship between concrete aspects. For instance, in a face/house categorization task, the task-set is the association between face/house and the responses. When studying these higher-order processes, it is often impossible to completely isolate them from bottom-up features. Therefore, methods like the representational similarity analysis and regression models are among the limited tools available to attempt to dissociate these concrete factors from conflict representation. While not perfect, this approach has been suggested and utilized in practice (Freund et al., 2021).

      Second, we agree that conflict can be both a representation and an emerging process. These two perspectives are not necessarily contradictory. According to David Marr’s influential three-level theory (Marr, 1982), representation is the algorithm of the process to achieve a goal based on the input. Therefore, a representation can refer to not only a static stimulus (e.g., the visual representation of an image), but also a dynamic process. Building on this perspective, we posit that the representation of cognitive control consists of an array of dynamic representations embedded within the overall process. A similar idea has been proposed that the abstract task profiles can be progressively constructed as a representation in our brain (Kikumoto & Mayr, 2020).

      We have incorporated this discussion into the manuscript:

      "Recently an interesting debate has arisen concerning whether cognitive control should be considered as a process or a representation (Freund, Etzel, et al., 2021). Traditionally, cognitive control has been predominantly viewed as a process. However, the study of its representation has gained more and more attention. While it may not be as straightforward as the visual representation (e.g., creating a mental image from a real image in the visual area), cognitive control can have its own form of representation. An influential theory, Marr’s (1982) three-level model proposed that representation serves as the algorithm of the process to achieve a goal based on the input. In other words, representation can encompass a dynamic process rather than being limited to static stimuli. Building on this perspective, we posit that the representation of cognitive control consists of an array of dynamic representations embedded within the overall process. A similar idea has been proposed that the representation of task profiles can be progressively constructed with time in the brain (Kikumoto & Mayr, 2020)."

      References:

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638. https://doi.org/10.1016/j.tics.2021.03.011

      Marr, D. C. (1982). Vision: A computational investigation into human representation and information processing. New York: W.H. Freeman.

      Kikumoto, A., & Mayr, U. (2020). Conjunctive representations that integrate stimuli, responses, and rules are critical for action selection. Proceedings of the National Academy of Sciences, 117(19), 10603-10608. https://doi.org/10.1073/pnas.1922166117

      1. Interpretation of measured geometry in 8C

      We appreciate the inclusion of the measured similarity matrices of area 8C, the key area the results focus on, to the supplemental, as this allows for a relatively model-agnostic look at a portion of the data. Interestingly, the measured similarity matrix seems to mismatch the cognitive space model in a potentially substantive way. Although the model predicts that the "pure" Stroop and Simon conditions will have maximal self-similarity (i.e., the Stroop-Stroop and Simon-Simon cells on the diagonal), these correlations actually seem to be the lowest, by what appears to be a substantial margin (particularly the Stroop-Stroop similarities). What should readers make of this apparent mismatch? Perhaps authors could offer their interpretation on how this mismatch could fit with their conclusions.

      Response: We thank the reviewer for bringing this to our attention. It is essential to clarify that our conclusions were based on the significant similarity modulation effect observed in our statistical analysis using the cosine similarity model, where we did not distinguish between the within-Stroop condition and the other four within-conflict conditions (Fig. 7A). This means that the representation of conflict type was not biased by the seeming disparities in the values shown here. Moreover, to specifically address the potential differences between the within-Stroop condition and the other within-conflict conditions, we fitted a mixed-effect model. In this analysis, the primary predictor was the cross-condition difference (0 for the within-Stroop condition and 1 for the other within-conflict conditions). The results showed no significant cross-condition difference in either the incongruent (t = 1.22, p = .23) or the congruent (t = 1.06, p = .29) trials. Thus, we believe the evidence for different similarities is inconclusive in our data and decided not to interpret this numerical difference.

      We have added this note in the revised figure caption for Figure S5.

      Author response image 5.

      Fig. S5. The stronger conflict type similarity effect in incongruent versus congruent conditions. (A) Summary representational similarity matrices for the right 8C region in incongruent (left) and congruent (right) conditions, respectively. Each cell represents the averaged Pearson correlation of cells with the same conflict type and congruency in the 1400×1400 matrix. Note that the seeming disparities between the values of the Stroop and other within-conflict cells (i.e., the diagonal) did not reach significance for either incongruent (t = 1.22, p = .23) or congruent (t = 1.06, p = .29) trials. (B) Scatter plot showing the averaged neural similarity (Pearson correlation) as a function of conflict type similarity in both conditions. The values in both A and B are calculated from raw Pearson correlation values, in contrast to the z-scored values in Fig. 4D.

      1. It would likely improve clarity if all of the competing models were displayed as summarized RSA matrices in a single figure, similar to (or perhaps combined with) Figure 7.

      Response: We appreciate the reviewer’s suggestion. We have now incorporated the domain-general and domain-specific models into Figure 7 (now Figure 8).

      Author response image 6.

      Figure 8. Schematic of key RSMs. (A) and (B) show the orthogonality between conflict similarity and orientation RSMs. The within-subject RSMs (e.g., Group1-Group1) for conflict similarity and orientation are all the same, but the cross-group correlations (e.g., Group2-Group1) are different. Therefore, we can separate the contribution of these two effects when including them as different regressors in the same linear regression model. (C) and (D) show the two alternative models. Like the cosine model (A), within-group trial pairs resemble between-group trial pairs in these two models. The domain-specific model is an identity matrix. The domain-general model is estimated from the absolute difference of behavioral congruency effect, but scaled to range from 0 (lowest similarity) to 1 (highest similarity) to aid comparison. The plotted matrices here include only one subject each from Group 1 and Group 2. Numbers 1-5 indicate the conflict type conditions, for spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon, respectively. The thin lines separate four different sub-conditions, i.e., target arrow (up, down) × congruency (incongruent, congruent), within each conflict type.
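
      The three model matrices named in this caption can be illustrated at the level of the five conflict types. This is a hedged sketch only: the conflict angles and the congruency-effect values below are made-up placeholders rather than the study's data, the function names are hypothetical, and the full trial-level 1400×1400 matrices are not reproduced.

```python
import math

# Assumption: conflict types 1-5 equally spaced over a quarter circle.
CONFLICT_ANGLES_DEG = [0.0, 22.5, 45.0, 67.5, 90.0]
# Hypothetical behavioral congruency effects (ms) per conflict type; NOT real data.
HYPOTHETICAL_CONGRUENCY_EFFECT_MS = [60.0, 55.0, 50.0, 45.0, 40.0]


def cosine_model(angles_deg):
    """Cognitive space model: cosine of the angular difference between conflicts."""
    rad = [math.radians(a) for a in angles_deg]
    return [[math.cos(a - b) for b in rad] for a in rad]


def domain_specific_model(n):
    """Domain-specific model: identity matrix (similar only to the same type)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def domain_general_model(effects):
    """Domain-general model: absolute difference in congruency effect,
    rescaled so that 0 is the lowest and 1 the highest similarity."""
    diffs = [[abs(a - b) for b in effects] for a in effects]
    largest = max(max(row) for row in diffs)
    if largest == 0:
        raise ValueError("all congruency effects are equal; cannot rescale")
    return [[1.0 - d / largest for d in row] for row in diffs]


if __name__ == "__main__":
    for name, rsm in [
        ("cosine", cosine_model(CONFLICT_ANGLES_DEG)),
        ("domain-specific", domain_specific_model(5)),
        ("domain-general", domain_general_model(HYPOTHETICAL_CONGRUENCY_EFFECT_MS)),
    ]:
        print(name)
        for row in rsm:
            print("  " + " ".join(f"{v:5.2f}" for v in row))
```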

      1. Because this model comparison is key to the main inferences in the study, it might also be helpful for most readers to move all of these RSA model matrices to the main text, instead of in the supplemental.

      Response: We thank the reviewer for this suggestion. We have moved Fig. S4 to the main text, where it is now labeled Figure 7.

      1. It may be worthwhile to check how robust the observed brain-behavior association (Fig 4C) is to the exclusion of the two datapoints with the lowest neural representation strength measure, as these points look like they have high leverage.

      Response: We recalculated the Pearson correlation after excluding the two points and found that the result was largely unchanged (r = 0.50, p = .003, compared with the original r = 0.52, p = .001).
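
      The robustness check described above, recomputing the correlation after removing the points with the lowest neural representation strength, can be sketched as follows. The helper names are hypothetical and no data from the study are included; the caller supplies the per-subject neural and behavioral coefficients.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_lowest(x, y, k=2):
    """Remove the k (x, y) pairs with the smallest x values, keeping order."""
    keep = sorted(sorted(range(len(x)), key=lambda i: x[i])[k:])
    return [x[i] for i in keep], [y[i] for i in keep]


def leverage_check(neural, behavior, k=2):
    """Return (r with all points, r without the k lowest-neural points)."""
    trimmed_neural, trimmed_behavior = drop_lowest(neural, behavior, k)
    return pearson_r(neural, behavior), pearson_r(trimmed_neural, trimmed_behavior)
```

      Comparing the two returned coefficients shows how much the association depends on the candidate high-leverage points.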

      Additionally, we found that the two axes were mistakenly swapped in Fig. 4C, and we have corrected this error in the revised manuscript. This correction does not affect the correlation results.

      Author response image 7.

      Fig. 4. The conflict type effect. (A) Brain regions surviving the Bonferroni correction (p < 0.0001) across the regions (criterion 1). Labeled regions are those meeting criterion 2. (B) Different encoding of conflict type in the incongruent versus congruent conditions. * Bonferroni corrected p < .05. (C) The brain-behavior correlation of the right 8C (criterion 3). The x-axis shows the beta coefficient of the conflict type effect from the RSA, and the y-axis shows the beta coefficient obtained from the behavioral linear model using the conflict similarity to predict the CSE in Experiment 2. (D) Illustration of the different encoding strength of conflict type similarity in incongruent versus congruent conditions of right 8C. The y-axis is derived from the z-scored Pearson correlation coefficient, consistent with the RSA methodology. See Fig. S4B for a plot with the raw Pearson correlation measurement. l = left; r = right.

      Reviewer #3:

      Yang and colleagues investigated whether information on two task-irrelevant features that induce response conflict is represented in a common cognitive space. To test this, the authors used a task that combines the spatial Stroop conflict and the Simon effect. This task reliably produces a beautiful graded congruency sequence effect (CSE), where the cost of congruency is reduced after incongruent trials. The authors measured fMRI to identify brain regions that represent the graded similarity of conflict types, the congruency of responses, and the visual features that induce conflicts. They applied univariate, multivariate, and connectivity analyses to fMRI data to identify brain regions that represent the graded similarity of conflict types, the congruency of responses, and the visual features that induce conflicts. They further directly assessed the dimensionality of represented conflict space.

      The authors identified the right dlPFC (right 8C), which shows 1) stronger encoding of graded similarity of conflicts in incongruent trials and 2) a positive correlation between the strength of conflict similarity type and the CSE on behavior. The dlPFC has been shown to be important for cognitive control tasks. As the dlPFC did not show a univariate parametric modulation based on the higher or lower component of one type of conflict (e.g., having more spatial Stroop conflict or less Simon conflict), it implies that dissimilarity of conflicts is represented by a linear increase or decrease of neural responses. Therefore, the similarity of conflict is represented in multivariate neural responses that combine two sources of conflict.

      The strength of the current approach lies in the clear effect of parametric modulation of conflict similarity across different conflict types. The authors employed a clever cross-subject RSA that counterbalanced and isolated the targeted effect of conflict similarity, decorrelating orientation similarity of stimulus positions that would otherwise be correlated with conflict similarity. A pattern of neural response seems to exist that maps different types of conflict, where each type is defined by the parametric gradation of the yoked spatial Stroop conflict and the Simon conflict on a similarity scale. The similarity of patterns increases in incongruent trials and is correlated with CSE modulation of behavior.

      The main significance of the paper lies in the evidence supporting the use of an organized "cognitive space" to represent conflict information as a general control strategy. The authors thoroughly test this idea using multiple approaches and provide convincing support for their findings. However, the universality of this cognitive strategy remains an open question.

      (Public Reviews) Taken together, this study presents an exciting possibility that information requiring high levels of cognitive control could be flexibly mapped into cognitive map-like representations that both benefit and bias our behavior. Further characterization of the representational geometry and generalization of the current results look promising ways to understand representations for cognitive control.

      Response: We would like to thank the reviewer for the positive evaluation of our manuscript and for providing constructive comments. In response to your suggestions, we have acknowledged the potential limitations of the design and the cross-subject RSA approach, and incorporated the open questions into the discussion. Please find our detailed responses below.

      The task presented in the study involved two sources of conflict information through a single salient visual input, which might have encouraged the utilization of a common space.

      Response: We agree that the unified visual input in our design may have facilitated the utilization of a common space. However, we believe that unified stimuli are not a prerequisite for constructing the common space. To further test the potential interaction between the concrete stimulus setting and the cognitive space representation, it is necessary to use varied stimuli in future research. We have left this as an open question in the discussion:

      Can we effectively map any sources of conflict with completely different stimuli into a single space?

      The similarity space was analyzed at the level of between-individuals (i.e., cross-subject RSA) to mitigate potential confounds in the design, such as congruency and the orientation of stimulus positions. This approach makes it challenging to establish a direct link between the quality of conflict space representation and the patterns of behavioral adaptations within individuals.

      Response: By setting the variables as random effects at the subject level, we have extracted the individual effects that incorporate both the group-level fixed effects and individual-level random effects. We believe this approach yields results that are at least as reliable as effects calculated from individual data only. First, the linear mixed-effect (LME) model includes all the individual data, which form the basis for estimating the random effects. Therefore, the individual effects derived from this approach inherently reflect the individual-specific effects. To support this notion, we have included a simulation script (accessible in the online file “simulation_LME.mlx” at https://osf.io/rcq8w) to demonstrate the strong consistency between the two approaches (see Author response image 8). In this simulation, we generated random data (Y) for 35 subjects, each containing 20 repeated measurements across 5 conditions. To streamline the simulation, we only included one predictor (X), which was treated as both a fixed and a random effect at the subject level. We applied two methods to calculate the individual beta coefficients. The first involved extracting individual beta coefficients from the LME model by summing the fixed effect with the subject-specific random effect. The second method entailed conducting a regression analysis using data from each subject to obtain the slope. We tested their consistency by calculating the Pearson correlation between the derived beta coefficients. This simulation was repeated 100 times.

      Author response image 8.

      The consistent individual beta coefficients between the mixed effect model and the individual regression analysis. A) The distribution of Pearson correlation between the two methods for 100 times. B) An example from the simulation showing the highly correlated results from the two methods. Each data point indicates a subject (n=35).
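
      The logic of this simulation can be illustrated with a simplified stand-in. The sketch below is not the authors' MATLAB script (simulation_LME.mlx) and does not fit a full mixed-effect model; instead it approximates the subject-level LME estimates with a method-of-moments shrinkage of the per-subject OLS slopes toward the group slope, which is the form the random-slope estimates take in the simplest case. The number of observations per subject (100, read here as 20 measurements in each of 5 conditions), the continuous predictor, the effect sizes, and all function names are assumptions.

```python
import random
import statistics


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_subject(x, y):
    """Per-subject OLS: return (slope, Sxx, residual variance)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, rss / (len(x) - 2)


def simulate_once(rng, n_subjects=35, n_obs=100,
                  fixed_slope=0.5, slope_sd=0.3, noise_sd=1.0):
    """Correlate per-subject OLS slopes with their shrunken counterparts."""
    fits = []
    for _ in range(n_subjects):
        true_slope = fixed_slope + rng.gauss(0.0, slope_sd)  # fixed + random effect
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [true_slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        fits.append(fit_subject(x, y))

    ols = [slope for slope, _, _ in fits]
    sigma2 = statistics.fmean([s2 for _, _, s2 in fits])   # pooled noise variance
    sampling_var = [sigma2 / sxx for _, sxx, _ in fits]     # variance of each OLS slope
    group_slope = statistics.fmean(ols)                     # stand-in for the fixed effect
    # Method-of-moments estimate of the between-subject slope variance.
    tau2 = max(statistics.variance(ols) - statistics.fmean(sampling_var), 1e-6)
    # Shrink each subject's slope toward the group slope.
    shrunk = [group_slope + tau2 / (tau2 + v) * (b - group_slope)
              for b, v in zip(ols, sampling_var)]
    return pearson_r(ols, shrunk)


def run_simulation(n_repeats=100, seed=0):
    """Repeat the simulation and return the list of correlations."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_repeats)]


if __name__ == "__main__":
    correlations = run_simulation()
    print(f"min r = {min(correlations):.3f}, mean r = {statistics.fmean(correlations):.3f}")
```

      Because the shrinkage weight varies only slightly across subjects when each has a similar amount of data, the two sets of slopes are expected to be almost perfectly correlated, which is the pattern the response describes.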

      Second, the potential difference between the two methods is that the LME model also takes the group-level variance into account, such as the dissociable variances of the conflict similarity and orientation across subject groups. This enabled us to extract relatively cleaner conflict similarity effects for each subject, which we believe can be better linked to the individual behavioral adaptations. Moreover, we have extracted the behavioral adaptation scores (i.e., the similarity modulation effect on CSE) using a similar LME approach. Conducting behavioral analysis solely using individual data would have been less reliable, given the limited sample size of individual data (~32 points per subject). This also motivated us to maintain consistency by extracting individual neural effects using LME models.

      Furthermore, it remains unclear at which cognitive stages during response selection such a unified space is recruited. Can we effectively map any sources of conflict into a single scale? Is this unified space adaptively adjusted within the same brain region? Additionally, does the amount of conflict solely define the dimensions of this unified space across many conflict-inducing tasks? These questions remain open for future studies to address.

      Response: We appreciate the reviewer’s constructive open questions. We respond to each of them based on our current understanding.

      1) It remains unclear at which cognitive stages during response selection such a unified space is recruited.

      We anticipate that the cognitive space is recruited to guide the transference of behavioral CSE at two critical stages. The first stage involves the evaluation of control demands, where the representational distance/similarity between previous and current trials influences the adjustment of cognitive control. The second stage pertains to control execution, where the switch from one control state to another follows a path within the cognitive space. It is worth noting that future studies aiming to address this question may benefit from methodologies with higher temporal resolutions, such as EEG and MEG, to provide more precise insights into the temporal dynamics of cognitive space recruitment.

      2) Can we effectively map any sources of conflict into a single scale?

      It is possible that various sources of conflict can be mapped onto the same space based on their similarity, even if finding such an operationally defined similarity may be challenging. However, our results may offer two approaches to inferring the similarity between two conflicts. One is to examine their congruency sequence effect (CSE), with a stronger CSE suggesting greater similarity. The other is to test their representational similarity within the dorsolateral prefrontal cortex.

      3) Is this unified space adaptively adjusted within the same brain region?

      We do not have an answer to this question. We showed that the cognitive space does not change with time (Note S3). What is adjusted is the control demand needed to resolve the conflict conditions, which change quickly from trial to trial. It is nonetheless an interesting question whether the cognitive space may be altered, for example, when the mental state changes significantly. If so, we can further test whether the change of cognitive space also occurs within the right dlPFC.

      4) Additionally, does the amount of conflict solely define the dimensions of this unified space across many conflict-inducing tasks?

      Our understanding of this comment is that the amount of conflict refers to the number of conflict sources. Based on our current finding, the dimensions of the space are indeed defined by how many different conflict sources are included. However, this would require that the different conflict sources be orthogonal. If some sources share some aspects, the cognitive space may collapse to a lower dimension. We have incorporated the first question into the discussion:

      Moreover, we anticipate that the representation of cognitive space is most prominently involved at two critical stages to guide the transference of behavioral CSE. The first stage involves the evaluation of control demands, where the representational distance/similarity between previous and current trials influences the adjustment of cognitive control. The second stage pertains to control execution, where the switch from one control state to another follows a path within the cognitive space. However, we were unable to fully distinguish between these two stages due to the low temporal resolution of fMRI signals in our study. Future research seeking to delve deeper into this question may benefit from methodologies with higher temporal resolutions, such as EEG and MEG.

      We have included the other questions in the manuscript as open questions, calling for future research.

      Several interesting questions remain to be answered. For example, is the dimension of the unified space across conflict-inducing tasks solely determined by the number of conflict sources? Can we effectively map any sources of conflict with completely different stimuli into a single space? Is the cognitive space geometry modulated by the mental state? If yes, what brain regions mediate the change of cognitive space?

      Minor comments:

      • The original comment about out-of-sample predictions to examine the continuity of the space was a suggestion for testing neural representations, not behavior (I apologize for the lack of clarity). Given the low dimensionality of the conflict space shown by the participation ratio, we expect that linear separability exists only among specific combinations of conditions. For example, the pair of conflicts 1 and 5 together is not linearly separable from conflicts 2 and 3. But combined with other results, this is already implied.

      Response: We apologize for the misunderstanding. In fact, performing a prediction analysis using the extensive RSM in our study does present certain challenges, primarily due to its substantial size (1400×1400) and the intricate nature of the mixed-effect linear model. In our efforts to simplify the prediction process by excluding random effects, we did observe a correlation between the predicted and original values, albeit with a relatively small Pearson correlation coefficient (r = 0.024, p < .001). This small correlation can be attributed to two key factors. First, the exclusion of data points impacts not only the conflict similarity regressor but also other regressors within the model, thereby diminishing the predictive power. Second, the large number of data points in the model heightens the risk of overfitting, subsequently reducing the model’s capacity for generalization and increasing the likelihood of unreliable predictions. Given these potential problems, we have opted not to include this prediction in the revised manuscript.

    1. Author Response

      Author responses to the original review:

      The data we produce are not criticized as such and thus, do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) that a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) that some physiology of the pelvic region would deserve the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018) to which we refer the interested reader; and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflow and uses them to argue that they cannot be both parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they don’t detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referee’s comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows can’t either prevent them from being both parasympathetic. We would also object to this, but the only “functional differences” used by the referees to dismiss our diagnostic of a sympathetic-like character (rather than parasympathetic) for the sacral outflow are between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above, and below) —not between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication, as per eLife’s new model, of the paper, the referees’ comments and our response will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons ” [We assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.

      These statements are inaccurate. First, the classification is not based on an impression from visual inspection of the heatmap, but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect, but the referee can verify (by visual inspection of the heatmap) that genes which we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons” or vice versa, i.e. noradrenergic or cholinergic neurons (genes from groups V and VI, respectively), show a much bigger difference than those cited by the referee, and are indeed quasi-absent from the weaker clusters or ganglia. In addition, even by subjective visual inspection:

      Islet is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons which are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems: it is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view, for implying a functional antagonism across the lumbo-sacral gap, which has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals; antagonism within one division of the autonomic nervous system would not be without precedent (e.g. there exist both vasoconstrictor and vasodilator sympathetic neurons, and both inhibitor and activator enteric motoneurons). The way to this question is finally open to research, and as referee#2 says “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2016).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, of which we have no idea at the moment. This has no bearing on sympathetic and parasympathetic.

      In addition, Brunet et al dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise to this exception to his “Theory on the relation of drugs to nerve system” (Langley, 1921) (p43) (which was actually a bigger problem for him than it is for us, for reasons that are too long to recount here; it is as if the theoretical difficulties experienced by Langley had been internalized to this day in the form of a dismissal of the cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity). (Horn, 2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target and argues that this would be a fundamental difference with the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in (Ernsberger and Rohrer, 2018)); and we have no idea of the dynamic in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g. (Anderson et al., 2006; Stanke et al., 2006)). In other words, VIP is probably part of what we call the cholinergic synexpression group (but was not placed in it by our calculations, probably because of a low expression level in sympathetic noradrenergic cells).

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double stained for NPY and VIP in male rats, and says (Keast, 2006) that in females “co-expression of NPY and VIP is common” (thus in cholinergic neurons that the referee calls “parasympathetic”). Single cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the 100 top genes.

      Answer to the original recommendations by Referee #2

      Introduction - the use of the words 'consensual' and 'promiscuity' are not clear and rather loaded in the context of the pelvic ganglia. Pick alternative words.

      There is no sexual innuendo inherent in “promiscuity”: “condition of elements of different kinds grouped or massed together without order” (Oxford English Dictionary). We replaced “never consensual” by “never generally accepted”.

      Results - Page 2 - what sex were the mice? Previous works indicate significant sexual dimorphism in the pelvic ganglion.

      The mice included both males and females, and male and female cells are represented in all ganglia and clusters. This is now mentioned in the Materials and Methods. Thus, however unsuited to analyzing sexual dimorphism, our data set ensures that all the cell types we describe are qualitatively present in both sexes.

      Results line 3 - the celiac and mesenteric ganglia are prevertebral ganglia and not part of the sympathetic chain. The chain refers to the paravertebral ganglia.

      We replaced “part of the prevertebral chain” by “belonging to prevertebral ganglia”. This said, there are precedents for “prevertebral chain ganglia” to designate the rostro-caudal series of prevertebral ganglia. Rita Levi-Montalcini, for example, who devoted her glorious career to sympathetic ganglia, writes in 1972 “The nerve cell population of para- and prevertebral chain ganglia is reduced to 3–5% of that of controls”. (10.1016/0006-8993(72)90405-2).

      Page 3 - "as the current dogma implies". Dogma often refers to opinion or church doctrine. The current nomenclature is neither. Pick another word.

      There is little in science that is proven to the point of eliminating any element of opinion. “Dogma” refers to “that which is held as a principle or tenet […], especially a tenet authoritatively laid down by […] a school of thought” (OED). And “dogma” is used in science to designate tenets better experimentally supported than the “sacral parasympathetic”, such as the “central dogma of molecular biology”.

      Page 3 - "To give justice" implies the classical notion is unjust. How about, 'to further explore previous evidence indicating that ....'

      The term is indeed not proper English for the meaning intended, and the right expression is “to do justice”, to mean: “to treat [a subject or thing] in a manner showing due appreciation, to deal with [it] as is right or fitting” (OED). We have corrected the paper accordingly.

      Page 4 top - the convergence indicated by Figure 3 does not justify excluding cholinergic and noradrenergic genes from the analysis.

      Contrary to what the referee says, Figure 3 does not show any “convergence”, see our answer to Referee#1. What Figure 3 shows is that cells that are targeted by the lumbar pathway (a pathway universally deemed “sympathetic”) are cholinergic in massive proportion. Therefore, by an uncontroversial criterion, the pelvic ganglion contains lots of sympathetic cholinergic neurons. The only other option is to declare that sympathetic preganglionic neurons synapse onto parasympathetic postganglionic ones (which is what Referee#1 proposes, and considers “much simpler”. We beg to differ).

      Our justification for excluding cholinergic and noradrenergic genes from the definition of “sympathetic” and “parasympathetic” is simply that sympathetic neurons can be cholinergic (those to sweat glands and periosteum; and, as we show in Figure 3, many targets of the lumbar pathway). One can also note that anywhere else in the nervous system, classifying cell types as a function of neurotransmitter phenotype would lead to nonsensical descriptions, such as putting together pyramidal cells and cerebellar granule cells, or motor neurons and basal forebrain cholinergic neurons. Indeed, Referee#1 proposes such a revolutionary revision, by calling all cholinergic autonomic neurons “parasympathetic” (see our answer above).

      Keast (1995) did similar experiments and used presynaptic lesions to draw a different conclusion indicating preferential innervation of pelvic subpopulations.

      Keast found “preferential” innervation of pelvic subpopulations based on lesion experiments. Nevertheless, she concluded (at the time) that “the correct definition of these two components of the nervous system is based on neuroanatomy rather than chemistry” (Keast, 2006).

      Page 4 - "In the aggregate, the pelvic ganglion is best described as a divergent sympathetic ganglion devoid of parasympathetic neurons" The notion of a divergent ganglion is completely unclear!

      We take “divergent” in a developmental or evolutionary meaning: related to sympathetic ganglia, yet somewhat differing from them. Elsewhere we use the word “modified”. Importantly (and as cited in the paper), a similar situation emerges from the single cell transcriptomic analysis of the lumbar and sacral preganglionics (by other research groups).

      Granted, it is devoid of neurons having the signature of cranial parasympathetics, but that is insufficient to conclude that they are not parasympathetics.

      If a genetic signature which is not only un-parasympathetic, but sympathetic-like remains compatible with some version of the label “parasympathetic”, we get dangerously close to dismissing the molecular make-up of a neuron as a definition of its type. This goes against any contemporary understanding of neuron types (take (Zeisel et al., 2018) among hundreds of other examples).

      Page 4 - "the entire taxonomy of autonomic ganglia could be a developmental readout of Hox genes." This reader completely agrees! We appreciate this would be difficult to test but it helps to explain possible differences along the rostro-caudal axis. Consider making this a key implication of the study!

      If the reader agrees, then his/her previous points become mysterious: we speculate that the Hox code determines the structure of the autonomic nervous system, i.e. the array, along the rostrocaudal axis, of a bulbar parasympathetic, a thoracolumbar sympathetic and lumbo-sacral “pelvo-sympathetic”. The existence of caudal parasympathetic neurons, on the contrary, would subvert any role for Hox genes: similar neurons (similar enough to be called by the same name) would arise at completely different rostro-caudal levels, i.e. with a different Hox code.

      Page 5 - "It is thus remarkable ...that we uncover in no way contradicts the physiology." Not really. The 'classical' sympathetic system innervates the limbs, and the skin and it participates in thermoregulation and in cardiovascular adjustments to exercise. The parasympathetic system does none of these things. Reclassing the pelvic outflow as pseudo-sympathetic contradicts this physiology.

      We do not say that the sacral outflow is classically sympathetic; we go all the way to proposing the special name “pelvo-sympathetic”; and we insist that these special sympathetic-like neurons have special targets (detrusor muscle, helicine arteries…): there is no contradiction. Not only is there no contradiction, but we remove the mind-twister of an anatomical/genetic/cell type-based “sacral parasympathetic” combined with a lack of physiological lumbosacral antagonism (we provide a short history of this dissonance in (Espinosa-Medina et al., 2018)), which led Wilfrid Jänig to write (Jänig, 2006)(p. 357): “Thus, functions assumed to be primarily associated with sacral (parasympathetic) are well duplicated by thoracolumbar (sympathetic) pathways. This shows that the division of the spinal autonomic systems into sympathetic and parasympathetic with respect to sexual functions is questionable”. We could not agree more: this division is questionable in terms of physiology and nonexistent in terms of cell types. In other words, we reconcile cell types with physiology (but “it is early days”).

      Answer to the novel recommendations by Referee #2

      In addition to my original comments, important anatomical and functional distinctions are not explained by the data in this paper. ANATOMY- Sympathetic ganglia are located in close proximity to major branches of the aorta. Cranial and sacral parasympathetic ganglia are located next to or within the structures they innervate (e.g. eye, lung, heart, bladder).

      The pelvic ganglion, including some of its cholinergic neurons that the referee insists are parasympathetic, is further removed from one of its major targets (the helicine arteries of the external genitals) than the sympathetic prevertebral ganglia are from some of theirs (like the gut or kidney). We discussed this issue in (Espinosa-Medina et al., 2018).

      FUNCTION- The sympathetic system controls state variables (e.g. body temperature, blood pressure, serum electrolytes and fluid balance), parasympathetic neurons do not.

      Even in the classical view, the sympathetic system controls the blood vessels of the external genitals or the size of the pupil, for example, which are not state variables.

      […] The data in the paper are a useful next step in defining the genetic diversity of autonomic neurons but do not justify or improve upon existing nomenclature. The future challenge is to understand distinctions between subsets of autonomic ganglion cells that innervate different targets and the principles that govern the integrative function of the autonomic motor system that controls behavior.

      We thank the referee for finding our data useful; and we fully agree with the latter statement. However, neurons, like many other cell types, are hierarchically organized (Zeng and Sanes, 2017), i.e. subsets of neurons belong to sets, with defining traits. Our data argue that there is no parasympathetic neuronal set that includes any pelvic ganglionic neuron. In contrast, there is a ganglionic sympathetic set (defined by our analysis of gene expression) which includes all of them, just as there is a preganglionic sympathetic set that includes sacral preganglionics (Alkaslasi et al., 2021; Blum et al., 2021) (although the direct comparison with cranial preganglionics is yet to be made).

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Alkaslasi, M. R., Piccus, Z. E., Hareendran, S., Silberberg, H., Chen, L., Zhang, Y., Petros, T. J. and Le Pichon, C. E. (2021). Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat Commun 12, 2471.

      Blum, J. A., Klemm, S., Shadrach, J. L., Guttenplan, K. A., Nakayama, L., Kathiria, A., Hoang, P. T., Gautier, O., Kaltschmidt, J. A., Greenleaf, W. J., et al. (2021). Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons. Nat Neurosci 24, 572–583.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisons of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Saha, O., Boismoreau, F., Chettouh, Z., Rossi, F., Richardson, W. D. and Brunet, J.-F. (2016). The sacral autonomic outflow is sympathetic. Science 354, 893–897.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. International Review of Cytology 248, 141.

      Langley, J. N. (1921). The Autonomic Nervous System (Pt. I). Cambridge: Heffer & Sons Ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Developmental biology 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype:cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

      Zeng, H. and Sanes, J. R. (2017). Neuronal cell-type classification: challenges, opportunities and the path forward. Nat Rev Neurosci 18, 530–546.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the overwhelmingly positive summaries from all three reviewers and the eLife editorial team. All reviewers provided extremely detailed feedback on the initially submitted manuscript, and we appreciate their efforts in helping us improve it. Listed below are the specific comments made by the reviewers, with our responses in a point-by-point format.

      The only notable change made to the manuscript that was not in response to comments from a reviewer concerns the nomenclature of the structure that we had previously called the nuclear microtubule organising centre (MTOC). We had used the term MTOC to describe the entire structure, which spans the nuclear envelope and comprises an intranuclear portion and cytoplasmic extensions. Given recent evidence, including findings from this study, it is possible that the intranuclear region and the cytoplasmic extensions both have microtubule nucleating capacity, and therefore both meet the definition of an MTOC. To disambiguate this, we now refer to the overall structure as the centriolar plaque (CP), consistent with previous literature. The intranuclear portion of the CP will be referred to as the inner CP, while the cytoplasmic portion will be referred to as the outer CP.

      Reviewer #1 (Recommendations For The Authors):

      1) In the first part of the result section, a paragraph on sample processing for U-ExM could be added, with reference to Fig 1b.

      The following section has been added to the first paragraph of the results “…In this study all parasites were fixed in 4% paraformaldehyde (PFA), unless otherwise stated, and anchored overnight at 37 °C before gelation, denaturation at 95 °C and expansion. Expanded gels were measured, before shrinking in PBS, antibody staining, washing, re-expansion, and imaging (Figure 1b). Parasites were harvested at multiple time points during the intraerythrocytic asexual stage and imaged using Airyscan2 super-resolution microscopy, providing high-resolution three-dimensional imaging data (Figure 1c). A full summary of all target-specific stains used in this study can be found in Figure 1d.”

      2) The order of the figures could be changed for more consistency. For example, fig 2b is cited before 2a.

      An earlier reference to figure 2a was added to rectify this discrepancy.

      3) In Fig 2b it is difficult to distinguish the blue (nuclear) and green (plasma membrane) lines.

      The thickness of these lines has been doubled.

      4) It is unclear what the authors want to show in Fig 2a.

      The intention of this figure, as with panel a of the majority of the organelle-specific figures in this manuscript, is simply to show what the target protein/structure looks like across intraerythrocytic development.

      5) Lines 154-155, the numbers of MTOC observed do not match those in Supplt Fig2c.

      This discrepancy has been addressed; the numbers in Supplementary Figure 2c were accurate, so the text has been changed to reflect this.

      6) Line 188: the authors should explain the principle of C1 treatment.

      The following explanation of C1 treatment has been provided:

      “To ensure imaged parasites were fully segmented, we arrested parasite development by adding the reversible protein kinase G inhibitor Compound 1 (C1). This inhibitor arrests parasite maturation after the completion of segmentation but before egress. When C1 is washed out, parasites egress and invade normally, ensuring that observations made in C1-arrested parasites are physiologically relevant and not a developmental artefact due to arrest.”

      7) Lines 195-204: this part is rather difficult to follow as analysis of the basal complex is detailed later in the manuscript. The authors refer to Fig4 before describing Fig3.

      This has been clarified in the text.

      8) Lines 225 and 227, the authors cite Supplt Fig 2b about the Golgi, but probably meant Supplt Fig 4? In Supplt Fig 4, the authors could provide magnification in insets to better illustrate the Golgi-MTOC association.

      This should have been a reference to Supplementary Figure 2e instead of 2b, which has now been changed. In Supplementary Figure 4, zooms into a single region of Golgi have been provided to more clearly show its MTOC association.

      9) Supplt Fig8 is wrong (duplication of Supplt Fig6).

      We apologise for this mistake; the correct figure is now present in Supplementary Figure 8.

      10) Line 346: smV5 should be defined, and generation of the parasites should be described in the methods.

      This has now been defined, but we have not described the generation of the parasites, as this was performed in a previous study that we have referenced.

      11) Lines 361-362: "By the time the basal complex reaches its maximum diameter..." This sentence is not very clear, the authors could explain more precisely the sequence of events, indicating that the basal complex starts moving in the basal direction, as clearly illustrated in Fig 4a.

      This has been prefaced with the following sentence “…As the parasite undergoes segmentation, the basal complex expands and starts moving in the basal direction.”

      12) Supplt Fig6 comes after Supplt Fig9 in the narrative, and therefore could be placed after.

      Supplementary Figure 6 and 9 follow the order in which they are referred to in the text.

      13) Line 538: Supplt Fig9e instead of 9d.

      This has been fixed.

      14) Line 581: does the PFA-glutaraldehyde fixation allow visualizing other structures in addition to cytostome bulbs?

      While PFA-glutaraldehyde fixation allows visualisation of cytostome bulbs, to date we have not observed any other structure that stains/preserves better using NHS Ester or BODIPY Ceramide in PFA-glutaraldehyde fixed parasites. As a general trend, all structures other than cytostomes become somewhat more difficult to identify using NHS Ester or BODIPY Ceramide in PFA-glutaraldehyde fixed samples due to the local contrast with the red blood cell cytoplasm. It seems likely that this is just due to the preservation of RBC cytoplasm, and would be expected from any fixation method that doesn’t result in RBC lysis, rather than anything unique to glutaraldehyde.

      15) Line 652-653: It is unclear how the authors can hypothesize that rhoptries form de novo rather than splitting based on their observations.

      This is not something we can say with certainty; we have, however, introduced the following paragraph to qualify our claims: “Overall, we present three main observations suggesting that rhoptry pairs undergo sequential de novo biogenesis rather than dividing from a single precursor rhoptry. First, the tight correlation between rhoptry and MTOC cytoplasmic extension number suggests that either rhoptry division happens so fast that transition states are not observable with these methods or that each rhoptry forms de novo and such transition states do not exist. Second, the heterogeneity in rhoptry size throughout schizogony favors a model of de novo biogenesis given that it would be unusual for a single rhoptry to divide into two rhoptries of different sizes. Lastly, well-documented heterogeneity in rhoptry density suggests that, at least during early segmentation, rhoptries have different compositions. Heterogeneity in rhoptry contents would be difficult to achieve so quickly after biogenesis if they formed through fission of a precursor rhoptry.”

      16) Line 769: is expansion microscopy sample preparation compatible with FISH?

      Yes, there are publications of expansion being done with both MERFISH and FISH, though it has not yet been applied to Plasmodium. See, for example: Wang, Guiping, Jeffrey R. Moffitt, and Xiaowei Zhuang. "Multiplexed imaging of high-density libraries of RNAs with MERFISH and expansion microscopy." Scientific reports 8.1 (2018): 4847; and Chen, Fei, et al. "Nanoscale imaging of RNA with expansion microscopy." Nature methods 13.8 (2016): 679-684.

      17) In the methods, the authors could provide details on the gel mounting step for imaging. This is particularly important since this paper will likely serve as a reference standard for expansion microscopy in the field. Also, illustration that cryopreservation of gels does not modify the quality of the images would be useful.

      The following section has been added to our “image acquisition” paragraph: “Immediately before imaging, a small slice of gel ~10mm x ~10mm was cut and mounted on an imaging dish (35mm Cellvis coverslip bottomed dishes NC0409658 - FisherScientific) coated with Poly-D lysine. The side of the gel containing sample is placed face down on the coverslip and a few drops of ddH2O are added after mounting to prevent gel shrinkage due to dehydration during imaging.”

      We have decided not to illustrate that cryopreservation does not alter gel quality, as this is something that is already covered in the study that first cryopreserved gels, which is referenced in our methods section.

      Reviewer #2 (Recommendations For The Authors):

      1) Advantages and limitations of the expansion method are generally well discussed. The only matter in that respect that I was wondering is if expansion can always be assumed to be linear for all components of a cell. The hemozoin crystal does not expand (maybe not surprisingly), but could there also be other cellular structures that on a smaller scale separate or expand at a different rate than others? Is there any data on this from other organisms? I am raising this here not as a criticism of this work but if known to occur, it might need mentioning somewhere to alert the reader to it, particularly in regards to the many measurements in the paper (see also point 4). This might be a further factor contributing to the finding that the IMC and PPM could not be resolved.

      This is an excellent point and, to our knowledge, one that is currently still under investigation in the field. It is well-documented that expansion protocols need to be customized to each cell type and tissue they are applied to. Each solution used for fixation and anchoring as well as timing and temperature of denaturation can affect the expansion factor achieved as well as how isotropic/anisotropic the expanded structures turn out. However, we do not know of any examples where isotropic expansion was achieved for everything but an organelle or component of the cell. It is our impression that if the cell seems to have attained isotropic expansion, this is assumed to also be the case for the subcellular structures within it. Nonetheless, we think it remains a possibility to be considered, especially as more structures are characterized using these methods. In the case of our IMC/PPM findings, when we performed calculations taking into account our experimental expansion factor as well as antibody effects, it was clear that the resolution of our microscope was not enough to resolve the two structures using our current labelling methods. So, we suspect most of the effect is driven by that. However, this still needs to be validated by attempting to resolve the two structures through alternative labelling and imaging methods.

      2) I understand that many things described in the results part are interconnected but still the level of hopping around between different figures/supp figures is considerable (see also point 6 on synchronicity of Figure parts). I do not have a simple fix, but maybe the authors could check if they could come up with a way to streamline parts of their results into a somewhat more reader friendly order.

      This has been a problem we encountered from the beginning and, after trying multiple presentations of the results and discussion, we realized they all have drawbacks. We eventually settled on this presentation as the “least confusing”. We agree, however, that the figure references and order could be better streamlined and have addressed this to the best of our ability.

      3) Are the authors sure the ER expands well and the BIP signal (Fig. S5) gives a signal reflecting the true shape of the ER? The signal in younger parasites seems rather extensive compared to what the ER (in my experience) typically looks like in these stages in live parasites.

      While there may be a discrepancy between how the presumably dynamic ER appears in live cells, and how it appears using BiP staining, we think it is unlikely this is a product of expansion. Additionally, if there were to be an artefactual change in the ER, it would be likely under-expansion rather than over-expansion, which to our knowledge has not been reported. In our opinion, the BiP staining we observe is comparable between unexpanded and expanded samples. We have included comparative images in Author response image 1 with DNA in cyan and BiP in yellow, unexpanded (left) and expanded (right) using the same microscope and BiP antibody.

      Author response image 1.

      4) It is nice to have measurements of the apicoplast and mitochondria, but given their size, this could also have been done in unexpanded, ideally live parasites, avoiding expansion and fixing artifacts. While the expansion has many nice features, measuring area of large structures may not be one where it is strictly needed. I am not saying this is not useful information, but maybe a note could be added to the manuscript that the conclusions on mitochondria and apicoplast area and division might be worth confirming in live parasites. A brief mention on similarities and differences to previous work analysing the shape and multiplication of these organelles through blood stage development (van Dooren et al MolMicrobiol2005) might also be useful.

      We agree with the reviewer that previous studies such as van Dooren et al. (2005) demonstrate that it is possible to track apicoplast and mitochondrial growth without expansion and share the opinion that live parasites are better for these measurements. Expansion only provides an advantage when more organelle-level resolution is needed, for example in studying the association between these organelles and the MTOC or visualizing other branch-specific interactions.

      5) I could not find the Supp Fig. 8 on the IMC, the current Supp Fig. 8 is a duplication of Supp Fig. 6

      This has been addressed; Supplementary Figure 8 now refers to the IMC.

      6) Figure order is not very synchronous with the text: Fig. 2a is mentioned after Fig. 2b, Fig. 4b is mentioned first for Fig. 4 (Fig. 4a is not by itself mentioned) and before Fig. 3 is mentioned; Fig. 3b is before Fig. 3a.

      We have done our best to fix these discrepancies, but concede that we have not found a way to order these sections that doesn’t lead to some confusion.

      7) Fig. S2a, The label "Centrin" on left image is difficult to read

      We have increased the font size and changed the colour slightly in the hope that it is now legible.

      8) In Fig. 2a, the centrin foci are very focal and difficult to see in these images, particularly when printed out but also on screen. To a lesser extent this is also the case for CINCH in Fig. 4a (particularly when printed; when zoomed-in on screen, the signal is well visible). This issue of difficulties in seeing the fluorescence signal of some markers, particularly when printed out, applies also to other images of the paper.

      In the images of full size parasites, this is an issue that we cannot easily overcome as the fluorescent channels are already at maximum brightness without overexposure. To try and address this, we have provided zooms that we hope will more clearly show the fluorescence in these panels.

      9) Expand "C1" in line 188 (first use).

      This has been addressed in response to a previous comment.

      10) Line 227; does Supp Fig. 2b really show Golgi- cytoplasmic MTOC association?

      We have rephrased the wording of this section to clarify that we are observing proximity and not necessarily a physical tethering. However, it is worth noting that this was an accidental reference to Supplementary Figure 2b, and should have been Supplementary Figure 2e.

      11) Line 230, in segmented schizonts the Golgi was considered to be at the apical end. It might be more precise to call its location to be close to the nucleus on the side facing the apical end of the parasite. It seems to me it often tends to be closer to the nucleus (in line with its proximity to the ER, see also point 13).

      We have added more detail to this description clarifying that despite being at the apical end, the Golgi is closer to the nucleus.

      12) Supp Fig. S5: Is the top cell indeed a ring? In the second cell there seem to be two nuclei, I assume this is a double infection (please indicate this in the legend or use images of a single infection).

      In our opinion, the top cell in Supplementary Figure 5 is a ring. This is based on its size and its lack of an observable food vacuole (an area that lacks NHS ester staining). We typically showed images of amoeboid rings to avoid this ambiguity, but we think this parasite is a ring nonetheless. For the second image, this parasite is not doubly infected, as both DNA masses are actually contained within the same dumbbell-shaped nuclear envelope. This parasite is likely undergoing its first anaphase (or the Plasmodium equivalent of anaphase) and will likely soon undergo its first nuclear division to separate these two DNA masses into individual nuclei.

      13) Line 244: I would not call the Golgi a part of the apical cluster of organelles. All secretory cargo originates from the ER-Golgi-transGolgi axis in a directional manner and this axis is connected to the nucleus by the perinuclear ER. If seen from a secretory pathway centred view, it is the other way around and you could call the apical organelles part of the nuclear periphery which would be equally non-ideal.

      Everything is close together in such a small cell. The secretory pathway likely is arranged in a serial manner starting from the perinuclear region to the transGolgi where cargo is sorted into vesicles for different destinations of which one is for the delivery of material to the apical organelles. The proposition that the Golgi is part of the apical cluster therefore somehow feels wrong, as the Golgi can still be considered to be upstream of the transGolgi before apical cargo branches off from other cargo destined for other destinations.

      We agree with the reviewer that claiming a functional association between the Golgi and the apical organelles would be odd and we by no means meant to imply such functional grouping. Our intent was to confirm observations previously made about Golgi positioning by electron microscopy studies such as Bannister et al. (2000) at a larger spatial and temporal scale. These studies make the observation that the Golgi is spatially associated with the rhoptries at the apical end of the parasites. Logically, the Golgi is tied to the apical organelles through the secretory pathway as the reviewer suggests, but we claim no further relationship beyond that of organelle biogenesis. We have made modifications to the text to clarify these points.

      14) Lines 300 - 308 (and thereafter): I assume these were also expanded parasites and the microtubule length is given after correction for expansion. I would recommend to indicate in line 274 (when first explaining the expansion factor) that all following measurements in the text represent corrected measures or, if this is not always the case, indicate on each occasion. Is the expansion factor accurate and homogenous enough to draw firm conclusions (see also point 1)? Could it be a reason for the variation seen with SPMTs? Could a cellular reference be used as a surrogate to account for cell specific expansion or would you assume that cellular substructure specific expansion differences exist and prevent this?

      This is correct: the reported number is corrected for the expansion factor, and the corresponding graphs with uncorrected data are present in the Supplementary Figures. We have clarified this in the text. Uneven expansion can be caused when certain organelles/structures do not properly denature. Given that our protocol denatures using highly concentrated SDS at 95 °C for 90 minutes, we do not anticipate that any subcellular compartments would expand significantly differently. In this study our expansion factors varied from ~4.1-4.7 across all gels, and for our corrected values we used the median expansion factor of 4.25. If we are interpreting the length of an interpolar spindle as 20 µm, for example, the corrected value would be 4.7 µm when divided by the median expansion factor, 4.9 µm when divided by the lowest, and 4.2 µm when divided by the highest. These values fall well within the measurement error, and so we expect that these small deviations in expansion factor between gels have a fairly minimal influence on variation in microtubule lengths.
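
      As a minimal sketch of the correction described above (an illustration only, not the analysis code used in the study), the corrected length is simply the measured length divided by the expansion factor. The function name `correct_for_expansion` and the dictionary layout are hypothetical; the 20 µm measurement and the factors 4.25, 4.1 and 4.7 are the values quoted above. Because the range is quoted only approximately (~4.1-4.7), the last digit may differ slightly from the rounded figures in the text.

```python
def correct_for_expansion(measured_um, expansion_factor):
    """Convert a length measured in an expanded gel to its pre-expansion length."""
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return measured_um / expansion_factor


# Example measurement and expansion factors quoted in the response.
measured_um = 20.0
factors = {"median": 4.25, "lowest": 4.1, "highest": 4.7}

# A smaller expansion factor yields a larger corrected length, and vice versa.
corrected = {
    name: correct_for_expansion(measured_um, factor)
    for name, factor in factors.items()
}

for name, value in corrected.items():
    print(f"{name} factor {factors[name]}: {value:.2f} um")

# Spread of corrected values across the observed range of expansion factors.
spread_um = corrected["lowest"] - corrected["highest"]
print(f"spread across gels: {spread_um:.2f} um")
```

      Comparing the spread across gels with the measurement error is one way to judge whether gel-to-gel differences in expansion factor matter for a given structure.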

      15) Line 353: this is non-essential, but a 3D view of the broken basal ring might better illustrate the 2 semicircles

      We have added the following panel to Supplementary Figure 3 to illustrate this more clearly:

      Author response image 2.

      16) The way the figure legends are shaped, it often seems only panel (a) is from expansion microscopy while the microscopy images in the other parts of the figures have no information on the method used. I assume all images are from expansion microscopy, maybe this could be clarified by placing this statement in a position of the legend that makes it clear it is for all images in a figure.

      This has been clarified in the figure legends.

      17) Fig. 8b, is it clear that internal RON4 is not below or above? Consider showing a 3D representation or side view of these max projections.

      If, in these images, we imagine we are looking at the ‘top’ of the rhoptries, our feeling is that the RON4 signal is on the ‘bottom’, at the part closest to the apical polar ring. We tried projecting this, but the images were not particularly informative due to spherical aberrations. Because of this, we have refrained from commenting on the RON4 location relative to the rhoptry bulb prior to elongation.

      18) Line 684 "...distribution or RON4": replace or with of. The information of the next sentence is partly redundant, consider adding it in brackets.

      This has been addressed.

      19) Fig. 9a the EBA175 signal is not very prominent and a bit noisy, are the authors confident this is indeed showing only EBA175 or is there also some background?-AK

      We agree with the reviewer that the EBA175 antibody shows a significant amount of background fluorescence, especially in the food vacuole area. However, we think the puncta corresponding to micronemal EBA175 can be clearly distinguished from background.

      20) Fig. 9b, the long appearance of the micronemes in the z-dimension likely is due to axial stretch (due to point spread function in z and refractive index mismatch), in reality they probably are more spherical. It might be worth mentioning somewhere that this likely is not how these organelles are really shaped in that dimension (spherical fluorescent beads could give an estimation of that effect in the microscopy setup used).

      After recently acquiring a water-immersion objective lens for comparison, it is clear that the transition from oil to hydrogel causes a degree of spherical aberration in the Z-plane, which in this instance causes the micronemes to appear more oblong. As we make no conclusions based on the shape of the micronemes, however, we don’t think this is a significant consideration. This is an assumption that should be made when looking at any image whose resolution is not equal in all three dimensions. We also note that the more spherical shape of micronemes can be inferred from the max intensity projections in Figure 9c.

      21) Fig. 9b, the authors mention in the text that there is NHS ester signal that overlaps with the fluorescence signal, can occasions of this be indicated in the figure?

      Figure 9b was already quite busy, so we instead added the following extra panel to this figure that more clearly shows the NHS punctae we thought may have been micronemes:

      Author response image 3.

      22) Fig. 9, line 695, the authors write that the EBA puncta were the same size as AMA1 puncta. To me it seems the AMA1 areas are larger than the EBA foci, is their size indeed similar? Was this measured?

      Since we did not conduct any measurements and doing so robustly would be difficult given the density of the puncta, we have decided to remove our comment on the relative size of the puncta.

      23) Materials and methods: Remove "to" in line 871; explain bicarb and incomplete medium in line 885 (non-malaria researchers will not understand what is meant here); line 911 and start of 912 seem somewhat redundant

      This has been addressed.

      24) Is there more information on what the Airyscan processing at moderate filter level does? The background of the images seems to have an intensity of 0 which in standard microscopy images should be avoided (see for instance doi:10.1242/jcs.03433) similar to the general standard of avoiding entirely white backgrounds on Western blots. I understand that some background subtraction processes will legitimately result in this but then it would be nice to know a bit better what happened to the original image.

      We have taken the following excerpt from a publication on Airyscan to help clarify:

      "Airyscan processing consists of deconvolution and pixel reassignment, which yield an image with higher resolution and reduced noise. This can be a contributor to the low background in some channels. The level of filtering is the processing strength, with higher filtering giving higher resolution but increased chances of artefacts. More information about the principles behind Airyscan processing can be found in the following two publications, though details on the algorithm itself seem to be proprietary: Huff, Joseph. "The Airyscan detector from ZEISS: confocal imaging with improved signal-to-noise ratio and super-resolution." (2015): i-ii. AND Wu, Xufeng, and John A. Hammer. "Zeiss airyscan: Optimizing usage for fast, gentle, super-resolution imaging." Confocal Microscopy: Methods and Protocols. New York, NY: Springer US, 2021. 111-130."

      We cannot find any further information about the specifics of Airyscan filtering; however, the moderate filter that we used is the default setting. This information was included just for clarity, rather than something we determined by comparison to other filtering settings.

      With regard to the background, the majority of some images having an intensity value of 0 is partially out of our control. For all NHS Ester images, the black point of the images was 0, so areas that lack signal (white in the case of NHS Ester) truly had no signal detected for those pixels. While we appreciate that never altering the black point of images displays 100% of the data in the image, images with any significant background can become impossibly difficult to interpret. We have done our best to present images where the black point is modified to remove background only for ease of interpretation by the readers.
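
      To illustrate what moving the black point does to displayed intensities, here is a minimal sketch assuming 8-bit display values and a simple linear rescale. The function name `apply_black_point`, the 8-bit range and the example pixel values are all hypothetical, and the sketch does not reproduce the processing applied to the published images.

```python
def apply_black_point(pixels, black_point, max_value=255):
    """Linearly rescale intensities so values at or below black_point display as 0."""
    if not 0 <= black_point < max_value:
        raise ValueError("black_point must lie in [0, max_value)")
    scale = max_value / (max_value - black_point)
    # Anything at or below the black point is clipped to 0 (displayed as black).
    return [max(0, round((p - black_point) * scale)) for p in pixels]


# Hypothetical row of pixel values: dim background (10-20) next to real signal.
row = [0, 10, 20, 120, 255]

unchanged = apply_black_point(row, black_point=0)   # black point left at 0
adjusted = apply_black_point(row, black_point=20)   # background clipped to 0

print(unchanged)
print(adjusted)
```

      With the black point left at 0 the values pass through unchanged, whereas raising it sends the dim background to 0 while the brightest pixel stays at the top of the display range.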

      Reviewer #3 (Public Review):

      1) Most importantly, in order to justify the authors' claim to provide an "Atlas", I want to strongly suggest they share their raw 3D-imaging data (at least of the main figures) in a data repository. This would allow the readers to browse their structure of interest in 3D and significantly improve the impact of their study in the malaria cell biology field.

      We agree completely that the potential impact of this study is magnified by public sharing of the data. The reason that this was not done at the time of submission is that most public repositories do not allow continued deposition of data, and so new images included in response to reviewers’ comments would have been separated from the initial submission, which we saw as needlessly complicated. All 647 images that underpin the results discussed in this manuscript are now publicly available in Dryad (https://doi.org/10.5061/dryad.9s4mw6mp4).

      2) The organization of the manuscript can be improved. Aside from some obvious modifications such as citing the figures in the correct order (see also further comments and recommendations), I would maybe suggest one subsection and one figure per analyzed cellular structure/organelle (i.e. 13 sections). This would in my opinion improve readability and facilitate "browsing the atlas".

      This is actually how we had originally formatted this manuscript, but this structure made discussing inter-connected organelles, such as the IMC and basal complex, impossibly difficult to navigate. We have done our best to make the manuscript flow better, but have not come up with any way to restructure the manuscript that would greatly increase its readability.

      3) Considering the importance of reliability of the U-ExM protocol for this study the authors should provide some validation for the isotropic expansion of the sample e.g. by measuring one well defined cellular structure.

      The protocol we used comes from the Bertiaux et al., 2021 PLoS Biology study. In this study they show isotropic expansion of blood-stage parasites.

      4) In the absence of time-resolved data and more in-depth mechanistic analysis the authors must tone down some of their conclusions, specifically around mitochondrial membrane potential, subpellicular microtubule depolymerization, and kinetics of the basal complex.

      Our conclusions regarding mitochondrial membrane potential and basal complex kinetics have been dampened. We have not, however, changed our wording around microtubule depolymerisation. Partial depolymerisation of microtubules during fixation is a known phenomenon in Plasmodium, and in our opinion, our explanation of this offers a hypothesis that is balanced with respect to the evidence: “we hypothesise that most SPMTs measured in our C1-treated schizonts had partially depolymerised. P. falciparum microtubules are known to rapidly depolymerise during fixation [10,29]. It is unclear, however, why this depolymerization was observed most often in C1-arrested parasites. Thus, we cannot determine whether these shorter microtubules are a by-product of drug-induced arrest or a biologically relevant native state that occurs at the end of segmentation.”

      5) The observation that the centriolar plaque extensions remain consistently tethered to the plasma membrane is of high significance. To more convincingly demonstrate this point, it would be very helpful to show one zoomed-in side view of a nucleus with a mitotic spindle where both centriolar plaques are in contact with the plasma membrane.

      We of course agree that this is one of our most important observations, but in our opinion this is already demonstrated in Figure 2b. The third panel from the right shows a mitotic spindle and has the location of the cytoplasmic extensions, nuclear envelope and parasite plasma membranes annotated.

      6) Please verify the consistent use of the terms trophozoite and schizont. In Fig. 1c a parasite with two nuclei, likely in the process of karyofission, is designated as trophozoite, which contrasts with the mononucleated trophozoite shown in Fig. 1a. The reviewer is aware of the more "classical" description of the schizont as a parasite with more than 2 nuclei, but based on the authors' advanced knowledge of cell cycle progression and mitosis I would encourage them to make a clear distinction between parasites that have entered mitotic stages and pre-mitotic parasites (e.g. by applying the term schizont, and trophozoite, respectively).

      For this study, we have interpreted any parasite having three or more nuclei as being a schizont. We are aware this morphological interpretation is not universally held and indeed suboptimal for studying some aspects of parasite development, but all definitions of a schizont have some drawbacks. Whether a parasite has entered mitosis or not is obviously a hugely significant event in the context of cell biology, but in a mononucleated parasite this could only be determined using immunofluorescence microscopy with cell cycle or DNA replication markers.

      7) Aldolase does not localize diffusely in the cytoplasm in schizont stages, in contrast to earlier stages. The authors should comment on that.

      We are unclear whether this is an interpretation of the images in Supplementary Figure 1, or inferred from other studies. If this is an interpretation of the images in Supplementary Figure 1, we do not agree that the images show a significant change in the localisation of aldolase. It is possible that this difference in interpretation comes from the strong punctate signal observed more readily in the schizont images. This is the strong background signal in or around the food vacuole we mention in the text. These punctae are significantly brighter than the cytosolic aldolase signal, making the cytosolic signal difficult to see on the aldolase-only channel, but aldolase signal can clearly be seen in the cytoplasm on the merge images.

      8) Line 79. Uranyl acetate is just one of the contrasting agents used in electron microscopy. The authors might reformulate this statement. Possibly this would also be a good opportunity to briefly mention that electron density measured in EM and protein-density labeled by NHS-Ester can be similar but are not equivalent.

      We have expanded on this in the text.

      9) The authors claim that they investigate the association between the MTOC and the APR (line 194), but strictly speaking only look at subpellicular microtubules and an associated protein density. The argument that there is a "NHS ester-dense focus" (line 210) without actual APR marker is not quite convincing enough to definitively designate this as the APR.

      While an APR marker would of course be very useful, there are currently no published examples of APR markers in blood-stage parasites. We therefore think that the timing of appearance, location, and staining density are sufficient for identifying this structure as the APR, as it has previously been designated through EM studies. We have nonetheless softened our language around APR-related observations.

      10) Line 226: The authors should also discuss the organization of the Golgi in early schizonts (Fig. S4). (not only 2 nuclei and segmenter stages).

      We did not mean to imply that all 22 parasites had only 2 nuclei, but instead that they had 2 or more nuclei. Therefore, early schizonts are included in this analysis, with Golgi closely associated with all their MTOCs.

      11) Line 242: To the knowledge of the reviewer the nuclear pore complexes, although clustered in merozoites and ring stages, don't particularly "define the apical end of the parasite".

      The MTOC is surrounded by NPCs, which because of the location of the MTOC end up being near the forming apical end of the merozoite, but we have removed this as it was needlessly confusing.

      12) Supplementary Figure 8 is missing (it's a repetition of Fig. S6).

      This has been addressed.

      13) Line 253: asexual blood stage parasites have two classes of MTs. Other stages can have more.

      This has been clarified.

      14) Fig. 3f: Please comment how much of these observations of "only one" SPMT could result from suboptimal resolution (e.g. in z-direction) or labeling. Otherwise use line profiles to argue that you can always safely distinguish SPMT pairs.

      In the small number of electron tomograms of merozoites where the subpellicular microtubules have been rendered, they have been seen to have 2 or 3 SPMTs. Despite this, we don’t think it is likely that the single SPMT merozoites observed in this study are caused by a resolution limitation. SPMTs were measured in 3D, rather than from projections, and any schizont where the SPMTs were pointing towards the objective lens, elongating the parasite in Z, was not imaged. Additionally, our number of merozoites with a single SPMT corresponds with the same data collected in the Bertiaux et al., 2021 PLoS Biology study. We cannot rule this out as a possibility, as sometimes SPMTs cross over each other in three dimensions, and at these intersection points they cannot be individually resolved. We think, however, that it is very unlikely that two SPMTs would be so close that they can never be resolved across any part of their length.

      15) Lines 302ff: the claim that variability in SPMT size must be a consequence of depolymerization is unfounded. The dynamics of SPMT are unknown at this point. Similarly unfounded is the definitive claim that it is known that P.f. MTs depolymerize upon fixation. Other possibilities should be considered. SPMT could also simply shorten in C1-arrested parasites.

      While we agree with the reviewer that much about SPMT dynamics in schizonts remains unknown, we disagree with the claim that our consideration of SPMT depolymerization as a possible explanation for our observations is unfounded. Microtubule depolymerization is a well-known fixation and sample preparation artefact in mammalian cells and a well-documented phenomenon in Plasmodium when parasites are washed with PBS prior to fixation. We convey in the text our belief that it is possible that SPMTs shorten in C1-arrested parasites as a result of drug treatment. However, it is our opinion that there simply is not enough evidence at this moment to conclusively pinpoint the cause of our observed depolymerization. As we mention in the text, further experiments are needed in order to determine with confidence whether depolymerization is a consequence of our fixation protocol, a consequence of C1 treatment (or the length of that treatment), or a biological phenomenon resulting from parasite maturation.

      16) Line 324: "up to 30 daughter merozoites"

      Schizonts can have more than 30 daughter merozoites, so we have not altered this statement.

      17) Figure 4b. Line 354 The postulated breaking in two is not well visible and here the authors should attempt a more conservative interpretation of the data (especially with respect to those early basal complex dynamics).

      We think that the basal complex dividing or breaking in two is the more conservative interpretation of our data. There is no evidence to suggest that a second basal complex is formed de novo and, while never before described using a basal complex protein, the cramp-like structure and dynamics we observe are consistent with that observed in early IMC proteins. We have updated the text to provide additional context and make the reasoning behind our hypothesis clearer.

      18) Line 365: Commenting on their relative size would require a quantification of APR and basal complex size (can be provided in the text).

      We are unsure what this is in reference to, as there is no mention of the APR in the basal complex section.

      19) Lines 375ff: The claim that NHS Ester is a basal complex marker should be mitigated or more convincing images without the context of anti-CINCH staining being sufficient to identify the ring structure should be presented.

      We have provided high quality, zoomed-in images without anti-CINCH staining in Fig. 5D&E, 6C, 7b, and Supplementary Fig. 8 that show that even in the absence of a basal complex antibody, the basal complex still stains densely by NHS ester.

      20) Line 407: The claim that there are differences in membrane potential along the mitochondria needs to be significantly mitigated. There are several alternative explanations of this staining pattern (some of which the authors name themselves). Differences in local compartment volume, differences in membrane surface, diffusibility/leakage of the dye can definitively play a role in addition to fixation and staining artefacts (also brought forward recently for U-ExM by Laporte et al. 2022 Nat Meth). Confirming the hypothesis of the authors would need significantly more experimental evidence that is outside the scope of this study.

      We have significantly dampened and qualified the wording in this section. It now reads: “These clustered areas of Mitotracker staining were highly heterogeneous in size and pattern. Small staining discontinuities like these are commonly observed in mammalian cells when using Mitotracker dyes due to the heterogeneity of membrane potential from cristae to cristae as well as due to fixation artifacts. At this point, we cannot determine whether the staining we observed represents a true biological phenomenon or an artefact of this sample preparation approach. Our observed Mitotracker-enriched pockets could be an artifact of PFA fixation, a product of local membrane depolarization, a consequence of heterogeneous dye retention, or a product of irregular compartments of high membrane potential within the mitochondrion, to mention a few possibilities. Further research is needed to conclusively pinpoint an explanation.”

      21) Fig. 7e: The differences in morphology using different fixation methods are interesting. Can the authors provide a co-staining of K13-GFP together with the better-preserved structures in the GA-containing fixation protocol to demonstrate that these are indeed cytostome bulbs?

      Figure 7 has been changed substantially to show more clearly the preservation of the red blood cell membrane following PFA-GA fixation, followed by direct comparison of K13-GFP stained parasites fixed in either PFA only or PFA-GA. The cytostome section of the results has also changed to reflect this, the changed section now reads:

      “PFA-glutaraldehyde fixation allows visualization of cytostome bulb
      The cytostome can be divided into two main components: the collar, a protein dense ring at the parasite plasma membrane where K13 is located, and the bulb, a membrane invagination containing red blood cell cytoplasm (Milani, 2015; Xie, 2020). While we could identify the cytostomal collar by K13 staining, these cytostomal collars were not attached to a membranous invagination. Fixation using 4% v/v paraformaldehyde (PFA) is known to result in the permeabilization of the RBC membrane and loss of its cytoplasmic contents (ref. 65). Topologically, the cytostome is contiguous with the RBC cytoplasm and so we hypothesised that PFA fixation was resulting in the loss of cytostomal contents and obscuring of the bulb. PFA-glutaraldehyde fixation has been shown to better preserve the RBC cytoplasm (ref. 65). Comparing PFA only with PFA-glutaraldehyde fixed parasites, we could clearly observe that the addition of glutaraldehyde preserves both the RBC membrane and RBC cytoplasmic contents (Figure 7c). Further, while only cytostomal collars could be observed with PFA only fixation, large membrane invaginations (cytostomal bulbs) were observed with PFA-glutaraldehyde fixation (Figure 7d). Cytostomal bulbs were often much longer and more elaborate, spreading through much of the parasite (Supplementary Video 1), but these images are visually complex and difficult to project so images displayed in Figure 7 show relatively smaller cytostomal bulbs. Collectively, this data supports the hypothesis that these NHS-ester-dense rings are indeed cytostomes and that endocytosis can be studied using U-ExM, but PFA-glutaraldehyde fixation is required to maintain cytostome bulb integrity.”

      22) It would be helpful to the readers to indicate in the schematic in Fig. 1b at which point NHS-Ester staining is implemented.

      Figure 1b is slightly simplified in the sense that it doesn’t differentiate primary and secondary antibody staining, but we have updated it to reflect that antibody and dye staining are concurrent, rather than separate.

      23) In Fig. 2B, in the second panel from the right, the nuclear envelope boundary does not seem to be accurately drawn, as it includes the centrin signal of the centriolar plaque.

      Thank you for pointing this out, it has now been redrawn.

      24) Line 44-45: should read "up to 30 new daughter merozoites" (include citations).

      We have included a citation here, but left it as approximately 30 daughter merozoites as the study found multiple cells with >30 daughter merozoites.

      25) Line 49: considering its discovery in 2015 the statement that it has gained popularity in the last decade can probably be omitted.

      This has been removed.

      26) Fig S1 should probably read "2N" (instead of "2n"). Or alternatively "2C" could be fine.

      27) Line 154: To help comprehension please define the term "branch number" in this context when it comes up.

      A definition for branch has now been provided.

      28) Fig. S5: To my estimation it is not an "early trophozoite", which is depicted.

      While this parasite technically fits our definition of trophozoite, as it has not yet undergone nuclear division, we have swapped it for a visibly earlier parasite for clarity. This is the new parasite depicted:

      Author response image 4.

      29) Fig. 2a is not referenced before Fig. 2b in the text.

      This has been addressed.

      30) I could not find the reference to Fig. S2e and its discussion.

      It was wrongly labelled as Supplementary Figure 2b in the text, this has now been addressed.

      31) The next Figure referenced in the text after Fig. 2b is Fig. 4b. Fig.3 is only referenced and discussed later, which was quite confusing.

      The numbering discrepancies have been addressed.

      32) Line 196: Figure reference is missing.

      This data did not have a figure reference, but the numbers have now been provided in-text.

      33) Fig. 3c: Is "Branches per MTOC" not just total branches divided by two? If so it can be omitted. If not so please explain the difference.

      Yes, it was total branches divided by two; this has been removed from Figure 3c.

      34) Figure 5c and 6d: The authors should show examples of the image segmentation used to calculate the surface area.

      Surface area calculation was done in an essentially one-step process. Free-hand regions of interest were drawn on maximum intensity projections, and ZEN automatically calculated their area. Example as Author response image 5:

      Author response image 5.
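
      The measurement described above (a free-hand region of interest drawn on a maximum intensity projection, with the area reported by the software) can be illustrated outside ZEN. The following is a minimal sketch, not the ZEN implementation: it assumes the z-stack is available as nested lists of pixel intensities and that the region of interest is stored as an ordered list of (x, y) vertices. The function names and the `pixel_size` parameter are hypothetical.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D lists) into one 2D image,
    keeping the brightest value found at each pixel position."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)]
            for r in range(rows)]


def polygon_area(vertices, pixel_size=1.0):
    """Area enclosed by an ordered polygon (shoelace formula).

    vertices: list of (x, y) points in pixel coordinates (assumed ROI outline).
    pixel_size: edge length of one pixel; the result is in pixel_size squared.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0 * pixel_size ** 2
```

      A free-hand outline is simply a polygon with many vertices, so the same formula applies regardless of how irregular the drawn shape is.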

      35) Figure 7b should also show the NHS Ester staining alone for the zoom in.

      We have included the NHS ester staining alone on the zoom-in, but we have slightly changed the presentation of these two panels to show both the basal complex and cytostomes as follows:

      Author response image 6.

      36) To which degree are Rhoptry necks associated with MTOC extensions?

      This cannot easily be determined with the images we have so far. Before elongated necks are visible, the RON4 signal does appear pointed towards the MTOC extensions. Rhoptry necks don’t seem to elongate until segmentation, when the MTOC starts to move away from the apical end of the parasite. So it is possible there is a transient association, but we cannot easily discern this from our data.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptative, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. These limitations arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments; considerable effort was needed to keep lowland sparrows under hypoxic conditions for a month. We recognize the same limitations that the reviewer pointed out and have discussed them in the study, i.e., considering the hypoxic condition alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals for each stage) and flight muscle from 12 individuals (four individuals for each stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2), nor whether they were sampled at the same time of year (pre reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect both gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, or environmental effects other than hypoxia that differ in these populations' environments or because of their sexes or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding) of the year. They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar) under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons and not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Author response image 1.

      Figure 1. Population genetic structure of the Eurasian Tree Sparrow (Passer montanus). The genetic structure was generated using FRAPPE. The colors in each column represent the contribution from each subcluster (Qu et al. 2020). Yellow, highland population; blue, lowland population.

      Q4. Impact of the work There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge showing that the short-term response to hypoxia and the long term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and additional selection pressure (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero et al. "Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans." Proceedings of the National Academy of Sciences 104, no. 45 (2007): 17593-17598. Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular biology and evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in the context of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work improves understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important works pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised the concerns about the batch effect from birds collected from the previous study and this study. There is an important detail we didn’t describe in the previous version. All tissues from hypoxia acclimated birds and highland and lowland birds have been collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods and we have also provided a new supplementary figure (Figure S5) to show the comparative results.

      Author response image 2.

      The median expression levels of the conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) did not differ among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05).
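
      The filtering step described above can be sketched in a few lines. This is an illustrative sketch only, not the pipeline used in the study: it assumes TPM values are held in a dictionary mapping each gene to a list of per-sample values, it reads "average TPM ≥ 1" as the mean across samples, and it uses log2(TPM + 1) as the log transform. All function and variable names are hypothetical.

```python
import math
import statistics


def conserved_genes(tpm, max_cv=0.3, min_mean=1.0):
    """Return genes whose TPM across samples has a coefficient of variation
    <= max_cv and a mean >= min_mean (assumed reading of the criteria)."""
    keep = []
    for gene, values in tpm.items():
        mean = statistics.mean(values)
        if mean < min_mean:
            continue  # too weakly expressed to serve as a reference gene
        cv = statistics.stdev(values) / mean  # sample SD divided by mean
        if cv <= max_cv:
            keep.append(gene)
    return keep


def median_log_expression(tpm, genes, sample_index):
    """Median log2(TPM + 1) of the chosen genes in one sample."""
    return statistics.median(
        math.log2(tpm[g][sample_index] + 1) for g in genes
    )
```

      Comparing the per-sample medians returned by `median_log_expression` across the three groups is one simple way to look for a systematic shift between batches; similar medians are consistent with, but do not prove, the absence of a batch effect.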

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible due to the logistical problem of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we have tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene-set and muscle-associated gene-set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. Two questions were raised, and we address them separately. The first concern centered on the issue of arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design we wanted to explore the magnitudes of gene expression plasticity (i.e., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).
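
      The threshold-based categorization can be made concrete with a short sketch. It is offered under stated assumptions rather than as the exact procedure of the study: plastic change is taken as the difference between the hypoxia-acclimated and ancestral lowland means, genetic change as the difference between the highland and acclimated means, and a gene is categorized only when both changes exceed the chosen fraction of the ancestral level. The function name and arguments are hypothetical.

```python
def classify_plasticity(ancestral, plastic, colonized, threshold=0.5):
    """Classify one gene from its mean expression in the three stages.

    Returns 'reinforcement' when the plastic and genetic changes point in the
    same direction, 'reversion' when they point in opposite directions, and
    None when either change does not exceed threshold * ancestral expression.
    """
    plastic_change = plastic - ancestral   # ancestral -> acclimated lowland
    genetic_change = colonized - plastic   # acclimated lowland -> highland
    cutoff = threshold * abs(ancestral)
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return None
    same_direction = (plastic_change > 0) == (genetic_change > 0)
    return "reinforcement" if same_direction else "reversion"
```

      Raising `threshold` from 0.5 to 2.0 corresponds to moving from the 50% to the 200% setting, which retains only genes with progressively larger expression changes.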

      As the reviewer pointed out, we now recognize that this threshold selection is arbitrary. We have thus implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes whose categorization results from genuine differences rather than random sampling errors. Bootstrap results showed that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript, and provided results in Figure 2d and Figure 3d.

      Author response image 3.

      Figure 2a (left) and Figure 2b (right). Frequencies of genes with reinforcement and reversion plasticity (>50%) and their subsets that acquire strong support in the parametric bootstrap analyses (≥ 950/1000).
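
      The idea behind the bootstrap support value (≥ 950/1000) can be sketched as follows. This is a simplified illustration and not the procedure of the cited study: it assumes that expression in each stage is normally distributed around the observed mean with the observed standard deviation, redraws the same number of samples per stage in every replicate, and counts how often the simulated data reproduce the observed category. The helper names, the normal model, and the fixed seed are all assumptions.

```python
import random


def classify(ancestral, plastic, colonized, threshold=0.5):
    """Same-direction changes -> 'reinforcement'; opposite -> 'reversion';
    None when either change does not exceed threshold * ancestral level."""
    plastic_change = plastic - ancestral
    genetic_change = colonized - plastic
    cutoff = threshold * abs(ancestral)
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return None
    same = (plastic_change > 0) == (genetic_change > 0)
    return "reinforcement" if same else "reversion"


def bootstrap_support(means, sds, n_samples, observed_class,
                      n_boot=1000, seed=0):
    """Number of bootstrap replicates (out of n_boot) that give the same
    category as the observed data.

    means, sds, n_samples: one entry per stage, ordered as
    (ancestral, plastic, colonized).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_boot):
        simulated_means = []
        for mu, sd, n in zip(means, sds, n_samples):
            draws = [rng.gauss(mu, sd) for _ in range(n)]
            simulated_means.append(sum(draws) / n)
        if classify(*simulated_means) == observed_class:
            hits += 1
    return hits
```

      Under this sketch, a gene would count as strongly supported when `bootstrap_support(...)` returns at least 950 with the default of 1000 replicates.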

      In addition, we adopted a binning scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of the reinforcement/reversion plasticity). These analyses based on different categorization schemes revealed similar results, and suggested that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in Supplementary Figures S2 and S4.

      Author response image 4.

      Figure S2 (A) and Figure S4 (B). Frequencies of genes with reinforcement and reversion plasticity in the flight and cardiac muscle. (A) For genes identified by WGCNA, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for both the flight and cardiac muscles. (B) For genes associated with muscle phenotypes, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for the flight muscle, while more than 50% of comparisons support an excess of genes with reversion plasticity for the cardiac muscle. Two-tailed binomial test: NS, non-significant; *, P < 0.05; **, P < 0.01; ***, P < 0.001.
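
      The two-tailed binomial test named in the legend asks whether the counts of reversion and reinforcement genes depart from an even split. A minimal sketch of an exact test is given below, assuming a null probability of 0.5 and the common convention of summing the probabilities of all outcomes that are no more likely than the observed one. The function name is hypothetical, and the logarithmic form is used only to avoid overflow with large gene counts.

```python
from math import exp, lgamma, log


def binomial_two_tailed(k, n, p=0.5):
    """Exact two-tailed binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome whose probability is no larger
    than that of the observed count (requires 0 < p < 1).
    """
    def pmf(i):
        # log of C(n, i) * p^i * (1 - p)^(n - i), computed via lgamma
        log_prob = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        return exp(log_prob)

    observed = pmf(k)
    # small relative tolerance so that ties are not lost to rounding
    total = sum(q for q in (pmf(i) for i in range(n + 1))
                if q <= observed * (1 + 1e-7))
    return min(1.0, total)
```

      For example, `binomial_two_tailed(n_reversion, n_reversion + n_reinforcement)` would give the probability of a split at least as uneven as the observed one if both categories were equally likely; the two count variables are placeholders for values taken from the figure.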

      The second issue that the reviewer raised is that the plastic changes simply have more space to reverse direction rather than having any biological reason to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladaptive in the novel environment that the species have recently colonized (i.e., lizard, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets that are associated with muscle phenotypes corroborated these previous studies and showed that initial gene expression plasticity may be nonadaptive to the novel environments (i.e., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence is an artifact of the definition of adaptive versus maladaptive changes, for example, in Figure 3d. We agree with the reviewer that the correlation analysis is circular, because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matches or opposes that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and related text from the revised version of the manuscript. Meanwhile, we have changed Figure 3a to further clarify the schematic framework.

    1. Author Response:

      Reviewer #1 (Public Review):

      Despite numerous studies on quinidine therapies for epilepsies associated with GOF mutant variants of Slack, there is no consensus on its utility due to contradictory results. In this study Yuan et al. investigated the role of different sodium selective ion channels on the sensitization of Slack to quinidine block. The study employed electrophysiological approaches, FRET studies, genetically modified proteins and biochemistry to demonstrate that Nav1.6 N- and C-tail interacts with Slack's C-terminus and significantly increases Slack sensitivity to quinidine blockade in vitro and in vivo. This finding inspired the authors to investigate whether they could rescue Slack GOF mutant variants by simply disrupting the interaction between Slack and Nav1.6. They find that the isolated C-terminus of Slack can reduce the current amplitude of Slack GOF mutant variants co-expressed with Nav1.6 in HEK cells and prevent Slack induced seizures in mouse models of epilepsy. This study adds to the growing list of channels that are modulated by protein-protein interactions, and is of great value for future therapeutic strategies.

      I have a few comments with regard to how Nav1.6 sensitize Slack to block by quinidine.

      (1) It is not clear to me if the Slack induced current amplitude varies depending on the specific Nav subtype. To this end, it would be valuable to test if Slack open probability is affected by the presence of specific Nav subtypes. Nav induced differences in Slack current amplitude and open probability could explain why individual Nav subtypes show varied ability to sensitize Slack to quinidine blockade.

      We thank the reviewer for raising this point. To address whether the whole-cell current amplitudes of Slack vary depending on the specific NaV subtype, we examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results show that there are no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform single-channel recordings in future studies.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (2) It has previously been shown that INaP (persistent sodium current) is important for inducing Slack currents. Here the authors show that INaT (transient sodium current) of Nav1.6 is necessary for the sensitization of Slack to quinidine block whereas INaP surprisingly has no effect. The authors then show that the N-tail together with the C-tail of Nav1.6 can induce the same effect on Slack as full-length Nav1.6 in the presence of high intracellular concentrations of sodium. However, it is not clear to me how the isolated N- and C-tail of Nav1.6 can induce sensitization of Slack to quinidine by interacting with the C-terminus of Slack, while sensitization is also dependent on INaT. The authors speculate on different Slack open conformations, but one could speculate that there is a missing link, such as an unidentified additional interacting protein that causes the coupling.

      We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade mediated by the N- and C-termini of NaV1.6. Regarding the possibility of additional interacting proteins (“missing link”) that mediate the coupling between Slack and NaV1.6, our GST-pull down assays involving Slack and the N- and C-termini of NaV1.6 (Fig. S7) suggest a direct interaction between Slack and NaV1.6 channels. This finding suggests that a requirement for additional interacting proteins might be excluded. To further address these questions, we plan to employ structural biological methods, such as cryo-electron microscopy (cryo-EM).

      Reviewer #2 (Public Review):

      This is a very interesting paper about the coupling of Slack and Nav1.6 and the insight this brings to the effects of quinidine to treat some epilepsy syndromes.

      Slack is a sodium-activated potassium channel that is important to hyperpolarization of neurons after an action potential. Slack is encoded by KCNT1, which has mutations in some epilepsy syndromes. These types of epilepsy are treated with quinidine but this is an atypical antiseizure drug, not used for other types of epilepsy. For sufficient sodium to activate Slack, Slack needs to be close to a channel that allows robust sodium entry, like Na channels or AMPA receptors, but more mechanistic information is not available. Of particular interest to the authors is what allows quinidine to be effective in reducing Slack.

      In the manuscript, the authors show that Nav, not AMPA receptors, are responsible for Slack activation, at least in cultured neurons (HEK293, primary cortical neurons). Most of the paper focuses on the evidence that Nav1.6 promotes Slack sensitivity to quinidine.

      (1) The paper is very well written although there are reservations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We appreciate the reviewer's positive evaluation of our work. We acknowledge that utilizing a more intact system would provide valuable insights into the inhibitory effect of quinidine on Slack-NaV1.6. However, there are certain challenges associated with studying Slack currents in their entirety.

      First, in our experiments, isolating Slack currents from Na+-activated K+ currents in an intact system is challenging as selective inhibitors for Slick are currently unavailable. To address this, we propose using Slick gene knockout mice to specifically measure Slack currents under physiological conditions in future investigations. Second, we have observed that the interaction between Slack and NaV1.6 primarily occurs at the axon initial segment of neurons. This poses a difficulty when using brain slices for measurements, as employing the whole-cell voltage-clamp technique to assess Slack at the axon initial segment may introduce systematic errors.

      We believe that testing the pharmacological effects of quinidine on Slack-NaV1.6 in primary neurons remains the optimal approach. Although non-neuronal cells or cultured primary neurons may not fully replicate the complexity of an intact system, they still provide valuable insights into the interactions between Slack and NaV1.6, and the effects of quinidine.

      (2) I also have questions about the figures.

      We will make the necessary modifications and clarifications based on the reviewer's comments:

      (3) Finally, riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We will discuss the limitations of riluzole in our revised version of the manuscript.

      (4) On a minor point, the authors use the term in vivo but there are no in vivo experiments.

      We thank the reviewer for raising this point. Although we did not conduct experiments directly in living organisms, our results demonstrated the co-immunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      Reviewer #3 (Public Review):

      Yuan et al., set out to examine the role of functional and structural interaction between Slack and NaVs on the Slack sensitivity to quinidine. Through pharmacological and genetic means they identify NaV1.6 as the privileged NaV isoform in sensitizing Slack to quinidine. Through biochemical assays, they then determine that the C-terminus of Slack physically interacts with the N- and C-termini of NaV1.6. Using the information gleaned from the in vitro experiments the authors then show that virally-mediated transduction of Slack's C-terminus lessens the extent of SlackG269S-induced seizures. These data uncover a previously unrecognized interaction between a sodium and a potassium channel, which contributes to the latter's sensitivity to quinidine.

      The conclusions of this paper are mostly well supported by data, but some aspects of the functional and structural studies in vivo, as well as the physical interaction, need to be clarified and extended.

      (1) Immunolabeling of the hippocampus CA1 suggests sodium channels as well as Slack colocalization with AnkG (Fig 3A). Proximity ligation assay for NaV1.6 and Slack or a super-resolution microscopy approach would be needed to increase confidence in the presented colocalization results. Furthermore, coimmunoprecipitation studies on the membrane fraction would bolster the functional relevance of NaV1.6-Slack interaction on the cell surface.

      We thank the reviewer for the good suggestions. We acknowledge that employing proximity ligation assay and high-resolution techniques would significantly enhance our understanding of the localization of the Slack-NaV1.6 coupling.

      At present, the technical capabilities available in our laboratory and institution do not support high-resolution testing. However, we are enthusiastic about exploring potential collaborations to address these questions in the future. Furthermore, we fully recognize the importance of conducting co-immunoprecipitation (Co-IP) assays from membrane fractions. While we have already completed Co-IP assays for total protein and quantified the FRET efficiency values between Slack and NaV1.6 in the membrane region, the Co-IP assays on membrane fractions will be conducted in our future investigations.

      (2) Although hippocampal slices from Scn8a+/- were used for studies in Fig. S8, it is not clear whether Scn8a-/- or Scn8a+/- tissue was used in other studies (Fig 1J & 1K). It will be important to clarify whether genetic manipulation of NaV1.6 expression (Fig. 1K) has an impact on sodium-activated potassium current, level of surface Slack expression, or that of NaV1.6 near Slack.

      We thank the reviewer for pointing this out. In Fig. 1G,J,K, primary cortical neurons from homozygous NaV1.6 knockout (Scn8a-/-) mice were used. We will clarify this information in the revised manuscript. In terms of the effects of genetic manipulation of NaV1.6 expression on IKNa and surface Slack expression, we compared the amplitudes of IKNa measured from homozygous NaV1.6 knockout (NaV1.6-KO) neurons and wild-type (WT) neurons. The results showed that homozygous knockout of NaV1.6 does not alter the amplitudes of IKNa (Author response image 2). The level of surface Slack expression will be tested further.

      Author response image 2.

      The amplitudes of IKNa in WT and NaV1.6-KO neurons (data from manuscript Fig. 1K). ns, p > 0.05, unpaired two-tailed Student’s t test.

      (3) Did the epilepsy-related Slack mutations have an impact on NaV1.6-mediated sodium current?

      We thank the reviewer for the question. We examined the amplitudes of NaV1.6 sodium current upon expression alone or co-expression of NaV1.6 with epilepsy-related Slack mutations (K629N, R950Q, K985N). The results showed that the tested epilepsy-related Slack mutations do not alter the amplitudes of NaV1.6 sodium current (Author response image 3).

      Author response image 3.

      The amplitudes of NaV1.6 sodium currents upon co-expression of NaV1.6 with epilepsy-related Slack mutant variants (SlackK629N, SlackR950Q, and SlackK985N). ns, p>0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (4) Showing the impact of quinidine on persistent sodium current in neurons and on NaV1.6-expressing cells would further increase confidence in the role of persistent sodium current on sensitivity of Slack to quinidine.

      We appreciate the reviewer’s question. Previous studies have shown that quinidine can inhibit persistent sodium currents at low concentrations (ref. 1). In our experiments, blocking persistent sodium currents by application of riluzole in the bath solution showed no significant effects on the sensitivity of Slack to quinidine blockade upon co-expression of Slack with NaV1.6 (Fig. 2F,H). This result suggested that persistent sodium currents were not involved in the sensitization of Slack to quinidine blockade.

      1. Ju YK, Saint DA, Gage PW. Effects of lignocaine and quinidine on the persistent sodium current in rat ventricular myocytes. Br J Pharmacol. Oct 1992; 107(2):311-6. doi:10.1111/j.1476-5381.1992.tb12743.x
    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Chan et al. tried identifying the binding sites or pockets for the KCNQ1-KCNE1 activator mefenamic acid. Because the KCNQ1-KCNE1 channel is responsible for cardiac repolarization, genetic impairment of either the KCNQ1 or KCNE1 gene can cause cardiac arrhythmia. Therefore, the development of activators without side effects is highly demanded. Because the binding of mefenamic acid requires both KCNQ1 and KCNE1 subunits, the authors performed drug docking simulation by using KCNQ1-KCNE3 structural model (because this is the only available KCNQ1-KCNE structure) with substitution of the extracellular five amino acids (R53-Y58) into D39-A44 of KCNE1. That could be a limitation of the work because the binding mode of KCNE1 might differ from that of KCNE3. Still, they successfully identified some critical amino acid residues, including W323 of KCNQ1 and K41 and A44 of KCNE1. They subsequently tested these identified amino acid residues by analyzing the point mutants and confirmed that they attenuated the effects of the activator. They also examined another activator, yet structurally different DIDS, and reported that DIDS and mefenamic acid share the binding pocket, and they concluded that the extracellular region composed of S1, S6, and KCNE1 is a generic binding pocket for the IKS activators.

      The data are solid and well support their conclusions, although there are a few concerns regarding the choice of mutants for analysis and data presentation.

      Other comments:

      1. One of the limitations of this work is that they used psKCNE1 (mostly KCNE3), not real KCNE1, as written above. It is also noted that KCNQ1-KCNE3 is in the open state. Unbinding may be facilitated in the closed state, although evaluating that in the current work is difficult.

      We agree that it is difficult to evaluate the role of unbinding from our model. Our data showing that longer interpulse intervals have a normalizing effect on the GV curve (Figure 3-figure supplement 2) could be interpreted to suggest that unbinding occurs in the closed state. Alternatively, the slowing of deactivation caused by S1-S6 interactions and facilitated by the activators may effectively be exceeded at the longer interpulse intervals.

      2. According to Figure 2-figure supplement 2, some amino acid residues (S298 and A300) of the turret might be involved in the binding of mefenamic acid. On the other hand, Q147 showing a comparable delta G value to S298 and A300 was picked for mutant analysis. What are the criteria for the following electrophysiological study?

      Electrophysiology experiments interrogated selected residues with significant contributions to mefenamic acid and DIDS coordination as revealed by the MM/GBSA and MM/PBSA methods. A300 was identified as potentially important. We did attempt A300C but were never able to get adequate expression for analysis.

      3. It is an interesting speculation that K41C and W323A stabilize the extracellular region of KCNE1 and might increase the binding efficacy of mefenamic acid. Is it also the case for DIDS? K41 may not be critical for DIDS, however.

      Yes, we found K41 was not critical to the binding/action of DIDS compared to MEF. In electrophysiological experiments with the K41C mutation, DIDS induced a leftward GV shift (~ -25 mV) whereas the normalized response was statistically non-significant. In MD simulation studies, we observed detachment of DIDS from K41C-IKs in only 3 of 8 simulations. This is in contrast to MEF, where the drug left the binding site of the K41C-IKs complex in all simulations.

      4. Same as #2, why was the pore turret (S298-A300) not examined in Figure 7?

      Again, we attempted A300C but could not get high enough expression.

      Reviewer #3 (Public Review):

      Weaknesses:

      1. The computational aspect of the work is rather under-sampled - Figure 2 and Figure 4. The lack of quantitative analysis on the molecular dynamic simulation studies is striking, as only a video of a single representative replica is being shown per mutant/drug. The simulations shown in the videos are also extremely short; some videos last only up to 80 ns. Could the author provide longer simulations in each simulation condition (at least to 500 ns or until a stable binding pose is obtained in case the ligand does not leave the binding site), at least with three replicates per each condition? If not able to extend the length of the simulations due to resources issue, then further quantitative analysis should be conducted to prove that all simulations are converged and are sufficient. Please see the rest of the quantitative analysis in other comments.

      We provide more quantitative analysis for the existing MD simulations and ran five additional simulations with 500 ns duration by embedding the channel in a POPC lipid membrane. For the new MD simulations, we used a different force field in order to minimize ambiguity related to force fields as well. Analysis of these data has led to new data and supplemental figures regarding RMSD of ligands during the simulations (Figure 4-figure supplement 1 and Figure 6-figure supplement 3), clustering of MD trajectories based on Mef conformation (Figure 2-figure supplement 3 and Figure 6-figure supplement 2), and H-bond formation over the simulations (Figure 2-figure supplement 4 and Figure 6-figure supplement 1). We have edited the manuscript to include this new information where appropriate.

      2. Given that the protein is a tetramer, at least 12 datasets could have been curated to improve the statistics. It was also unclear how frequently the frames from the simulations were taken in order to calculate the PBSA/GBSA.

      By using one ligand for each ps-IKs channel complex we tried to keep the molecular system and corresponding analysis as simple as possible. Our initial results have shown that 4D docking and subsequent MD simulations with only one ligand bound to ps-IKs were complicated enough. Our attempts to dock 4 ligands simultaneously and analyze the properties of such a system were ineffective due to difficulties in: i) obtaining stable complexes during conformational sampling and 4D docking procedures, since the ligand interaction covers a region including three protein chains with dynamic properties, ii) possible changes of receptor conformation properties at three other subunits when one ligand is already occupying its site, iii) marked diversity of the binding poses of the ligand, as cluster analysis of the ligand-channel complexes shows (Figure 2-figure supplement 3).

      We have added a line in the methods to clarify the use of only one ligand per channel complex in simulations.

      In order to calculate MMPBSA/MMGBSA we used a frame every 0.3 ns throughout the 300 ns simulation (1000 frames/simulation) or during the time the ligand remained bound. We have clarified this in the Methods.
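
      As an illustration of the sampling scheme described above, the following sketch lists the frame times that a 0.3 ns stride would select from a 300 ns trajectory, truncated at the time the ligand leaves the site. The function name `mmpbsa_frame_times` and its parameters are hypothetical and are not taken from the authors' analysis scripts; whether a frame at time zero is included is also an assumption.

```python
def mmpbsa_frame_times(sim_length_ns=300.0, stride_ns=0.3, unbinding_time_ns=None):
    """Return the times (in ns) of the frames used for the energy calculation.

    Hypothetical helper: frames are taken every `stride_ns` up to the end of
    the simulation, or up to the moment the ligand unbinds if that is earlier.
    """
    end_ns = sim_length_ns
    if unbinding_time_ns is not None:
        end_ns = min(sim_length_ns, unbinding_time_ns)
    # Small epsilon guards against floating-point error in the division.
    n_frames = int(end_ns / stride_ns + 1e-9)
    return [round((i + 1) * stride_ns, 6) for i in range(n_frames)]


full_run = mmpbsa_frame_times()                          # ligand stays bound
early_exit = mmpbsa_frame_times(unbinding_time_ns=25.0)  # ligand leaves at 25 ns
print(len(full_run), len(early_exit))
```

      With these assumptions a full 300 ns run contributes 1000 frames, matching the number quoted above, while a run in which the ligand leaves early contributes proportionally fewer.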

      3. The lack of labels on several structures is rather unhelpful (Figure 2B, 2C, 4B). The lack of clarity of the interaction map in Figures 2D and 6A.

      We updated figures considering the reviewer's comments and added labels. For 2D interaction maps, we provided additional information in figure legends to improve clarity.

      4. The RMSF analysis is rather unclear and unlabelled thoroughly. In fact, I still don't quite understand why n = 3, given that the protein is a tetramer. If only one out of four were docked and studied, this rationale needs to be explained and accounted for in the manuscript.

      The rationale of conducting MD simulations with one ligand bound to IKs is explained in response to point 2 of the reviewer’s comments.

      RMSF analysis in Figure 4C-E was calculated using the chain to which Mef was docked but after Mef had left the binding site. Details were added to the methods.

      5. For the conditions in which the ligands are supposed to leave the site (K42C for Mef and Y46A for DIDS), can you please provide simulations at a sufficient length of time to show that ligand left the site over three replicates? Given that the protein is a tetramer, I would be expecting three replicates of data to have four data points from each subunit. I would be expecting distance calculation or RMSD of the ligand position in the binding site to be calculated either as a time series or as a distribution plot to show the difference between each mutant in the ligand stability within the binding pocket. I would expect all the videos to be translatable to certain quantitative measures.

      We have shown in the manuscript that the MEF molecule detaches from the K41C/IKs channel complex in all three simulations (at 25 ns, 70 ns and 20 ns, Table 4). Similarly, the ligand left the site in all five new 500 ns duration simulations. We did not provide simulations for Y46A, but with Y46C the ligand left the binding site in 4 of 5 500 ns simulations and changed binding pose in the other.

      Difficulties encountered upon extending the docking and MD simulations to 4 receptor sites of the channel complex are discussed in our response to point #2 of the reviewer.

      6. Given that K41 (Mef) and Y46 are very important in the coordination, could you calculate the frequency at which such residues form hydrogen bonds with the drug in the binding site? Can you also calculate the occupancy or the frequency of contact that the residues are making to the ligand (close 4-angstrom proximity etc.) and show whether those agree with the ligand interaction map obtained from ICM pro in Figure 2D?

      We thank the reviewer for the suggestion to analyze the H-bond contribution to ligand dynamics in the binding site. In the plots shown in Figure 2-figure supplement 4 and Figure 6-figure supplement 1, we now provide detailed information about the dynamics of the H-bond formation between the ligand and the channel-complex throughout simulations. In addition, we have quantified this and have added these numbers to a table (Table 2) and in the text of the results.
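
      The occupancy and contact-frequency measures discussed in this exchange reduce to counting frames. The sketch below assumes that a per-frame H-bond flag or a per-frame minimum ligand-residue distance has already been extracted from a trajectory; the function names and the example numbers are hypothetical illustrations, not data from the study.

```python
def occupancy(per_frame_flags):
    """Fraction of frames in which an interaction (e.g. an H-bond) is present."""
    flags = list(per_frame_flags)
    if not flags:
        raise ValueError("at least one frame is required")
    return sum(1 for flag in flags if flag) / len(flags)


def contact_frequency(min_distances_angstrom, cutoff_angstrom=4.0):
    """Fraction of frames in which the residue lies within the distance cutoff."""
    return occupancy(d <= cutoff_angstrom for d in min_distances_angstrom)


# Made-up values for four frames, for illustration only.
hbond_present = [True, False, True, True]
min_distance = [3.2, 3.9, 4.5, 6.0]
print(occupancy(hbond_present), contact_frequency(min_distance))
```

      Reporting such fractions per residue and per replicate is one way to compare simulation results with a static ligand interaction map.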

      7. Given that the author claims that both molecules share the same binding site and the mode of ligand binding seems to be very dynamic, I would expect the authors to show the distribution of the position of ligand, or space, or volume occupied by the ligand throughout multiple repeats of simulations, over sufficient sampling time that both ligands sample the same conformational space in the binding pocket. This will prove the point in the discussion - Line 463-464. "We can imagine a dynamic complex... bind/unbind from IKs at a high frequency".

      To support our statement regarding a dynamic complex, we analyzed longer MD simulations and clustered the trajectories. An average conformation was extracted from each cluster and is provided as supplementary information, which shows the different binding modes for Mef (Figure 2-figure supplement 3). DIDS was more stable in MD simulations and, though there were also several clusters, they were similar enough that when using the same cut-off distance as for mefenamic acid, they could be grouped into one cluster. (Note the scale differences on the dendrograms between Figure 2-figure supplement 3 and Figure 6-figure supplement 2.)

      8. I would expect the authors to explain the significance and the importance of the PBSA/GBSA analysis as they are not reporting the same energy in several cases, especially K41 in Figure 2 - figure supplement 2. It was also questionable that Y46, which seems to have high binding energy, show no difference in the EPhys works in figure 3. These need to be commented on.

      Several studies indicate that G values calculated using MM/PBSA and MM/GBSA methods may vary. Some studies report marked differences and the reasons for such a discrepancy is thoroughly discussed in a review by Genheden and Ryde (PMID: 25835573). Therefore, we used both methods to be sure that key residues contributing to ligand binding identified with one method appear in the list of residues for which the calculations are done with the other method.

      With Y46C, which showed only a slightly less favorable binding energy and for which the ligand did not unbind during 300 ns simulations, the ligand unbound or changed pose in 4 out of 5 of the longer simulations in the presence of a lipid membrane (Figure 4-figure supplement 1). The discrepancy between electrophysiological and MD data is commented on in the manuscript (pages 12-13).

      9. Can the author prove that the PBSA/GBSA analysis yielded the same average free energy throughout the MD simulation? This should be the case when the simulations are converged. The author may take the snapshots from the first ten ns, conduct the analysis and take the average, then 50, then 100, then 250 and 500 ns. The author then hopefully expects that as the simulations get longer, the system has reached equilibrium, and the free energy obtained per residue corresponds to the ensemble average.

      As we mention in the manuscript, MEF-channel interactions are quite dynamic and vary even from simulation to simulation. The frequent change of the binding pose of the ligands observed during simulations (represented in Figure 2 - figure supplement 3 as clusters) is a clear reflection of such a dynamic process. Therefore, we do not expect the same average energy throughout the simulation, but we do expect that ΔG values stand above the background for key residues, which was generally the case (Figure 2 - figure supplement 2 and Figure 6).
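
      The convergence check proposed by the reviewer in the preceding comment amounts to comparing running averages at increasing trajectory lengths. The following is a minimal sketch under the assumption that one energy value per frame is available at a fixed frame interval; `cumulative_means` and its arguments are hypothetical names, and the example energies are synthetic.

```python
def cumulative_means(energies, frame_interval_ns, checkpoints_ns=(10, 50, 100, 250, 500)):
    """Mean energy over the first t ns of a trajectory, for each checkpoint t.

    Checkpoints longer than the available trajectory are skipped.
    """
    means = {}
    for t in checkpoints_ns:
        n_frames = int(t / frame_interval_ns + 1e-9)
        if n_frames == 0 or n_frames > len(energies):
            continue
        means[t] = sum(energies[:n_frames]) / n_frames
    return means


# Synthetic per-frame energies (arbitrary units), one frame per ns.
synthetic = [float(i) for i in range(1, 101)]
print(cumulative_means(synthetic, frame_interval_ns=1.0))
```

      For a converged observable the values at successive checkpoints would level off; for a ligand that changes pose frequently, as described in the response, they need not.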

      10. The phrase "Lowest interaction free energy for residues in ps-KCNE1 and selected KCNQ1 domains are shown as enlarged panels (n=3 for each point)" needs further explanation. Is this from different frames? I would rather see this PBSA and GBSA calculated on every frame of the simulations, maybe at the one ns increment across 500 ns simulations, in 4 binding sites, in 3 replicas, and these are being plotted as the distribution instead of plotting the smallest number. Can you show each data point corresponding to n = 3?

      The MMPBSA/MMGBSA was calculated for 1000 frames from each of three 300 ns simulations (0.3 ns sampling interval), 3000 frames in total. The results are shown in Figure 2-figure supplement 2, which includes error bars to show the differences across runs. We have updated the legend for greater clarity.

      11. I cannot wrap my head around what you are trying to show in Figure 2B. This could be genuinely improved with better labelling. Can you explain whether this predicted binding pose for Mef in the figure is taken from the docking or from the last frame of the simulation? Given that the binding mode seems to be quite dynamic, a single snapshot might not be very helpful. I suggest a figure describing different modes of binding. Figure 2B should be combined with figure 2C as both are not very informative.

      We have updated Figure 2B with better labelling and added a new figure showing the different modes of binding (Figure 2-figure supplement 3).

      12. Similar to the comment above, but for Figure 4B. I do not understand the argument. If the author is trying to say that the pocket is closed after Mef is removed - then can you show, using MD simulation, that the pocket is openable in an apo to the state where Mef can bind? I am aware that the open pocket is generated through batches of structures through conformational sampling - but as the region is supposed to be disordered, can you show that there is a possibility of the allosteric or cryptic pocket being opened in the simulations? If not, can you show that the structure with the open pocket, when the ligand is removed, is capable of collapsing down to the structure similar to the cryo-EM structure? If none of the above work, the author might consider using PocketMiner tools to find an allosteric pocket (https://doi.org/10.1038/s41467-023-36699-3) and see a possibility that the pocket exists.

      Please see the attached screenshot which depicts the binding pocket from the longest run we performed (1250 ns) before drug detachment (grey superimposed structures) and after (red superimposed structures). Mefenamic acid is represented as licorice and colored green. Snapshots for superimposition were collected every 10 ns. As can be seen in the figure, when the drug leaves the binding site (after 500 ns, structures colored red), the N-terminal residue of psKCNE1, W323, and other residues that form the pocket shift toward the binding site, overlapping with where Mefenamic acid once resided. The surface structure in Figure 4B shows this collapse.

      Author response image 1.

      In the manuscript, we propose that drug binding occurs by the mechanism that could be best described by induced fit models, which state that the formation of the firm complexes (channel-Mef complex) is a result of multiple-states conformational adjustments of the bimolecular interaction. These interactions do not necessarily need to have large interfaces at the initial phase. This seems to be the case in Mef with IKS interactions, since we could not identify a pocket of appropriate size either using PocketMiner software suggested by the reviewer or with PocketFinder tool of ICM-pro software.

      13. Figure 4C - again, can you show the RMSF analysis of all four subunits leading to 12 data points? If it is too messy to plot, can you plot a mean with a standard deviation? I would say that a 1-1.5 angstroms increase in the RMSF is not a "markedly increased", as stated on line 280. I would also encourage the authors to label whether the RMSF is calculated from the backbone, side-chain or C-alpha atoms and, ideally, compare them to see where the dynamical properties are coming from.

      Please see the answer to comment #4. We agree that the changes are not so dramatic and modified the text accordingly. RMSF was calculated for backbone atoms to compare residues with different side chains; a note of this is now in the methods, and the statistical significance of ps-IKs vs K41C, W323A and Y46C is indicated in Figures 4C-4E.

      14. In the discussion - Lines 464-467. "Slowed deactivation of the S1/KCNE1/Pore domain/drug complex... By stabilising the activated complex. MD simulation suggests the latter is most likely the case." Can you point out explicitly where this has been proven? If the drug really stabilised the activated complex, can you show which intermolecular interaction within E1/S1/Pore has the drug broken and re-formed to strengthen the complex formation? The authors have not disproven the point on steric hindrance either. Can this be disproved by further quantitative analysis of existing unbiased equilibrium simulations?

      The stabilization of S1/KCNE1/Pore by drugs does not necessarily have to involve a creation of new contacts between protein parts or breakage of interfaces between them. The stabilization of activated complexes by drugs may occur when the drug simultaneously binds to both moveable parts of the channel, such as voltage sensor(s) or upper KCNE1 region, and static region(s) of the channel, such as the pore domain. We have changed the corresponding text for better clarity.

      15. Figure 4D - Can you show this RMSF analysis for all mutants you conducted in this study, such as Y46C? Can you explain the difference in F dynamics in the KCNE3 for both Figure 4C and 4D?

      We now show the RMSF for K41C, W323A and Y46C in Figure 4C-E. We speculate that K41 (magenta) and W323 (yellow), given their location at the lipid interface (see Author response image 1), may be important stabilizing residues for the KCNE N-terminus, whereas Y46 (green) which is further down the TMD has less of an impact.

      Author response image 2.

      16. Line 477: the author suggested that K41 and Mef may stabilise the protein-protein interface at the external region of the channel complex. Can you prove that through the change in protein-protein interaction, contact is made over time on the existing MD trajectories, whether they are broken or formed? The interface from which residues help to form and stabilise the contact? If this is just a hypothesis for future study, then this has to be stated clearly.

      It is known that crosslinking of several residues of external E1 with the external pore residues dramatically stabilizes voltage-sensors of KCNQ1/KCNE1 complex in the up-state conformation. This prevents movable protein regions in the voltage-sensors returning to their initial positions upon depolarization, locking the channel in an open state. We suggest that MEF may restrain the backward movement of voltage-sensors in a similar way that stabilizes open conformation of the channel. The stabilization of the voltage sensor domain through MEF occurs due to contacts of the drug with both static (pore domain) and dynamic protein parts (voltage-sensors and external KCNE1 regions). We have changed the corresponding part of the text.

      17. The author stated on lines 305-307 that "DIDS is stabilised by its hydrophobic and vdW contacts with KCNQ1 and KCNE1 subunits as well as by two hydrogen bonds formed between the drug and ps-KCNE1 residue L42 and KCNQ1 residue Q147" Can you show, using H-bond analysis that these two hydrogen bonds really exist stably in the simulations? Can you show, using minimum distance analysis, that L42 is in the vdW radii stably and is making close contact throughout the simulations?

      We performed a detailed H-bond analysis (Figure 6-figure supplement 1), which shows that DIDS forms multiple H-bonds over the simulations, though only some of them (GLU43, TYR46, ILE47, SER298, TYR299, TRP323) are stable. Thus, the H-bonds that we observed in DIDS-docking experiments were unstable in MD simulations. As in the case of the IKs-MEF complex, the prevailing H-bonds exhibit marked quantitative variability from simulation to simulation. We have added a table detailing the most frequent H-bonds during MD simulations (Table 2).

      18. Discussion - In line 417, the author stated that the "S1 appears to pull away from the pore" and supplemented the claim with the movie. This is insufficient. The author should demonstrate distance calculation between the S1 helix and the pore, in WT and mutants, with and without the drug. This could be shown as a time series or distribution of centre-of-mass distance over time.

      We tried to analyze the distance changes between the upper S1 and the pore domain but failed to see a strong correlation. We have removed this statement from the discussion.

      19. Given that all the work was done in the open state channel with PIP2 bound (PDB entry: 6v01), could the author demonstrate, either using docking, or simulations, or alignment, or space-filling models - that the ligand, both DIDS and Mef, would not be able to fit in the binding site of a closed state channel (PDB entry: 6v00). This would help illustrate the point denoted Lines 464-467. "Slowed deactivation of the S1/KCNE1/Pore domain/drug complex... By stabilising the activated complex. MD simulation suggests the latter is most likely the case."

      As of now, a structure representing the closed state of the channel does not exist. 6V00 is the closed inactivated state of the channel pore with voltage-sensors in the activated conformation. In order to create simulation conditions that reliably describe the electrophysiological experiments, at least a good model for closed channels with resting state voltage sensors is necessary.

      20. The author stated that the binding pose changed in one run (lines 317 to 318). Can you comment on those changes? If the pose has changed - what has it changed to? Can you run longer simulations to see if it can reverse back to the initial conformation? Or will it leave the site completely?

      Longer simulations and trajectory clustering revealed several binding modes; one pose, encircled with a blue frame in Figure 2-figure supplement 3, dominated in approximately 50% of all simulations.

      21. Binding free energy of -32 kcal/mol = -134 kJ/mol. If you try to do dG = -RTlnKd, your lnKd is -52. Your Kd is e^-52, which means it will never unbind if it exists. I am aware that this is the caveat with the methodologies. But maybe these should be highlighted throughout the manuscript.

      We thank the reviewer for this comment. G values, and corresponding Kd values, calculated from simulation of Mef-ps-IKs complex do not reflect the apparent Kd values determined in electrophysiological experiments, nor do they reflect Kd values of drug binding that could be determined in biochemical essays. Important measures are the changes observed in simulations of mutant channel complexes relative to wild type. We now briefly mention this issue in the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) It would be nice to have labels of amino acid residues in Figure 2B.

      We updated Figure 2B and added some residue labels.

      2) Fig. 3A and 7A. In what order the current traces are presented? I don't see the rule.

      We have now arranged the current traces in a more orderly manner, listing them first by ascending KCNE1 residue numbers and then by ascending KCNQ1 residue numbers. This order is now consistent with Fig 3 and 7 (normalized response and delta V1/2).

      3) Line 312 "A44 and Y46 were more so." A44 may be more critical, but I can't see Y46 is more, according to Figure 2-figure supplement 2 and Figure 6.

      Indeed, comparison of the energy decomposition data indicates approximately the same ∆G values for Y46. We have revised this in the text correspondingly.

      4) Line 267 "Mefenamic acid..." I would like to see the movie.

      We no longer have access to this original movie.

      5) In supplemental movies 5-7, the side chains of some critical amino acid residues (W323, K41) would be better presented as in movies 1-4.

      We have retained the original presentations of these movies as the original files are no longer available.

      Reviewer #2 (Recommendations For The Authors):

      General comments:

      1) To determine the effect of mefenamic acid and DIDS on channel closing kinetics, a protocol in which they step from an activating test pulse to a repolarizing tail pulse to -40 mV for 1 s is used. If I understand it right, the drug response is assessed as the difference in instantaneous tail current amplitude and the amplitude after 1 s (row 599-603). The drug response of each mutant is then normalized to the response of the WT channel. However, for several mutants there is barely any sign of current decay during this relatively brief pulse (1 s) at this specific voltage. To determine drug effects more reliably on channel closing kinetics/the extent of channel closing, I wonder if these protocols could be refined? For instance, to cover a larger set of voltages and consider longer timescales?

      To clarify, the drug response of each mutant is not normalized to the response of the WT channel. Our analysis is not meant to compare mutant and WT tail current decay, but rather to show how isochronal tail current decay changes in response to drug treatment in each channel construct. As acknowledged by the reviewer, the peak-to-end difference currents were calculated by subtracting the minimum amplitude of the deactivating current from the peak amplitude of the deactivating current. The difference current in mefenamic acid or DIDS was then normalized to the maximum control (in the absence of drug) difference current and subtracted from 1.0 to obtain the normalized response. Thus, the difference in tail current decay in the absence and in the presence of drug is measured within the same time scale, which allows a direct comparison between before and after drug treatment. As shown in Fig 3D and 7C, a large drug response such as the one measured in WT channels is reflected by a value close to 1. A smaller drug response is indicated by lower values. We recognize that some mutations resulted in an intrinsic inhibition of tail current decay in the absence of drug, which potentially leads to an underestimate of the normalized response value. Our goal was not to study in detail the effects of the drug on channel closing kinetics, but only to determine the impact of the mutation on drug binding by using tail current decay as a readout. Consequently, we believe that the duration of the deactivating tail current used in this experiment was sufficient to detect drug-induced inhibition of tail current decay.
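
      The normalized response described in this reply can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, argument names, and example amplitudes are hypothetical and are not taken from the analysis scripts used in the study.

```python
def normalized_response(ctrl_peak, ctrl_min, drug_peak, drug_min):
    """Return 1 - (difference current in drug / difference current in control).

    Each difference current is the peak tail-current amplitude minus the
    minimum amplitude reached during the same isochronal tail pulse.
    All argument names are hypothetical, chosen for illustration only.
    """
    ctrl_diff = ctrl_peak - ctrl_min
    drug_diff = drug_peak - drug_min
    if ctrl_diff <= 0:
        # Without measurable decay in control, the ratio is undefined.
        raise ValueError("control tail current shows no decay")
    return 1.0 - drug_diff / ctrl_diff


# Hypothetical amplitudes (arbitrary units): strong decay in control,
# almost no decay in drug, so the response is close to 1.
response = normalized_response(ctrl_peak=1.0, ctrl_min=0.2,
                               drug_peak=0.9, drug_min=0.86)
```

      A construct whose control tail current barely decays gives a small denominator, which is one way to see why the normalized response of such mutants should be interpreted with caution.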

      2) The effect of mefenamic acid seems to be highly dependent on the pulse-to-pulse interval in the experiments. For instance, for WT in Figure 3 - Figure supplement 1, a 15 s pulse-to-pulse interval provides a -100 mV shift in V1/2 induced by mefenamic acid, whereas there is no shift induced when using a 30 s pulse-to-pulse interval. Can the authors explain why they generally consider a 15 s pulse-to-pulse interval more suitable (physiologically relevant?) in their experiments to assess drug effects?

      In our previous experiments, we determined that a 15 s inter-pulse interval is generally adequate for WT IKs channels to fully deactivate before the onset of the next pulse. Consistent with our previous work (Wang et al. 2019), we observed that in wild-type EQ channels there is no current summation from one pulse to the next (see Fig 1A, bottom panel). This is important because the IKs channel complex is known to be frequency dependent, i.e., current amplitude increases as the inter-pulse interval gets shorter. Such current summation results in a leftward shift of the conductance-voltage (G-V) relationship. This is also important with regard to drug effects. As indicated by the reviewer, mefenamic acid effects are prominent with a 15 s inter-pulse interval but less so with a 30 s inter-pulse interval, when enough time is given for channels to deactivate more completely. The full effects of mefenamic acid would therefore have been concealed with a 30 s inter-pulse interval.

      Moreover, our patch-clamp recordings aim to explore the distinct responses of mutant channels to mefenamic acid and DIDS in comparison to the wild-type channel. It is important to note that the inter-pulse interval's physiological relevance is not necessarily crucial in this context.

      3) Related to comment 1 and 2, there is a large diversity in the intrinsic properties of tested mutants. For instance, V1/2 ranges from 4 to 70 mV. Also, there is large variability in the slope of the G-V curves. Whether channel closing kinetics, or the impact of pulse-to-pulse interval, vary among mutants is not clear. Could the authors please discuss whether the intrinsic properties of mutants may affect their ability to respond to mefenamic acid and DIDS? Also, please provide representative current families and G-V curves for all assessed mutants in supplementary figures.

      The intrinsic properties of some mutants differ from those of the WT channels and influence their responsiveness to mefenamic acid and DIDS. The impact of the mutations on the IKs channel complex is reflected by changes in V1/2 (Tables 1 and 4) and tail current decay (Figs. 3, 7). However, it is the examination of the drug effects on these intrinsic properties (i.e., G-V curve and tail current decay) that constitutes the primary endpoint of our study. We consider that the degree to which mefenamic acid and DIDS modify these intrinsic properties reflects their ability to bind to the mutated channel. In our analysis, we compared each mutant's response to mefenamic acid and DIDS with its respective control. Consequently, the intrinsic properties of the mutant channels have already been considered in our evaluation. As requested, we have provided representative current families and G-V curves for all assessed mutants in Figure 3-figure supplement 1 and Figure 7-figure supplement 1.

      4) The A44C and Y148C mutants give strikingly different currents in the examples shown in Figure 3 and Figure 7. What is the reason for this? In the examples in figure 7, it almost looks like KCNE1 is absent. Although linked constructs are used, is there any indication that KCNE1 is not co-assembled properly with KCNQ1 in those examples?

      The size of the current is critical to determining its shape, as during the test pulse there is some endogenous current mixed in which impacts shape. A44C and Y148C currents shown in Figure 7 are smaller with a larger contribution of the endogenous current, mostly at the foot of the current trace. In our experience there is little endogenous current in the tail current at -40 mV and for this reason we focus our measurements there.

      Although constructs with tethered KCNQ1 and KCNE1 were used, we cannot rule out the possibility that Q1 and E1 interaction was altered by some of the mutations. Several KCNE1 and KCNQ1 residues have been identified as points of contact between the two subunits. For instance, the KCNE1 loop (position 36-47) has been shown to interact with the KCNQ1 S1-S2 linker (position 140-148) (Wang et al, 2011). Thus, it is conceivable that mutation of one or several of those residues may alter KCNQ1/KCNE1 interaction and modify the activation/deactivation kinetics of the IKs channel complex.

      5) I had a hard time following the details of the simulation approaches used. If not already stated (I could not find it), please provide: i) details on whether the whole channel protein was considered for 4D docking or a docking box was specified, ii) information on how simulations with mutant ps-IKs were prepared (for instance with the K41C mutant), especially whether the in silico mutated channel was allowed to relax before evaluation (and for how long). Also, please make sure that information on simulation time and number of repeats are provided in the Methods section.

      For 4D docking, only residues within 0.8 nm of psKCNE1 residues D39-A44 were selected. Complexes with mutated residues were relaxed using the same protocol as the WT channel (equilibration with gradually released restraints, followed by a final 10 ns equilibration in which only the backbone was restrained with 50 kcal/mol/nm2). We have updated the methods accordingly.
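
      The distance-based selection mentioned above can be illustrated with a small, generic sketch. It does not reproduce the docking software's own selection tools; the data layout, function name, residue labels, and coordinates are all hypothetical, and the only value taken from the reply is the 0.8 nm cutoff.

```python
import math


def residues_within(atoms, reference_ids, cutoff_nm=0.8):
    """Return the IDs of residues having at least one atom within
    `cutoff_nm` of any atom that belongs to a reference residue.

    `atoms` is a list of (residue_id, (x, y, z)) tuples with coordinates
    in nm. Reference residues are trivially included (distance 0).
    This layout is an assumption made for illustration only.
    """
    ref_coords = [xyz for rid, xyz in atoms if rid in reference_ids]
    selected = set()
    for rid, xyz in atoms:
        if any(math.dist(xyz, ref) <= cutoff_nm for ref in ref_coords):
            selected.add(rid)
    return selected


# Toy coordinates (nm), one atom per residue, purely illustrative.
toy_atoms = [
    ("ref1", (0.0, 0.0, 0.0)),
    ("ref2", (0.5, 0.0, 0.0)),
    ("near1", (0.6, 0.6, 0.0)),
    ("near2", (0.2, 0.1, 0.0)),
    ("far1", (3.0, 3.0, 3.0)),
]
pocket = residues_within(toy_atoms, {"ref1", "ref2"})
```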

      Specific comments:

      In figure legends, please provide information on whether data represents mean +/- SD or SEM. Also, please provide information on which statistical test was used in each figure.

      We revised the figure legend to add the nature of the statistical test used.

      G-V curves are normalized between 0 and 1. However, for many mutants the G-V relationship does not reach saturation at depolarized voltages. Does this affect the estimated V1/2? I could not really tell as I was not sure how V1/2 was determined for different mutants (could the explanation on row 595-598 be clarified)?

      The primary focus here is in the shift between the control response and drug response for each mutant, rather than the absolute V1/2 values. The isochronal G-V curves that are generated for each construct (WT and mutant) utilize an identical voltage protocol. This approach ensures a uniform comparison among all mutants. By observing the shifts in these curves, we can gain insight into the response of mutant channels to the drug. This information ultimately helps elucidate the inherent properties of the mutant channels and contributes to our understanding of the drug's binding mechanism to the channel.

      As requested by the reviewer, we also clarified the way V1/2 was generated: When the G-V curve did not reach zero, the V1/2 value was directly read from the plot at the voltage point where the curve crossed the 0.5 value on the y coordinate.
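
      Reading V1/2 at the 0.5 crossing can be sketched as follows. This is an illustration only: it assumes linear interpolation between the two data points that bracket 0.5, which is not stated in the reply, and the function name and example values are hypothetical.

```python
def v_half_from_crossing(voltages, g_norm, level=0.5):
    """Return the voltage at which a normalized G-V curve first crosses
    `level`, using linear interpolation between the bracketing points.

    The interpolation scheme is an assumption made for this sketch.
    """
    points = list(zip(voltages, g_norm))
    for (v0, g0), (v1, g1) in zip(points, points[1:]):
        if g0 == level:
            return float(v0)
        if (g0 - level) * (g1 - level) < 0:
            # The level lies strictly between g0 and g1: interpolate.
            return v0 + (level - g0) * (v1 - v0) / (g1 - g0)
    if points and points[-1][1] == level:
        return float(points[-1][0])
    raise ValueError("curve does not cross the requested level")


# Hypothetical G-V data (mV, normalized conductance).
example_v_half = v_half_from_crossing(
    [-40, -20, 0, 20, 40], [0.1, 0.3, 0.6, 0.9, 1.0]
)
```

      In this example the curve crosses 0.5 between -20 mV and 0 mV, so the estimate falls between those two test voltages.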

      A general comment is that the Discussion is fairly long and some sections are quite redundant to the Results section. The authors could consider focusing the text in the Discussion.

      We changed the discussion correspondingly wherever it was appropriate.

      I found it a bit hard to follow the authors interpretation on whether their drug molecules remain bound throughout the experiments, or whether there is fast binding/unbinding. Please clarify if possible.

      In the 300 ns MD simulations, mefenamic acid and DIDS remained stably bound to WT-ps-IKs; binding of the drugs to the mutant complexes is described in Table 3 and Table 5. In longer simulations with the channel embedded in a lipid environment, mefenamic acid unbinds in two out of five runs for WT-ps-IKs (Figure 4-figure supplement 1), and DIDS shows a few events where it briefly unbinds (Figure 6-figure supplement 3). Based on the electrophysiological data, we speculate that the drugs might bind to and unbind from WT-ps-IKs during the gating process. We do not see such binding-unbinding cycles in the MD simulations, since the model we used reflects only the open conformation of the channel complex with an activated-state voltage sensor, whereas a resting-state voltage sensor condition was not considered.

      The authors have previously shown that channels with no, one or two KCNE1 subunits are not, or only to a small extent, affected by mefenamic acid (Wang et al., 2020). Could the details of the binding site and proposed mechanisms of action provide clues as to why all binding sites need to be occupied to give prominent drug effects?

      In the manuscript, we propose that drug binding induces conformational changes in the pocket region that stabilize the S1/KCNE1/Pore complex. In the tetrameric channel with 4:4 alpha to beta stoichiometry, the drugs are likely to occupy all four sites, with complete stabilization of S1/KCNE1/Pore. When one or more KCNE1 subunits are absent, as in the case of the EQQ or EQQQQ constructs, drugs will bind only to the site(s) where KCNE1 is available. This will stabilize only part of the S1/KCNE1/Pore complex. We believe that the drug will therefore be only partially effective in this case.

      There is a bit of jumping in the order of when some figures are introduced (e.g. row 178 and 239). The authors could consider changing the order to make the figures easier to follow.

      We have changed the corresponding section appropriately to improve the reading flow.

      Row 237: "Data not shown", please show data.

      The G-V curve of the KCNE1 Y46C mutant displays a complex, double Boltzmann relationship which does not allow for the calculation of a meaningful V1/2 nor would it allow for an accurate determination of drug effects. Consequently, we have excluded it from the manuscript.

      In the Discussion, the author use the term "KCNE1/3". Does this correspond to the previous mention of "ps-KCNE1"?

      Yes, this refers to ps-KCNE1. We have changed it correspondingly.

      Row 576: When was HMR 1556 used?

      While HMR 1556 was used in preliminary experiments to confirm that the recorded current was indeed IKs, it does not provide substantial value to the data presented in our study or our experiments. As a result, we have excluded HMR 1556 experiments from the final results and have revised the Methods section accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1) Figures 2D and 6A are very unclear. Can the authors provide labels as text rather than coloured circles, whether the residue is on Q1 or E1? There is also a distance label in the figure in the small font with the faintest shade of grey, which I believe is supposed to be hydrogen bonds. Can this be improved for clarity?

      We feel that additional labels on the ligand diagrams would be more confusing. Instead, we updated the description in the legend and added labels to Figure 2B and Figure 6B to improve the clarity of residue positions. In addition, we have added two new figures with more detailed information about H-bonds (Figure 2-figure supplement 4, Figure 6-figure supplement 1).

      2) Figure 2B - all side chains need labelling in different binding modes. The green ligand on blue protein is very difficult to see. Suddenly, the ligand turns light blue in panel 2C. Can this be consistent throughout the manuscript?

      Figure 2B is updated according to this comment.

      3) Figure 2 - figure supplement 2, and figure 6B. Can the author show the residue number on the x-axis instead of just the one-letter abbreviation? This requires the reader to count and is not helpful when we try to figure out where the residue is at a glance. I would suggest a structure label adjacent to the plot to show whether they are located with respect to the drug molecule.

      Since the numbers for residues on either end of the cluster are indicated at the bottom of each boxed section, we feel that adding residue numbers would just further clutter the figure.

      4) Figure 2 - figure supplement 2, and Figure 6B. Can you explain what is being shown in the error bar? I assume standard deviation?

      Error bars on Figure 2-figure supplement 2 represent SEM. We added corresponding text in the figure legend.

      5) Figure 2 - figure supplement 2, and figure 6B. Can you explain how many frames are being accounted for in this PBSA calculation?

      For Figure 2-figure supplement 2 and Figure 6B, a frame was taken every 0.3 ns over 3 × 300 ns of simulation, giving 1000 frames for each simulation and 3000 frames overall.
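
      The frame counts follow directly from the sampling interval and run length. The short check below uses integer picoseconds to avoid floating-point rounding; the variable and function names are hypothetical, and it assumes that the frame at t = 0 is not counted separately.

```python
SAVE_INTERVAL_PS = 300    # 0.3 ns between analyzed frames
RUN_LENGTH_PS = 300_000   # 300 ns per simulation
N_RUNS = 3                # three independent simulations


def frame_counts(run_length_ps, save_interval_ps, n_runs):
    """Return (frames per run, frames over all runs)."""
    per_run = run_length_ps // save_interval_ps
    return per_run, per_run * n_runs


frames_per_run, frames_total = frame_counts(
    RUN_LENGTH_PS, SAVE_INTERVAL_PS, N_RUNS
)
```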

      6) Figure 3D/E and 7C/D, it would be helpful to show which mutant show agreeable results with the simulations, PBSA/GBSA and contact analyses as suggested above.

      The inconsistencies and discrepancies between the results of MD simulations and electrophysiological experiments are discussed throughout the manuscript.

      7) Figure legend, figure 3E - I assume that there is a type that is different mutants with respect to those without the drug. Otherwise, how could WT, with respect to WT, has -105 mV dV1/2?

      The reviewer is correct in that the bars indicate the difference in V1/2 between control and drug treatment. Thus, the difference in V1/2 (∆V1/2) between the V1/2 calculated for WT control and the V1/2 for mefenamic acid is indeed -105 mV. We have now revised Figure 3E's legend to accurately reflect this and ensure a clear understanding of the data presented.

      8) Figure 3 - figure supplement 1B is very messy, and I could not extract the key point from it. Can this be plotted on a separate trace? At least 1 WT trace and one mutant trace, 1 with WT+drug and one mut+drug as four separate plots for clarity?

      The key message of this figure is to illustrate the similarities of EQ WT + Mef and EQ L142C data. Thus, after thorough consideration, we have concluded that maintaining the current figure, which displays the progressive G-V curve shift in EQ WT and L142C in a superimposed manner, best illustrates the gradual shift in the G-V curves. This presentation allows for a clearer and more immediate comparison of the curve shifts, which may be more challenging to discern if the G-V curves were separated into individual figures. We believe that the existing format effectively communicates the relevant information in a comprehensive and accessible manner.

      9) Figure 4B - the label Voltage is blended into the orange helix. Can the label be placed more neatly?

      We altered the labels for this figure and added that information in the figure description.

      10) Can you show the numerical label of the residue, at least only to the KCNE1 portion in Figures 4C and 4D?

      We updated these figures and added residue numbering for clarity.

      11) Can you hide all non-polar hydrogen atoms in figure 8 and colour each subunit so that it agrees with the rest of the manuscripts? Can you adjust the position of the side chain so that it is interpretable? Can you summarise this as a cartoon? For example, Q147 and Y148 are in grey and are very far hidden away. So as S298. Can you colour-code your label? The methionine (I assume M45) next to T327 is shown as the stick and is unlabelled. Maybe set the orthoscopic view, increase the lighting and rotate the figures in a more interpretable fashion?

      We agree that Fig.8 is rather small as originally presented. We have tried to emphasize those residues we feel most critical to the study and inevitably that leads to de-emphasis of other, less important residues. As long as the figure is reproduced at sufficient size we feel that it has sufficient clarity for the purposes of the Discussion.

      12) Line 538-539. Can you provide more detail on how the extracellular residues of KCNE3 are substituted? Did you use Modeller, SwissModel, or AlphaFold to substitute this region of the KCNEs?

      We used ICM-pro to substitute extracellular residues of KCNE3 and create mutant variants of the IKs channel. This information is now provided in the Methods section.

      13) Line 551: The PIP2 density was solved using cryo-EM, not X-ray crystallography.

      We corrected this.

      14) Line 555: The system was equilibrated for ten ns. In which ensemble? Was there any restraint applied during the equilibration run? If yes, at what force constant?

      The system was equilibrated in NVT and NPT ensembles with restraints. These details have been added to the Methods. In the new simulations, equilibration was performed while gradually releasing spatial restraints from the backbone, sidechains, lipids, and ligands. A final 30 ns equilibration in the NPT ensemble was performed with restraints only on backbone atoms, using a force constant of 50 kJ/mol/nm2. The Methods were edited accordingly.

      15) Line 557: Kelvin is a unit without a degree.

      Corrected

      16) Line 559: PME is an electrostatic algorithm, not a method.

      Corrected

      17) Line 566: Collecting 1000 snapshots at which intervals. Given your run are not equal in length, how can you ensure that these are representative snapshots?

      Please see comment #5.

      18) Table 3 - Why SD for computational data and SEM for experimental data?

      There was no particular reason for using SD in some graphs. We used appropriate statistical tests to compare the groups where the difference was not obvious.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      1. Evidence for a disulfide bridge contained in membrane-associated FGF2 dimers

      This aspect was brought up in detail by both Reviewer #1 and Reviewer #3. It has been addressed in the revised manuscript by (i) new experimental and computational analyses, (ii) a more detailed discussion of previous work from our lab in which the experiments requested by the reviewers were performed and (iii) a more general discussion of known examples of disulfide formation in protein complexes, with a particular focus on membrane surfaces facing the cytoplasm, the inner plasma membrane leaflet being a prominent example. Please find our detailed comments in our direct response to Reviewers #1 and #3, see below.

      2. Affinity towards PI(4,5)P2 comparing FGF2 dimers versus monomers

      This aspect was raised by Reviewer #3 along with additional comments on the interaction of FGF2 with PI(4,5)P2. Please find our detailed response below. With regard to the PI(4,5)P2 affinity of FGF2 dimers versus FGF2 monomers, we think that the increased avidity of FGF2 dimers with two high affinity binding pockets for PI(4,5)P2 is a good explanation for the different values of free energies of binding that were calculated from the atomistic molecular dynamics simulations shown in Fig. 9. This phenomenon is well known for many biomolecular interactions and is also consistent with the cryoEM data contained in our manuscript, showing an FGF2 dimer with two PI(4,5)P2 binding sites facing the membrane surface.

      3. C95-C95 FGF2 dimers as signaling units

      We have put forward this hypothesis since in structural studies analyzing the FGF ternary signaling complex consisting of FGF2, FGF receptor and heparin, FGF2 mutants were used that lack C95. Nevertheless, two FGF2 molecules are contained in FGF signaling complexes. In addition to the papers on the structure of the FGF signaling complex, we have cited work that showed that C95-C95 crosslinked FGF2 dimers are efficient FGF signaling modules (Decker et al, 2016; Nawrocka et al, 2020). Therefore, given that FGF2 secretion is based on an assembly/disassembly mechanism with the transient formation of pore-forming FGF2 oligomers, we think it is an interesting idea that the FGF2 secretion pathway produces C95-C95 disulfide-linked FGF2 dimers at the outer plasma membrane leaflet that can engage in FGF2 ternary signaling complexes. While this is a possibility we put forward to stimulate the field, it of course remains a hypothesis, which has been clearly indicated as such in the revised manuscript.

      Reviewer #1:

      1. Evidence for disulfide-bridged FGF2 dimers and higher oligomers on non-reducing versus reducing SDS gels

      The experiment suggested by Reviewer #1 is an important one that has been published by our group in previous work. In these studies, we found FGF2 oligomers analyzed on non-reducing SDS gels to be sensitive to DTT, turning the vast majority of oligomeric FGF2 species into monomers [(Müller et al, 2015); Fig. 3, compare panel D with panel H]. This phenomenon could be observed most clearly after short periods of incubations (0.5 hours) of FGF2 with PI(4,5)P2-containing liposomes. These findings constituted the original evidence for PI(4,5)P2-induced FGF2 oligomerization to depend on the formation of intermolecular disulfide bridges.

      In the current manuscript, we established the structural principles underlying this process and identified C95 to be the only cysteine residue involved in disulfide formation. Based on biochemical cross-linking experiments in cells, cryo-electron tomography, predictions from AlphaFold-2 Multimer and molecular dynamics simulations, we demonstrated a strong FGF2 dimerization interface in which C95 residues are brought into close proximity when FGF2 is bound to membranes in a PI(4,5)P2-dependent manner. These findings provide the structural basis by which disulfide bridges can be formed from the thiols contained in the side chains of two C95 residues directly facing each other in the dimerization interface. In the revised manuscript, we included additional data that further strengthen this analysis. In the experiments shown in the new Fig. 10, we combined chemical cross-linking with mass spectrometry, further validating the reported FGF2 dimerization interface. In addition, illustrated in the new Fig. 8, we employed a new computational analysis combining 360 individual atomistic molecular dynamics simulations, each spanning 0.5 microseconds, with advanced machine learning techniques. This new data set corroborates our findings, demonstrating that the C95-C95 interface self-assembles independently of C95-C95 disulfide formation, based on electrostatic interactions. Intriguingly, it is consistent with our experimental findings based on cross-linking mass spectrometry (new Fig. 10) where cross-linked peptides could also be observed with the C77/95A variant form of FGF2, suggesting a protein-protein interface whose formation does not depend on disulfide formation. Therefore, we propose that disulfide formation occurs in a subsequent step, representing the committed step of FGF2 membrane translocation with the formation of disulfide-bridged FGF2 dimers being the building blocks for pore-forming FGF2 oligomers.

      As a more general remark on the mechanistic principles of disulfide formation in different cellular environments, we would like to emphasize that it is a common misconception that the reducing environment of the cytoplasm generally makes the formation of disulfide bridges unlikely or even impossible. From a biochemical point of view, the formation of disulfide bridges is not limited by a reducing cellular environment but is rather controlled by kinetic parameters when two thiols are brought into proximity. Indeed, it has become well established that disulfide bridges can also be formed in compartments other than the lumen of the ER/Golgi system, including the cytoplasm. For example, viruses maturing in the cytoplasm can form stable structural disulfide bonds in their coat proteins (Locker & Griffiths, 1999; Hakim & Fass, 2010). Moreover, many cytosolic proteins, including phosphatases, kinases and transcription factors, are now recognized to be regulated by thiol oxidation and disulfide bond formation, formed as a post-translational modification (Lennicke & Cocheme, 2021). In numerous cases with direct relevance for our studies on FGF2, disulfide bond formation and other forms of thiol oxidation occur in association with membrane surfaces. In fact, many of these processes are linked to the inner plasma membrane leaflet (Nordzieke & Medrano-Fernandez, 2018). Growth factors, hormones and antigen receptors are observed to activate transmembrane NADPH oxidases generating O2·-/H2O2 (Brown & Griendling, 2009). For example, the local and transient oxidative inactivation of membrane-associated phosphatases (e.g., PTEN) serves to enhance receptor associated kinase signaling (Netto & Machado, 2022). It is therefore conceivable that similar processes introduce disulfide bridges into FGF2 while assembling into oligomers at the inner plasma membrane leaflet. In the revised version of our manuscript, we have discussed the above-mentioned aspects in more detail, with the known role of NADPH oxidases in disulfide formation at the inner plasma membrane leaflet being highlighted.

      Reviewer #2:

      1. Potential effects of a C95A substitution on protein folding and comparison with a C95S substitution with regard to phenotypes observed in FGF2 secretion

      A valid point that we indeed addressed at the beginning of this project. Most importantly, we tested whether both FGF2 C95A and FGF2 C95S are characterized by severe phenotypes in FGF2 secretion efficiency. As shown in the revised Fig. 1, cysteine substitutions by serine showed very similar FGF2 secretion phenotypes compared to cysteine to alanine substitutions (Fig. 1C and 1D). In addition, in the pilot phase of this project, we also compared recombinant forms of FGF2 C95A and FGF2 C95S in various in vitro assays. For example, we tested the full set of FGF2 variants in membrane integrity assays as the ones contained in Fig. 4. As shown in Author response image 1, FGF2 variant forms carrying a serine in position 95 behaved in a very similar manner as compared to FGF2 C95A variant forms. Relative to FGF2 wild-type, membrane pore formation was strongly reduced for both types of C95 substitutions. By contrast, both FGF2 C77S and C77A did show activities that were similar to FGF2 wild-type.

      Author response image 1.

      From these experiments, we conclude that changes in protein structure are not the basis for the phenotypes we report on the C95A substitution in FGF2.

      2. Effects of a C77A substitution on FGF2 membrane recruitment in cells

      The effect of a C77A substitution on FGF2 recruitment to the inner plasma membrane leaflet is indeed a moderate one. This is likely to be the case because C77 is only one residue of a more complex surface that contacts the α1 subunit of the Na,K-ATPase. Stronger effects can be observed when K54 and K60 are changed, residues that are positioned in close proximity to C77 (Legrand et al, 2020). Nevertheless, as shown in the revised Fig. 1, we consistently observed a reduction in membrane recruitment when comparing FGF2 C77A with FGF2 wild-type. When analyzing the raw data without GFP background subtraction, a significant reduction of FGF2 C77A was observed compared to FGF2 wild-type (Fig. 1A and 1B). We therefore conclude that C77 not only plays a role in FGF2/α1 interactions in biochemical assays using purified components (Fig. 7) but also contributes to FGF2/α1 interactions in a cellular context (Fig. 1A and 1B).

      3. Identity of the protein band in Fig. 3 labeled with an empty diamond

      This is a misunderstanding as we did not assign this band to a FGF2-GFP dimer. When we produced the corresponding cell lines, we used constructs that link FGF2 with GFP via a ‘self-cleaving’ P2A sequence. During translation, even though arranged on one mRNA, this causes the production of FGF2 and GFP as separate proteins in stoichiometric amounts, the latter being used to monitor transfection efficiency. However, a small fraction is always expressed as a complete FGF2-P2A-GFP fusion protein (a monomer). This band can be detected with the FGF2 antibodies used and was labeled in Fig. 3 by an empty diamond.

      4. Labeling of subpanels in Fig. 5A

      We have revised Fig. 5 according to the suggestion of Reviewer #2.

      5. FGF2 membrane binding efficiencies shown in Fig. 5C

      It is true that FGF2 variant forms defective in PI(4,5)P2-dependent oligomerization (C95A and C77/95A) bind to membranes with somewhat reduced efficiencies. This is also evident from the intensity profiles shown in Fig. 5A and was observed in biochemical in vitro experiments as well. A plausible explanation for this phenomenon would be the increased avidity when FGF2 oligomerizes, stabilizing membrane interactions (see also Fig. 9B).

      6. Residual activities of FGF2 C95A and C77/95A in membrane pore formation?

      We do not interpret the phenomenon in Fig. 5 that Reviewer #2 refers to as controlled membrane pore formation by FGF2 C95A and C77/95A. Rather, GUVs containing PI(4,5)P2 are relatively labile structures, and some loss of integrity upon protein binding and extended incubation times is conceivable. This is essentially a technical limitation of this assay, in which GUVs are incubated with proteins for 2 hours. Even after substitution of PI(4,5)P2 with a Ni-NTA membrane lipid, background levels of loss of membrane integrity can be observed (Fig. 6). Therefore, the critical point here is that, compared to FGF2 C95A and C77/95A, FGF2 wt and FGF2 C77A display significantly higher levels of loss of membrane integrity in PI(4,5)P2-containing GUVs, a phenomenon that we interpret as controlled membrane pore formation. By contrast, all variant forms of FGF2 show only background levels of loss of membrane integrity in GUVs containing the Ni-NTA lipid.

      7. Why does PI(4,5)P2 induce FGF2 dimerization?

      This has been studied extensively in previous work (Steringer et al, 2017). As also discussed in the current manuscript, the interaction of FGF2 with membranes through its high affinity PI(4,5)P2 binding pocket orients FGF2 molecules on a 2D surface, which increases the likelihood of the formation of the C95-containing FGF2 dimerization interface. Moreover, in the presence of cholesterol at levels typical for plasma membranes, PI(4,5)P2 forms clusters containing up to 4 PI(4,5)P2 molecules (Lolicato et al, 2022), a process that may further facilitate FGF2 dimerization.

      1. Is it possible to pinpoint the number of FGF2 subunits in oligomers observed in cryo-electron tomography?

      We indeed took advantage of the Halo tags that appear as dark globular structures in cryo-electron tomography. For most FGF2 oligomers with FGF2 subunits on both sides of the membrane, we could observe 4 to 6 Halo tags which is consistent with the functional subunit number that has been analyzed for membrane pore formation (Steringer et al., 2017; Sachl et al, 2020; Singh et al, 2023). However, since the number of higher FGF2 oligomers we observed in cryo-electron tomography was relatively small and the nature of these oligomers appears to be highly dynamic, caution should be taken to avoid overinterpretation of the available data.

      Reviewer #3:

      1. Conclusive demonstration of disulfide-linked FGF2 dimers

      A similar point was raised by Reviewer #1, so we refer to our response on page 2 (see above).

      1. Identity of FGF2-P2A-GFP observed in Fig. 3

      Again, a similar point has been made, in this case by Reviewer #2 (Point 3). The observed band is not a FGF2-P2A-GFP dimer but rather the complete FGF2-P2A-GFP fusion protein (a monomer) that corresponds to a small population produced during mRNA translation where the P2A sequence did not cause the production of FGF2 and GFP as separate proteins in stoichiometric amounts.

      1. Quantification of GFP signals in Fig. 6

      Fig. 6 has been revised according to the suggestion of Reviewer #3. A comprehensive comparison of PI(4,5)P2 and the Ni-NTA membrane lipid in FGF2 membrane translocation assays is also contained in previous work that introduced the GUV-based FGF2 membrane translocation assay (Steringer et al., 2017).

      1. Experimental evidence for various aspects of FGF2 interactions with PI(4,5)P2

      Most of the points raised by Reviewer #3 have been addressed in previous work. For example, FGF2 has been demonstrated to dimerize only on membrane surfaces containing PI(4,5)P2 (Müller et al., 2015). In solution, FGF2 remained a monomer even after hours of incubation as analyzed by native gel electrophoresis and reducing vs. non-reducing SDS gels (see Fig. 3 in Müller et al, 2015). In the same paper, the first evidence for a potential role of C95 in FGF2 oligomerization was reported; however, at the time, our studies were limited to FGF2 C77/95A. In the current manuscript, the in vitro experiments shown in Figs. 2 to 6 establish the unique role of C95 in PI(4,5)P2-dependent FGF2 oligomerization. As discussed above, FGF2 oligomers have been shown to contain disulfide bridges based on analyses on non-reducing gels in the absence and presence of DTT (Müller et al., 2015).

      References

      Brown DI, Griendling KK (2009) Nox proteins in signal transduction. Free Radic Biol Med 47: 1239-1253

      Decker CG, Wang Y, Paluck SJ, Shen L, Loo JA, Levine AJ, Miller LS, Maynard HD (2016) Fibroblast growth factor 2 dimer with superagonist in vitro activity improves granulation tissue formation during wound healing. Biomaterials 81: 157-168

      Hakim M, Fass D (2010) Cytosolic disulfide bond formation in cells infected with large nucleocytoplasmic DNA viruses. Antioxid Redox Signal 13: 1261-1271

      Legrand C, Saleppico R, Sticht J, Lolicato F, Muller HM, Wegehingel S, Dimou E, Steringer JP, Ewers H, Vattulainen I et al (2020) The Na,K-ATPase acts upstream of phosphoinositide PI(4,5)P2 facilitating unconventional secretion of Fibroblast Growth Factor 2. Commun Biol 3: 141

      Lennicke C, Cocheme HM (2021) Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81: 3691-3707

      Locker JK, Griffiths G (1999) An unconventional role for cytoplasmic disulfide bonds in vaccinia virus proteins. J Cell Biol 144: 267-279

      Lolicato F, Saleppico R, Griffo A, Meyer A, Scollo F, Pokrandt B, Muller HM, Ewers H, Hahl H, Fleury JB et al (2022) Cholesterol promotes clustering of PI(4,5)P2 driving unconventional secretion of FGF2. J Cell Biol 221

      Müller HM, Steringer JP, Wegehingel S, Bleicken S, Munster M, Dimou E, Unger S, Weidmann G, Andreas H, GarciaSaez AJ et al (2015) Formation of Disulfide Bridges Drives Oligomerization, Membrane Pore Formation and Translocation of Fibroblast Growth Factor 2 to Cell Surfaces. J Biol Chem 290: 8925-8937

      Nawrocka D, Krzyscik MA, Opalinski L, Zakrzewska M, Otlewski J (2020) Stable Fibroblast Growth Factor 2 Dimers with High Pro-Survival and Mitogenic Potential. Int J Mol Sci 21

      Netto LES, Machado L (2022) Preferential redox regulation of cysteine-based protein tyrosine phosphatases: structural and biochemical diversity. FEBS J 289: 5480-5504

      Nordzieke DE, Medrano-Fernandez I (2018) The Plasma Membrane: A Platform for Intra- and Intercellular Redox Signaling. Antioxidants (Basel) 7

      Sachl R, Cujova S, Singh V, Riegerova P, Kapusta P, Muller HM, Steringer JP, Hof M, Nickel W (2020) Functional Assay to Correlate Protein Oligomerization States with Membrane Pore Formation. Anal Chem 92: 14861-14866

      Singh V, Macharova S, Riegerova P, Steringer JP, Muller HM, Lolicato F, Nickel W, Hof M, Sachl R (2023) Determining the Functional Oligomeric State of Membrane-Associated Protein Oligomers Forming Membrane Pores on Giant Lipid Vesicles. Anal Chem 95: 8807-8815

      Steringer JP, Lange S, Cujova S, Sachl R, Poojari C, Lolicato F, Beutel O, Muller HM, Unger S, Coskun U et al (2017) Key steps in unconventional secretion of fibroblast growth factor 2 reconstituted with purified components. eLife 6: e28985

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The paper from Hsu and co-workers describes a new automated method for analyzing the cell wall peptidoglycan composition of bacteria using liquid chromatography and mass spectrometry (LC/MS) combined with newly developed analysis software. The work has great potential for determining the composition of bacterial cell walls from diverse bacteria in high-throughput, allowing new connections between cell wall structure and other important biological functions like cell morphology or host-microbe interactions to be discovered. In general, I find the paper to be well written and the methodology described to be useful for the field. However, there are areas where the details of the workflow could be clarified. I also think the claims connecting cell wall structure and stiffness of the cell surface are relatively weak. The text for this topic would benefit from a more thorough discussion of the weak points of the argument and a toning down of the conclusions drawn to make them more realistic.

      Thank you for your thorough and insightful review of our manuscript. We greatly appreciate your positive and constructive feedback on our methodology. We have carefully reviewed your comments and have responded to each point as follows:

      Specific points:

      1) It was unclear to me from reading the paper whether or not prior knowledge of the peptidoglycan structure of an organism is required to build the "DBuilder" database for muropeptides. Based on the text as written, I was left wondering whether bacterial samples of unknown cell wall composition could be analyzed with the methods described, or whether some preliminary characterization of the composition is needed before the high-throughput analysis can be performed. The paper would be significantly improved if this point were explicitly addressed in the main text.

      We apologize for not making this clearer. Prior knowledge of the peptidoglycan structure of an organism is indeed required to build the “DBuilder” database and accurately identify muropeptides; otherwise, the false discovery rate might increase. Where the peptidoglycan structure of an organism has not been extensively studied, users retain the flexibility to adapt the muropeptide compositions to their study, referencing closely related species for database construction. We have addressed this aspect in the main text to ensure a clearer understanding.

      “(Section HAMA platform: a High-throughput Automated Muropeptide Analysis for Identification of PGN Fragments) …(i) DBuilder... Based on their known (or putative) PGN structures, all possible combinations of GlcNAc, MurNAc and peptide were input into DBuilder to generate a comprehensive database that contains monomeric, dimeric, and trimeric muropeptides (Figure 1b).”
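
The combinatorial idea behind such a database can be sketched in a few lines of Python. This is a minimal illustration only, not the DBuilder implementation: the building-block names (`GLYCANS`, `STEM_PEPTIDES`) and the helper functions are hypothetical, and the sketch ignores linkage type, donor/acceptor order, modifications, and masses.

```python
from itertools import combinations_with_replacement, product

# Hypothetical building blocks, for illustration only. A real database would
# be derived from the organism's known (or putative) PGN structure.
GLYCANS = ["GlcNAc-MurNAc"]
STEM_PEPTIDES = ["AEK", "AEKA", "AEKAA"]  # illustrative tri-, tetra-, pentapeptide stems


def build_monomers(glycans, peptides):
    """Pair every glycan unit with every stem peptide."""
    return [f"{g}-{p}" for g, p in product(glycans, peptides)]


def build_database(glycans, peptides, max_units=3):
    """Enumerate monomers, dimers and trimers as unordered sets of monomers."""
    monomers = build_monomers(glycans, peptides)
    database = {}
    for n_units in range(1, max_units + 1):
        database[n_units] = [
            " + ".join(combo)
            for combo in combinations_with_replacement(monomers, n_units)
        ]
    return database
```

Even with three stem peptides the number of candidates grows quickly with multimer size, which is one reason restricting the building blocks to those expected for the organism helps keep false identifications down.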

      2) The potential connection between the structure of different cell walls from bifidobacteria and cell stiffness is pretty weak. The cells analyzed are from different strains such that there are many possible reasons for the change in physical measurements made by AFM. I think this point needs to be explicitly addressed in the main text. Given the many possible explanations for the observed measurement differences (lines 445-448, for example), the authors could remove this portion of the paper entirely. Conclusions relating cell wall composition to stiffness would be best drawn from a single strain of bacteria genetically modified to have an altered content of 3-3 crosslinks.

      We understand your concern regarding the weak connection between cell wall structure and cell stiffness. We will make a clear and explicit statement in the main text to acknowledge that the cells analyzed are derived from different strains, introducing the possibility of various factors influencing the observed changes in physical measurements as determined by AFM. Furthermore, we greatly appreciate your suggestion to consider genetically modified strains to investigate the role of cross-bridge length in determining cell envelope stiffness. In this regard, we are in the process of developing a CRISPR/Cas genome editing toolbox for Bifidobacterium longum, and we plan to pursue this avenue of investigation in future work.

      Reviewer #2 (Public Review):

      The authors introduce "HAMA", a new automated pipeline for architectural analysis of the bacterial cell wall. Using MS/MS fragmentation and a computational pipeline, they validate the approach using well-characterized model organisms and then apply the platform to elucidate the PG architecture of several members of the human gut microbiota. They discover differences in the length of peptide crossbridges between two species of the genus Bifidobacterium and then show that these species also differ in cell envelope stiffness, resulting in the conclusion that crossbridge length determines stiffness.

      We appreciate your thoughtful review of our manuscript and your recognition of the potential significance of our work in elucidating the poorly characterized peptidoglycan (PGN) architecture of the human gut microbiota.

      The pipeline is solid and revealing the poorly characterized PG architecture of the human gut microbiota is worthwhile and significant. However, it is unclear if or how their pipeline is superior to other existing techniques - PG architecture analysis is routinely done by many other labs; the only difference here seems to be that the authors chose gut microbes to interrogate.

      We apologize if this could have been clearer. The HAMA platform stands apart from other pipelines by automatically analyzing LC-MS/MS data to identify muropeptides. In contrast, most routine PGN architecture analyses use LC-UV/Vis or LC-MS platforms, for which automated analysis is supported only by the PGFinder software. To the best of our knowledge, a comparable pipeline for automatically analyzing LC-MS/MS data was reported by Bern et al., who used the commercial Byonic software with an in-house FASTA database and specific glycan modifications. They achieved accurate and sensitive identification of monomer muropeptides, but struggled with cross-linked muropeptides due to the limitations of the Byonic software. We believe that our pipeline, which introduces automatic and comprehensive muropeptide identification (particularly for Gram-positive bacterial peptidoglycans), would be a valuable addition to the field. To enhance clarity, we have adjusted the context as follows:

      “(Introduction) … Although they both demonstrated great success in identifying muropeptide monomers, the accurate identification of muropeptide multimers and other various bacterial PGN structures still remains unresolved. This is because deciphering the compositions requires MS/MS fragmentation, but it is still challenging to automatically annotate MS/MS spectra from these complex muropeptide structures.”

      I do not agree with their conclusions about the correlation between crossbridge length and cell envelope stiffness. These experiments are done on two different species of bacteria and their experimental setup therefore does not allow them to isolate crossbridge length as the only differential property that can influence stiffness. These two species likely also differ in other ways that could modulate stiffness, e.g. turgor pressure, overall PG architecture (not just crossbridge length), membrane properties, teichoic acid composition etc.

      Regarding the conclusions drawn about the correlation between cross-bridge length and cell envelope stiffness, we understand your point and appreciate your feedback. We have revisited this section of our manuscript and toned down the conclusions drawn from this aspect of the study. We also recognize the importance of considering other potential factors that could influence stiffness, as you mentioned above. In light of this, we now mention in the main text the need for further investigations, potentially involving genetically modified strains, to isolate and accurately determine the impact of bridge length on cell envelope stiffness.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      1) One thing to consider would be testing the robustness of the analysis pipeline with one of the well-characterized bacteria studied, but genetically modifying them to change the cell wall composition in predictable ways. Does the analysis pipeline detect the expected changes?

      We appreciate the reviewer's suggestion and would like to provide a clear response. Regarding testing the pipeline with genetically modified strains, our lab previously worked on genetically modified S. maltophilia (KJΔmrdA).1 Inactivation of mrdA resulted in an increased level of N-acetylglucosaminyl-1,6-anhydro-N-acetylmuramyl-L-alanyl-D-glutamyl-meso-diaminopimelic acid-D-alanine (GlcNAc-anhMurNAc tetrapeptide) in muropeptide profiles, which is the critical activator ligand for ΔmrdA-mediated β-lactamase expression in the mutant strain. In this case, our platform could provide rapid PGN analysis for verifying the expected change in muropeptide profiles (see Author response image 1). In addition, if the predictable changes involve genetic modifications of interpeptide bridges within the PGN structure, for example through the femA/B genes of S. aureus, which are responsible for the synthesis of interpeptide bridges,2 our current HAMA pipeline is capable of detecting these anticipated changes. However, if the genetic modifications introduce novel components into PGN structures, a dedicated database specific to the genetically modified strain would need to be created.

      Author response image 1.

      2) Line 368: products catalyzed > products formed

      The sentence has been revised.

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …Based on the muropeptide compositional analysis mentioned above, we found high abundances of M3/M3b monomer and D34 dimer in the PGNs of E. faecalis, E. faecium, L. acidophilus, B. breve, B. longum, and A. muciniphila, which may be the PGN products formed by Ldts.”

      3) Lines 400-402: Is it possible the effect is related to porosity, not "hardness".

      Thank you for the suggestion. The possibility of the slower hydrolysis rate of purified PGN in B. breve being related to porosity is indeed noteworthy. While this could be a potential factor, we would like to acknowledge the limited existing literature that directly addresses the relation between PGN architecture and porosity. It is plausible that current methods available for assessing cell wall porosity may have certain limitations, contributing to the scarcity of relevant studies. In light of this, we would like to propose a speculative explanation for the observed effect. It is plausible that the tighter PGN architecture resulting from shorter interpeptide bridges in B. breve could contribute to its harder texture. This speculation is grounded in the concept that a more compact PGN structure might lead to increased stiffness, aligning with our observations of higher cell stiffness in B. breve.

      4) Lines 403-408: See point #2 above.

      Thank you for the suggestion. We have explicitly addressed this point in the main text:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) … Taken all together, we speculate that a tight peptidoglycan network woven by shorter interpeptide bridges or 3-3 cross-linkages could give bacteria stiffer cell walls. However, it is important to note that cell stiffness is a mechanical property that also depends on PGN thickness, overall architecture, and turgor pressure. These parameters may vary among different bacterial strains. Hence, carefully controlled, genetically engineered strains with similar characteristics will be needed to dissect the role of cross-bridge length in cell envelope stiffness.”

      5) Lines 428-429: It is not clear to me how mapping the cell wall architecture provides structural information about the synthetic system. It is also not clear how antibiotic resistance can be inferred. More detail is needed here to flesh out these points.

      Thank you for the suggestion. To provide further clarity on these important aspects, the context in the manuscript has been revised.

      “(Discussion) …Importantly, our HAMA platform provides a powerful tool for mapping peptidoglycan architecture, giving structural information on the PGN biosynthesis system. This involves the ability to infer possible PGN cross-linkages based on the type of PGN fragments obtained from hydrolysis. For instance, the identification of 3-3 cross-linkage formed by L,D-transpeptidases (Ldts) is of particular significance. Unlike 4-3 cross-linkages, the 3-3 cross-linkage is resistant to inhibition by β-Lactam antibiotics, a class of antibiotics that commonly targets bacterial cell wall synthesis through interference with 4-3 cross-linkages. Therefore, by elucidating the specific cross-linkage types within the peptidoglycan architecture, our approach offers insights into antibiotic resistance mechanisms.”

      6) Line 478: "maneuvers are proposed for" > "work is needed to generate". Also, delete "innovative". Also "in silico" > "in silico-based".

      The sentence has been revised.

      “(Discussion) …To achieve a more comprehensive identification of muropeptides, future work is needed to generate an expanded database, in silico-based fragmentation patterns, and improved MS/MS spectra acquisition.”

      7) Line 485: "Its" > "It has potential"

      The sentence has been revised.

      “(Discussion) …It has potential applications in identifying activation ligands for antimicrobial resistance studies, characterizing key motifs recognized by pattern recognition receptors for host-microbiota immuno-interaction research, and mapping peptidoglycan in cell wall architecture studies.”

      8) Figure 1 legend: Define Gb and Pb.

      Gb and Pb are the abbreviations for glycosidic bonds and peptide bonds. We have revised the legend of Figure 1 as follows:

      “(Figure legend 1) …(b) DBuilder constructs a muropeptide database containing monomers, dimers, and trimers with two types of linkage: glycosidic bonds (Gb) and peptide bonds (Pb).”

      9) Figure 2: It is hard to see what is going on in panel a and c with all the labels. Consider removing them and showing a zoomed inset with labels in addition to an unlabeled full chromatogram.

      We apologize for not making this clearer. Panels a and c in Figure 2 were directly generated by the Analyzer as a software screenshot of the peak annotations on the chromatogram. Our intention was to present a comprehensive PGN mapping (approximately 70% of the peak area was assigned to muropeptide signals) using this platform. We understand the label density might affect clarity, so we have added the output tables of the whole muropeptide identifications as source data (Table 1–Source Data 1&2). Additionally, we have uploaded the Analyzer output files (see Additional Files), which can be better visualized in the Viewer program, which also allows users to zoom in for detailed labeling information.

      10) Figure 3: It is worth pointing out what features of the MS/MS fingerprints are helping to discriminate between species.

      Thank you for the suggestion. We have revised Figure 3 and the legend as follows:

      “(Figure legend 3) …The sequence of each isomer was determined using in silico MS/MS fragmentation matching, with the identified sequence having the highest matching score. The key MS/MS fragments that discriminate between two isomers are labeled in bold brown.”

      Author response image 2.
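
To make the notion of a "matching score" concrete, the sketch below ranks candidate isomers by the fraction of their predicted fragment m/z values found in an observed spectrum. It is a simplified illustration under stated assumptions: the scoring rule, the tolerance, and the function names are hypothetical and are not taken from the HAMA Analyzer, whose actual scoring is not described here.

```python
def match_score(theoretical, observed, tol=0.01):
    """Fraction of theoretical fragment m/z values with an observed peak within +/- tol."""
    if not theoretical:
        return 0.0
    matched = sum(
        any(abs(t - o) <= tol for o in observed)
        for t in theoretical
    )
    return matched / len(theoretical)


def best_isomer(candidates, observed, tol=0.01):
    """Return the candidate name with the highest score (first one wins on ties).

    `candidates` maps a candidate name to its list of predicted fragment m/z values.
    """
    return max(candidates, key=lambda name: match_score(candidates[name], observed, tol))
```

Under this kind of scheme, two isomers that share most fragments are separated only by the few fragments unique to each sequence, which is why highlighting the discriminating fragments in the figure is informative.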

      11) Figure 4 and 5 legend: Can you condense the long descriptions of the abbreviations - or at least only refer to them once?

      Certainly. To enhance clarity and conciseness in the figure legends, we have revised Figure legend 5 as follows:

      “(Figure legend 5) …(b) Heatmap displaying …. Symbols: M, monomer; D, dimer; T, trimer (numbers indicate amino acids in stem peptides). Description of symbol abbreviations as in Figure legend 4, with the addition of "Glycan-T" representing trimers linked by glycosidic bonds.”

      Reviewer #2 (Recommendations For The Authors):

      1. Please read the manuscript carefully for spelling errors.

      We appreciate your careful review of our manuscript. We have thoroughly rechecked the entire manuscript for spelling errors and have made the necessary corrections to ensure the accuracy and quality of the text.

      1. Line 46 - "multilayered" is likely only true for Gram-positive bacteria.

      We thank Reviewer #2 for bringing up this concern. Indeed, Gram-negative bacteria mostly possess a single layer of peptidoglycan, although there can be up to three layers in some parts of the cell surface.3, 4 To reduce confusion, we have rewritten the text as follows: “(Introduction) …PGN is a net-like polymeric structure composed of various muropeptide molecules, with their glycans linearly conjugated and short peptide chains cross-linked through transpeptidation.”

      1. Methods section: It seems like pellets from a 10 mL bacterial culture were ultimately suspended in 1.5 L (750 mL water + 750 mL tris) - why such a large volume? And how were PG fragments subsequently washed (centrifugation? There is no information on this in the Methods).

      We apologize for the mislabeling of the units. The accurate volume should be “1.5 mL (750 µL water + 750 µL tris)”. We have updated the correct volume in the Methods section (lines 99-100). For the washing process of purified PGN, we added 1 mL water, centrifuged at 10,000 rpm for 5 minutes, and removed the supernatant. This information has been added to the Methods section (lines 95-98).

      1. Line 183 - why were 6 modifications chosen as the cutoff? Please make the rationale more clear.

      We thank Reviewer #2 for the comments. We set the maximum number of modifications to 6 on the assumption of one modification on each sugar of a trimeric muropeptide. A lower cutoff could effectively limit the identification of muropeptides with unlikely numbers of modifications, whereas a higher cutoff could allow for multiple modifications on a muropeptide. In our hands, muropeptide modifications of E. coli are mostly N-deacetyl-MurNAc and anhydro-MurNAc, and modifications of the gut microbes used here are mostly N-deacetyl-GlcNAc, anhydro-MurNAc, O-acetyl-MurNAc, loss of GlcNAc, and amidated iso-Glu. While we recommend starting data analysis with the cutoff of 6 modifications, users are free to adjust this based on their studies.
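
The arithmetic behind the default cutoff, and how such a cutoff would prune candidates, can be sketched as follows. The function names and the candidate record layout are hypothetical and serve only to illustrate the reasoning above: a trimer contains three disaccharide units, each with two sugars, so one modification per sugar gives 3 × 2 = 6.

```python
SUGARS_PER_UNIT = 2  # GlcNAc and MurNAc in each disaccharide unit


def default_cutoff(n_units=3, per_sugar=1):
    """Maximum modification count, assuming `per_sugar` modifications on every sugar."""
    return n_units * SUGARS_PER_UNIT * per_sugar


def within_cutoff(candidates, cutoff):
    """Keep candidates whose number of modifications does not exceed the cutoff.

    Each candidate is a dict with a "modifications" list (hypothetical layout).
    """
    return [c for c in candidates if len(c["modifications"]) <= cutoff]
```

Raising `per_sugar` or `cutoff` widens the search space at the cost of admitting less plausible identifications, which is the trade-off described in the response.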

      1. Line 339 - define donor vs. acceptor here (can be added in parentheses after explaining the relevant chemical reactions further above in the text)

      Thank you for the suggestion. To provide greater clarity regarding the roles of the donor and acceptor substrates in the transpeptidation process, we have revised the content in the manuscript as follows:

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …In general, there are two types of PGN cross-linkage…. Transpeptidation involves two stem peptides which function as acyl donor and acceptor substrates, respectively. As the enzyme names imply, the donor substrates that Ddts and Ldts bind to are terminated as D,D-stereocenters and L,D-stereocenters, which structurally means pentapeptides and tetrapeptides. During D,D-transpeptidation, Ddts recognize D-Ala4-D-Ala5 of the donor stem (pentapeptide) and remove the terminal D-Ala5 residue, forming an intermediate. The intermediate then cross-links the NH2 group in the third position of the neighboring acceptor stem, forming a 4-3 cross-link.”

      1. Line 366 following - can you calculate % crosslinks based on these numbers? What does "high abundance" of 3,3 crosslinks mean in this context? Is this the majority of PG?

      Thank you for your questions. Calculating the percentage of crosslinks based on the muropeptide compositional numbers is a valid consideration. However, it's important to note that the muropeptides we analyzed were hydrolyzed by mutanolysin, and as such, deriving an accurate % crosslink value from these data might not provide a true representation of the crosslinking percentage within the PGN network. For a more precise determination of % crosslinks, methods such as solid-state NMR on purified peptidoglycan would be required. Our research provides insights into the characterization of PGN fragments and allows us to infer potential PGN cross-linkage types and the enzymes involved based on the dominant muropeptide fragments. Regarding the phrase "high abundance" in the context, it indicates that the M3b/M4b monomer and D34 dimer muropeptides represent a significant portion of the hydrolysis products. These muropeptides are major constituents within the PGN fragments obtained from the enzymatic hydrolysis.

      1. Line 375 - I am not sure PG is a meaningful diffusion barrier for drugs and signaling molecules, given that even larger proteins can apparently diffuse through the pores.

      Thank you for raising this point. Peptidoglycan indeed possesses relatively wide pores that allow for the diffusion of larger molecules, including proteins.5 Research has provided a rough estimate of the porosity of the PGN meshwork, suggesting that it allows for the diffusion of proteins with a maximum molecular mass of around 50 kDa.6 Considering this, we acknowledge that PGN may not serve as a significant diffusion barrier for drugs and signaling molecules. The porosity of the PGN scaffold, which is defined by the degree of cross-linking, plays a role in influencing the transport of molecules to the cell membrane. Thus, while PGN may not serve as a strict diffusion barrier, its structural characteristics still impact bacterial cell mechanics and interactions. We have revised the manuscript to reflect this understanding:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …The porosity of the PGN scaffold, defined by the degree of cross-linking, influences the transport of larger molecules such as proteins. Therefore, modifications to PGN structure are anticipated to significantly affect bacterial cell mechanics and interactions.”

      1. Line 400 - what does "slower hydrolysis rate" refer to? Is this chemical hydrolysis or enzymatic (autolysins?)? Also, I am not sure the hydrolysis rate of either modality allows for solid conclusions about how hard (line 402) the PG is.

      Thank you for your comments. The hydrolysis rate here refers to enzymatic hydrolysis, specifically mutanolysin cleaving the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage. Indeed, there is no direct correlation between the hydrolysis rate and the hardness of the PGN architecture, although structural rigidity is a key determinant in protein digestion.7 Considering that the enzymatic hydrolysis rate depends on the accessibility of the substrate to the enzyme, we proposed that a tighter PGN architecture could also lead to a slower hydrolysis rate. This speculation aligns with our observations of higher cell stiffness, or a more compact PGN structure, in B. breve and its slower hydrolysis rate. We understand this is indirect evidence, so the revised sentence now reads:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …Furthermore, B. breve also showed a slower enzymatic hydrolysis rate in purified PGNs, implying that the cell wall structure of B. breve is characterized by a compact PGN architecture.”

      1. Line 424 - I am not convinced this pipeline can detect PG architectures that other pipelines cannot; likely, the difference between previous analyses and theirs is due to different growth conditions (3,3 crosslink formation is often modulated by environmental factors/growth stage). In the next sentence, it sounds like mutanolysin treatment is a novelty in PG analysis (which it is not).

      We apologize if this could have been clearer and we have revised the paragraph to describe our study more accurately. We agree that different growth conditions could influence PGN architecture and other pipelines could manually identify the PGN architectures or automatically identify them if they are not too complex. Our original intention was to highlight the ability of the HAMA program to automatically identify unreported PGN structure. Here are the revised sentences:

      “(Discussion) …We speculate that this finding may be influenced by the comprehensive mass spectrometric approaches we employed or by variations in growth conditions. Moreover, we utilized the well-established enzymatic method involving mutanolysin to cleave the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage, which preserves the original peptide linkage in intact PGN subunits.”

      1. Lines 440-442: As outlined in more detail above: I don't think you can conclude something about the relationship between bridge length and envelope stiffness based on these data.

      Thank you for your valuable feedback. We agree that our data may not definitively support the direct conclusion about the relationship between bridge length and envelope stiffness in Bifidobacterium species. Instead, we have rephrased this section to accurately present the observed correlations without overgeneralizing:

      “(Discussion) … Notably, our study suggested a potential correlation between the cell stiffness and the compactness of bacterial cell walls in Bifidobacterium species (Figure 5). B. longum, which predominantly harbors tetrapeptide bridges (Ser-Ala-Thr-Ala), exhibits a trend towards lower stiffness, whereas B. breve, characterized by PGN cross-linked with monopeptide bridges (Gly), demonstrates a trend towards higher stiffness. These findings suggested that it may be correlated between the increased rigidity and the more compact PGN architecture built by shorter cross-linked bridges.”

      References:

      1. Huang, Y.-W.; Wang, Y.; Lin, Y.; Lin, C.; Lin, Y.-T.; Hsu, C.-C.; Yang, T.-C., Impacts of Penicillin Binding Protein 2 Inactivation on β-Lactamase Expression and Muropeptide Profile in Stenotrophomonas maltophilia. mSystems 2017, 2 (4), 00077-00017.

      2. Jarick, M.; Bertsche, U.; Stahl, M.; Schultz, D.; Methling, K.; Lalk, M.; Stigloher, C.; Steger, M.; Schlosser, A.; Ohlsen, K., The serine/threonine kinase Stk and the phosphatase Stp regulate cell wall synthesis in Staphylococcus aureus. Sci. Rep. 2018, 8 (1), 13693.

      3. Labischinski, H.; Goodell, E. W.; Goodell, A.; Hochberg, M. L., Direct proof of a "more-than-single-layered" peptidoglycan architecture of Escherichia coli W7: a neutron small-angle scattering study. J. Bacteriol. 1991, 173 (2), 751-756.

      4. Rohde, M., The Gram-Positive Bacterial Cell Wall. Microbiol. Spectr. 2019, 7 (3), gpp3-0044-2018.

      5. Vollmer, W.; Höltje, J. V., The architecture of the murein (peptidoglycan) in gram-negative bacteria: vertical scaffold or horizontal layer(s)? J. Bacteriol. 2004, 186 (18), 5978-5987.

      6. Vollmer, W.; Blanot, D.; De Pedro, M. A., Peptidoglycan structure and architecture. FEMS Microbiol. Rev. 2008, 32 (2), 149-167.

      7. Li, Q.; Zhao, D.; Liu, H.; Zhang, M.; Jiang, S.; Xu, X.; Zhou, G.; Li, C., "Rigid" structure is a key determinant for the low digestibility of myoglobin. Food Chem.: X 2020, 7, 100094.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to the Referee Comments

      We would like to express our appreciation to the editor and the reviewers for their thoughtful comments and constructive suggestions on the manuscript. We agree with most of the comments and have carefully revised the manuscript accordingly. The revisions are highlighted in red font in the revised manuscript. Below are point-by-point responses to the referees’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Microglia are increasingly recognized as playing an important role in shaping the synaptic circuit and regulating neural dynamics in response to changes in their surrounding environment and in brain states. While numerous studies have suggested that microglia contribute to sleep regulation and are modulated by sleep, there has been little direct evidence that the morphological dynamics of microglia are modulated by the sleep/wake cycle. In this work, Gu et al. applied a recently developed miniature two-photon microscope in conjunction with EEG and EMG recording to monitor microglia surveillance in freely-moving mice over extended periods of time. They found that microglia surveillance depends on the brain state in the sleep/wake cycle (wake, non-REM, or REM sleep). Furthermore, they subjected the mouse to acute sleep deprivation, and found that microglia gradually assume an active state in response. Finally, they showed that the state-dependent morphological changes depend on norepinephrine (NE), as chemically ablating noradrenergic inputs from locus coeruleus abolished such changes; this is in agreement with previous publications. The authors also showed that the effect of NE is partially mediated by β2-adrenergic receptors, as shown with β2-adrenergic receptor knock-out mice. Overall, this study is a technical tour de force, and its data add valuable direct evidence to the ongoing investigations of microglial morphological dynamics and its relationship with sleep. However, there are a number of details that need to be clarified, and some conclusions need to be corroborated by more control experiments or more rigorous statistical analysis. Specifically:

      1. The number of branch points per microglia shown here (e.g., Fig. 2g) is much lower than the values of branch points in the literature, e.g., Liu T et al., Neurobiol. Stress 15: 100342, 2021 (mouse dmPFC, IHC); Liu YU et al., Nat. Neurosci. 22: 1771-81, 2019 (mouse S1, in vivo 2P imaging). The authors need to discuss the possible source of such discrepancy.

      Thank you for raising this important point. Two reasons may account for this difference. First, the software packages define branch points differently. Liu YU et al. used Sholl analysis in ImageJ, which counts branch points as the number of crossings between branches and concentric circles of increasing radii. We reconstructed microglial morphology in Imaris, which defines branch points as bifurcation points, so the number of bifurcations is reported as the number of branch points. Second, this and previous studies found that more branch points are present under anesthesia. Liu T et al. reported the morphological characteristics of microglia in head-fixed mice under anesthesia, and their reconstructions are indeed more complex than ours. In short, we have paid close attention to this issue; the main reasons for the difference likely lie in the definition of branch points, the analysis methods, and the related choice of thresholds. True differences in brain state and the heterogeneity of microglia across brain regions may also contribute to the apparent discrepancy.
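
      To illustrate why the two definitions need not agree, the following minimal sketch counts branch points on a hypothetical toy arbor in both ways. The arbor coordinates, the function names, and the simplified crossing test (a straight segment is counted when its endpoints straddle the circle) are assumptions made for illustration; this is not the ImageJ or Imaris implementation.

```python
import math

# Hypothetical toy arbor: each segment is (parent_xy, child_xy), soma at the origin.
SEGMENTS = [
    ((0, 0), (0, 10)),    # primary process
    ((0, 10), (-5, 20)),  # first bifurcation at (0, 10)
    ((0, 10), (5, 20)),
    ((5, 20), (3, 30)),   # second bifurcation at (5, 20)
    ((5, 20), (9, 30)),
]


def count_bifurcations(segments):
    """Count nodes that give rise to two or more child segments."""
    children = {}
    for parent, child in segments:
        children.setdefault(parent, []).append(child)
    return sum(1 for kids in children.values() if len(kids) >= 2)


def sholl_crossings(segments, radii, soma=(0, 0)):
    """For each radius, count segments whose endpoints straddle that circle."""
    counts = []
    for r in radii:
        n = 0
        for parent, child in segments:
            d_parent = math.dist(soma, parent)
            d_child = math.dist(soma, child)
            if min(d_parent, d_child) < r <= max(d_parent, d_child):
                n += 1
        counts.append(n)
    return counts


bifurcations = count_bifurcations(SEGMENTS)
crossings = sholl_crossings(SEGMENTS, radii=[5, 15, 25])
print(bifurcations, crossings, sum(crossings))
```

      On this toy arbor the bifurcation count and the summed crossing count differ, and the crossing count also changes with the chosen radii, which is consistent with the point that the two measures are not directly comparable.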

      1. Microglia process end-point speed (Fig. 2h, o): here the authors show that the speed is highest in the wake state and lowest in NREM, which agrees with the measurement on microglia motility during wakefulness vs NREM in a recent publication (Hristovska I et al., Nat. Commun. 13: 6273, 2022). However, Hristovska et al. also reported lower microglia complexity in NREM vs wake state, which seems to be the opposite of the finding in this paper. The authors need to discuss the possible source of such differences.

      This is also an important point. Hristovska et al. reported the morphodynamic characteristics of microglia during wakefulness and NREM sleep. Notably, the sleep state of the mice in their experiments was unnatural because of head fixation and body restraint, and the duration of NREM sleep (sleep stability) differed considerably from that of the natural NREM sleep we analyzed. Hristovska et al. also discuss the limitations of this approach: “Even though sleep episodes were, as anticipated, shorter than those observed in freely moving animals, changes in neuronal activity characteristic of NREM sleep were monitored by EEG recordings, and changes in morphodynamics were observed during single episodes. Several episodes of REM sleep were detected, but they were too short and rare to be analyzed reliably.” The unnatural sleep state would increase microarousals and ultimately alter sleep structure, which may be the main reason why microglial behavior differed from that observed during natural sleep in our study. We have discussed this in the revised manuscript. Please see lines 292-298.

      1. Fig. 3: the authors used single-plane images to analyze the morphological changes over 3 or 6 hours of SD, which raises the concern that the processes imaged at the baseline may drift out of focus, leading to the dramatic reduction in process lengths, surveillance area, and number of branch points. In fact, a previous study (Bellesi M et al., J. Neurosci. 37(21): 5263-73, 2017) shows that after 8 h SD, the number of microglia process endpoints per cell and the summed process length per cell do not change significantly (although there is a trend to decline). The authors may confirm their findings by either 3D imaging in vivo, or 3D imaging in fixed tissue.

      Three lines of evidence indicate that the changes in microglia morphology in Fig. 3 are due to SD rather than variations in the focal plane. First, our single-plane images were quite stable over 3 or 6 hours of SD, though occasional reversible drifts might occur due to sudden motions. Second, per your suggestion, we performed further experiments and analysis with 3D imaging to monitor microglia dynamics during sleep deprivation. The new result is shown in revised Fig. S3 C-D: the length of microglia branches and the number of branch points were significantly reduced after SD, in agreement with the results of single-plane imaging. Third, we detected no significant difference in microglia branching characteristics during 6 h of sleep deprivation in β2AR KO mice (Fig. S4), which indirectly affirms that single-plane imaging is stable enough to detect true changes in branching during SD.

      1. Fig. 4b: the EEG and EMG signals look significantly different from the example given in Fig. 2a. In particular, the EMG signal appears completely flat except for the first segment of wake state; the EEG power spectrum for REM appears dark; and the wake state corresponds to stronger low frequency components (below ~ 4 Hz) compared to NREM, which is the opposite of Fig. 2a. This raises the concern whether the classification of sleep stage is correct here.

      Thank you for your insightful comments. We carefully examined the behavioral video corresponding to Figure 4b and found occasional microarousal events, indicated by slow head rotation, during NREM sleep, while the accompanying EMG signals were completely flat, which is atypical during the sleep-wake cycle. These microarousal events were not excluded from sleep, which made this data set unrepresentative and inconsistent with Fig. 2a. In our revised manuscript, we replaced it with more representative data in which the EMG and EEG clearly and consistently distinguish between different brain states in mice. Please see revised Fig. 2a, page 34; revised Fig. 4b, page 37.

      1. Fig. 4 NE dynamics. • How long is a single continuous imaging session for NE? • When monitoring microglia surveillance, the authors were able to identify wake or NREM states longer than 15 min, and REM states longer than 5 min. Here the authors selected wake/NREM states longer than 1 min and REM states longer than 30 s. What makes such a big difference in the time duration selected for analysis? • Also, the definition of F0 is a bit unclear. Is the same F0 used throughout the entire imaging session, or is it defined with a moving window?

      A single continuous session of NE imaging usually took about 1 hour. Subsequent analysis was performed on imaging data from each recording that included wake, NREM sleep, and REM sleep. Because microglia morphological dynamics (relatively slow) and NE signals (fast) operate on different time scales, we used different time windows in the previous version of the manuscript.

      Per your suggestion, we have now set the same time-window selection criteria for both the microglia morphological analysis and the NE dynamics analysis: wake and NREM sleep episodes longer than 1 minute, and REM sleep episodes longer than 30 seconds. We updated the Methods and all statistics in the related figures; please see lines 151-154, 481-485, 490-492; Fig. 2e-g and 2l-n, page 34. The definition of F0 is now explained in the Methods section. Please see lines 521-522.
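
      The selection criteria stated above can be sketched as a simple filter over scored episodes. This is only an illustration of the rule: the episode format `(state, start_s, end_s)`, the function name, and the example hypnogram are hypothetical, and only the two duration thresholds come from the response.

```python
# Minimum episode durations in seconds: wake and NREM > 1 min, REM > 30 s.
MIN_DURATION_S = {"wake": 60, "nrem": 60, "rem": 30}


def select_episodes(episodes, min_duration=MIN_DURATION_S):
    """Keep (state, start_s, end_s) episodes longer than the per-state minimum."""
    return [
        (state, start, end)
        for state, start, end in episodes
        if (end - start) > min_duration[state]
    ]


# Hypothetical scored hypnogram; times are seconds from recording start.
episodes = [
    ("wake", 0, 45),     # 45 s, too short for wake
    ("nrem", 45, 400),   # 355 s, kept
    ("rem", 400, 440),   # 40 s, kept
    ("wake", 440, 520),  # 80 s, kept
    ("rem", 520, 545),   # 25 s, too short for REM
]
selected = select_episodes(episodes)
print(selected)
```

      Applying one set of thresholds to both data streams means that the microglia and NE analyses draw on the same episodes.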

      1. Fig. 5b: how does the microglia morphology in LC axon ablation mice compare with wild type mice under the wake state? The text mentioned "more contracted" morphology but didn't give any quantification. Also, the morphology of microglia in the wake state (Fig. 5b) appears very different from that shown in Fig. S3C1 (baseline). What is the reason?

      The morphology of microglia is indeed heterogeneous and variable, affected by factors including brain state, brain region, and microenvironmental changes, along with animal-to-animal differences. We did not compare microglia morphology between the LC axon ablation mice and wild-type mice and, in view of this, we removed the description of “more contracted morphology” from the main text. It should also be noted that, because we primarily focused on changes in individual microglia across states over time by self-comparison, we minimized possible effects of heterogeneity in microglia morphology on our conclusions.

      1. The relationship between NE level and microglia dynamics. Fig. 4C shows that the extracellular NE level is the highest in the wake state and the lowest in REM. Previous studies (Liu YU et al., Nat. Neurosci. 22(11):1771-1781, 2019; Stowell RD et al., Nat. Neurosci. 22(11): 1782-1792, 2019) suggest that high NE tone corresponds to reduced microglia complexity and surveillance. Hence, it would be expected that microglia process length, branch point number, and area/volume are higher in REM than in NREM. However, Fig. 2l-n show the opposite. How should we understand this?

      Your point is well-taken. On the one hand, our data clearly showed that NE is critically involved in the brain state-dependent microglia dynamic surveillance, with evidence from the ablation of the LC-NE projection and from the β2AR knockout animal model.

      On the other hand, we also understand that NE is not the sole determinant, so the relationship between the NE level and the complexity and surveillance may not be unique.

      In this regard, other potential modulators also fluctuate during the sleep-wake cycle and may take part in the regulation of microglia dynamic surveillance. Previous studies (Liu YU et al., 2019; Stowell RD et al., 2019) have shown that microglia can be jointly affected by surrounding neuronal activity and NE level during wake. It has been reported that LC firing stops (Aston-Jones et al., 1981; Rasmussen et al., 1986), while inhibitory neurons, such as PV neurons and VIP neurons, become relatively active during REM sleep (Brécier et al., 2022). The ATP level in the basal forebrain has been shown to be higher in REM than in NREM (Peng et al., 2023). In addition, our own preliminary result (Author response image 1) also showed a higher adenosine level in REM than in NREM in the somatosensory cortex. Last but not least, we found that β2AR knockout failed to abolish microglial responses to sleep-state switches and SD stress altogether.

      In brief, microglia are highly sensitive to varied changes in the surrounding environment, and multiple modulators may participate in microglia dynamics across sleep states. This may underlie the difference in microglia complexity between REM and NREM. Future investigations are warranted to delineate the signal-integrative role of microglia in physiology and under stress. We have discussed the pertinent points in the revised manuscript. Please see lines 343-354.

      Author response image 1.

      Extracellular adenosine levels in somatosensory cortex in different brain states. AAV2/9-hSyn-GRABAdo1.0 (Peng W. et al., Science. 2020) was injected into the somatosensory cortex (A/P, -1 mm; M/L, +2 mm; D/V, -0.3 mm). Data from the same recording are connected by lines. n = 9 from 3 mice.

      Reviewer #2 (Public Review):

      The manuscript describes an approach to monitor microglial structural dynamics and correlate it to ongoing changes in brain state during sleep-wake cycles. The main novelty here is the use of miniaturized 2p microscopy, which allows tracking microglia surveillance over long periods of hours, while the mice are allowed to freely behave. Accordingly, this experimental setup would permit to explore long-lasting changes in microglia in a more naturalistic environment, which were previously not possible to identify otherwise. The findings could provide key advances to the research of microglia during natural sleep and wakefulness, as opposed to anesthesia. The main findings of the paper are that microglia increase their process motility and surveillance during REM and NREM sleep as compared to the awake state. The authors further show that sleep deprivation induces opposite changes in microglia dynamics- limiting their surveillance and size. The authors then demonstrate potential causal role for norepinephrine secretion from the locus coeruleus (LC) which is driven by beta 2 adrenergic receptors (b2AR) on microglia. However, there are several methodological and experimental concerns which should be addressed.

      The major comments are summarized below:

      1. The main technological advantage of the 2p miniaturized microscope is the ability to track single cells over sleep cycles. A main question that is unclear from the analysis and the way the data is presented is: are the structural changes in microglia reversible? Meaning, could the authors provide evidence that the same cell can dynamically change in sleep state and then return to similar size in wakefulness? The same question arises again with the data which is presented for anesthesia, is this change reversible?

      As revealed by long-term mTPM imaging in freely behaving mice, the brain-state-dependent morphological changes in microglia were reproducible and reversible. Author response image 2 shows that microglia displayed reversible dynamic changes during multiple rounds of sleep-wake transition. Author response image 3 shows that microglia dynamics induced by anesthesia were also reversible.

      Author response image 2.

      Long-term tracking of microglia process area in different brain states. The analysis used 8 cells. A total of 31 time points were selected from the in vivo imaging data and used to characterize the morphological changes of microglia over a continuous 7-hour period.

      Author response image 3.

      Reversible changes of microglial process length, area, number of branch points under anesthesia. Wake group: 30 minute-accommodation to new environment; Isoflurane group: 1.5% in air applied at a flow rate of 0.4 L/min for 30 minutes; Recovery group: 30 minutes after recovery from anesthesia. n = 9 cells from 3 mice for each group.

      1. The binary comparison between brain states is misleading; shouldn't the changes in structural dynamics be compared to the baseline at state onset? The authors' method describes analysis of the last 5 minutes in each sleep/wake state. However, these transitions are directional; for instance, REM usually follows NREM, so the description of a decrease in length during REM sleep could be inaccurate.

      Because microglia morphological dynamics are relatively slow, we analyzed the last part of each state (30 s in the revised manuscript) rather than the state onset, allowing time for the microglia response to stabilize after an inter-state transition.

      Further, we compared microglia dynamics between two NREM groups that transitioned to different subsequent states: group 1 (NREM to REM) vs group 2 (NREM to Wake). This precaution was taken to exclude a directional effect of state transitions. Our results showed no difference in microglial length, area, or number of branch points between the two NREM groups (Author response image 4), indicating that the last 30 s of each NREM episode was not affected by the following state and that a binary comparison is reasonable.

      Author response image 4.

      Microglial morphological length, area change, and number of branch points of the last 30s of NREM sleep followed by REM or Wake. n = 9 cells from 3 mice for each group.
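
      The analysis window described in this response (the last 30 s of each episode) can be expressed as a small helper. This is a minimal sketch under stated assumptions: the function names and the `(time_s, value)` sample format are hypothetical, and only the 30 s window length comes from the response.

```python
def tail_window(start_s, end_s, window_s=30.0):
    """Return (window_start, window_end) covering the last window_s seconds of an episode."""
    if end_s <= start_s:
        raise ValueError("episode must have positive duration")
    # Clamp to the episode start if the episode is shorter than the window.
    return (max(start_s, end_s - window_s), end_s)


def mean_in_window(samples, window):
    """Average the values of (time_s, value) samples falling inside the window."""
    lo, hi = window
    values = [value for t, value in samples if lo <= t <= hi]
    return sum(values) / len(values) if values else None


# Hypothetical process-length samples (time in s, length in µm) in one NREM episode.
samples = [(360, 100.0), (375, 90.0), (395, 80.0)]
window = tail_window(45, 400)
print(window, mean_in_window(samples, window))
```

      Measuring at the end of an episode rather than at its onset gives slow morphological changes time to settle after the preceding transition.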

      1. Sleep deprivation- again, it is unclear whether these structural changes are reversible. This point is straightforward to address using this methodology by measuring sleep following SD. In addition, the authors chose a method to induce sleep deprivation that is rather harsh. It is unclear if the effect shown is the result of stress or perhaps an excess of motor activity.

      We adopted the method of forced exercise as it has been commonly used for sleep deprivation (Pandi-Perumal et al., 2007; Nollet M et al., 2020), though it does have the potential limitation of excess of motor activity.

      In light of your comments and suggestion, we present new data demonstrating that the sleep duration of the mice, mostly NREM sleep, showed a compensatory increase (ZT9-10) after the 6-hour sleep deprivation (ZT2-8) (revised Fig. S3B). This result shows that sleep deprivation indeed increased sleep pressure in the mice. As the sleep pressure eased during recovery sleep, the morphological changes of microglia were reversed over a timescale of several hours (revised Fig. S3 E-J).

      1. The authors perform measurements of norepinephrine with a recently developed GRAB sensor. These experiments are performed to causally link microglia surveillance during sleep to norepinephrine secretion. They perform 2p imaging and collect data points which are single neurons, and it is unclear why the normalization and analysis is performed for bulk fluorescence similar to data obtained with photometry.

      We did not perform single-neuron analysis for two reasons. First, our experimental conditions, e.g., the expression of the NE indicator and the control of imaging laser intensity, did not yield sufficient signal-to-noise to clearly discriminate individual neurons with two-photon imaging. Second, NE signal may play a modulatory role, and fluorescence changes appeared to be global, rather than local or cell-specific. Therefore, we analyzed fluorescence changes in different brain states over the whole field-of-view in Fig. 4, rather than at the subregional or single-cell level.

      1. The experiments involving b2AR KO mice are difficult to interpret and do not provide substantial mechanistic insight. Since b2AR are expressed throughout numerous cell types in the brain and in the periphery, it is entirely not clear whether the effects on microglia dynamics are direct. The conclusion and the statement regarding the expression of b2AR in microglia is not supported by the references the authors present, which simply demonstrate the existence and function of b2AR in microglia. In addition, these mice show significant changes in sleep pattern and increased REM sleep. This could account for reasons for the changes in microglia structure rather than the interpretation that these are direct effects.

      To summarize, the main conclusions of the paper require further support with analysis of existing data and experimental validation.

      Previous studies have revealed that norepinephrine (NE) modulates microglial dynamics through the β2AR pathway (Stowell RD et al., 2019; Liu YU et al., 2019). Stowell et al. and Liu et al. used in vivo two-photon imaging to demonstrate that microglia dynamics differ between awake and anesthetized mice and to highlight the roles of NE and β2AR in these states (Gyoneva S et al., 2013; Stowell RD et al., 2019; Liu YU et al., 2019). To evaluate the direct effect of β2AR on microglial dynamics, Stowell et al. administered the β2AR agonist clenbuterol to anesthetized mice and found that this decreased the motility, arbor complexity, and process coverage of microglia in the parenchyma (Stowell RD et al., 2019). Inhibition of β2AR by the antagonist ICI-118,551 in awake mice recapitulated the effects of anesthesia by enhancing microglial arborization and surveillance (Stowell RD et al., 2019). In addition, it has been shown that microglia express higher numbers of β2ARs than any other cells in the brain (Zhang et al., 2014).

      In this context, our current work provides new evidence supporting the involvement of the LC-NE-β2AR axis in modulating microglial dynamics both during the natural sleep-wake cycle and under SD stress. We are aware that the pan-tissue β2AR knockout model precludes us from pinpointing the role of microglial β2AR; nevertheless, based on the present and previous data, it is safe to state that β2-adrenergic receptor signaling plays a significant role in the sleep-state-dependent dynamic surveillance of microglia.

      We have discussed this in the revised manuscript. Please see lines 324-354. As you suggested, we added references to support the statement regarding the expression of β2AR in microglia (please see line 333).

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Some technical details need to be clarified. Also, please double-check for typos.

      1. In vivo imaging preparation: how long is the recovery time between window/EEG implantation surgery and imaging/recording?

      Imaging data were collected one month after the surgery. We have added descriptions to the methods section of the revised manuscript. Please see line 419.

      1. Statistical analysis: the authors used t-test or ANOVA without first checking whether the data pass the normality test. If the data does not follow a normal distribution, nonparametric tests would be more appropriate.

      Per your suggestion, we tested statistical significance using a parametric test (ANOVA) when the data passed the normality test, or a non-parametric test (Friedman) when the data were not normally distributed. Please see lines 533-535.

      1. Fig. 1b needs a minor change. In the figure, the EMG electrodes appear to be connected to the brain as well.

      We have corrected this oversight. Thank you.

      1. Fig. 1c: it would be helpful to give examples of raw EEG and EMG traces for REM and NREM separately.

      Raw traces are now shown as suggested. Please see Fig. 1c, page 32.

      1. Fig. 1h: is each data point one microglia or one end-point?

      In Fig. 1h, each data point represents the average speed of all branches of one microglia, not one end-point.

      1. Sleep deprivation starts at 9 am. What time corresponds to Zeitgeber Time 0 (ZT0, the beginning of the light phase)?

      We have now clarified that 9 am corresponds to Zeitgeber time 2. Please see line 196.
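
      As a worked conversion, if 9 am corresponds to ZT2, then ZT0 (lights on) falls at 7 am. The sketch below encodes that arithmetic; the 7 am lights-on hour is inferred from the statement above rather than quoted from the Methods, and the function names are hypothetical.

```python
# Lights-on hour inferred from "9 am corresponds to ZT2" (an assumption, not quoted).
LIGHTS_ON_HOUR = 7.0


def clock_to_zt(clock_hour, lights_on_hour=LIGHTS_ON_HOUR):
    """Convert a 24-h clock hour to Zeitgeber time (hours since lights on)."""
    return (clock_hour - lights_on_hour) % 24


def zt_to_clock(zt_hour, lights_on_hour=LIGHTS_ON_HOUR):
    """Convert Zeitgeber time back to a 24-h clock hour."""
    return (zt_hour + lights_on_hour) % 24


print(clock_to_zt(9))   # start of sleep deprivation
print(zt_to_clock(8))   # end of a 6-hour deprivation that starts at ZT2
```

      Under this assumption, a 6-hour deprivation spanning ZT2-8 runs from 9 am to 3 pm.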

      1. Line 61: the authors referred to Ramon y Cajal's original suggestion that microglia dynamics are coupled to the sleep-wake cycle. However, the cited paper only indicates that Cajal suggested a role of astrocytes in the sleep-wake cycle, not microglia. In addition, there is a typo in the line: there should be a space between "Ramon" and "y" in Cajal's name.

      We have updated the statement and the cited reference to point out the involvement of microglia in the sleep-wake cycle. The typo was corrected. Please see lines 64-65.

      1. Fig. S3B: As each group has only 3 mice, it is unclear how t-test can yield p < 0.01 or even 0.001.

      We checked the original data again and confirmed that they were correct. These small p-values may be due to the small intra-group variability in the control group.
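
      A short numerical sketch shows how three animals per group can still give p < 0.001 when within-group variability is small relative to the difference between group means. The data values are invented for illustration only and are not the values in Fig. S3B; a pooled-variance Student t-test is assumed, and the critical values are the standard two-sided t-table entries for 4 degrees of freedom.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Two-sided critical values of t for df = 4 (standard t-table values).
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604, 0.001: 8.610}

# Hypothetical measurements, n = 3 per group, with small within-group spread.
control = [10.0, 10.5, 9.5]
treated = [20.0, 21.0, 19.0]

t, df = student_t(control, treated)
print(round(abs(t), 2), df, abs(t) > T_CRIT_DF4[0.001])
```

      Here |t| exceeds the 0.001 critical value, so p < 0.001 even though each group contains only three values.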

      1. Line 251-253, "Figure 4h-n" should be "Figure 5h-n"?

      We have revised it. Please see lines 265-266.

      1. Fig. 5h: the receptor should be "adrenergic receptor", not "adrenal receptor".

      We changed the term to “adrenergic receptor”. Please see Fig 5h.

      1. Fig. 5g, n: the number of data points is apparently less than the sample size given in the figure legend. Perhaps some data points have exactly the same value so they overlap? The authors may consider plotting identical values with a slight shift so that the number of data points shown matches the actual sample size, to avoid confusion.

      Yes, we have added small jitters so different data points can be seen to avoid confusion. Please see Fig. 5n.
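
      The jitter approach can be sketched as follows: each point keeps its measured value, and only its horizontal position is shifted slightly so that identical values no longer overlap. The function name, jitter width, and example values are assumptions for illustration, not the plotting code used for Fig. 5.

```python
import random


def jitter_x(values, x_center, width=0.1, seed=0):
    """Return (x, y) pairs with a small random horizontal offset for each point."""
    rng = random.Random(seed)  # fixed seed keeps the figure reproducible
    return [(x_center + rng.uniform(-width, width), y) for y in values]


# Hypothetical group containing three identical values.
values = [3, 3, 3, 5]
points = jitter_x(values, x_center=1.0)
for x, y in points:
    print(f"{x:.3f}\t{y}")
```

      Because only the x-coordinate is perturbed, the plotted values and any statistics computed from them are unchanged.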

      1. There are some typos (e.g., Line 217, "he" should be "the") and some incomplete references (e.g., [13], [22], [34], [35] lack volume and page number, [15] and [39] lack publisher information). Some references have inconsistent formats (e.g., "Journal of Neuroscience" is sometimes abbreviated and sometimes not). Please correct these.

      We have corrected these oversights. Please see references, page 27.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      1. Re-analyze the data in a manner that allows to follow and compare the same cells over different state transitions. This is necessary to evaluate the reversibility of microglia structure. In addition, consider analysis of the change from the beginning to the end of each state.

      As shown in Author response image 2, microglia dynamics were reversible during multiple rounds of sleep-wake transition.

      1. It would be nice to see the raw data obtained over time, at least for Figure 1, before offline correction of movement to evaluate the imaging quality and level of drift during imaging.

      We agree with this suggestion. Please see the supporting material video.

      1. It would be helpful to add an analysis of the percent time spent in each state for the 10 hour recordings.

      We have adopted this suggestion. Please see revised Fig. S4C.

      1. In Figure 2 the results are from 15 cells from several animals. How much do the results vary between mice? It will be helpful to show if this varies between different mice by labeling cells from each mouse differently.

      In Author response image 5, we have labeled the data points by animal for the seven mice. Data from different animals were intermixed at each brain state, with no clear animal-to-animal difference.

      Author response image 5.

      Quantitative analysis of microglial length based on multi-plane microglial imaging. n = 17 cells from 7 mice for each group. In right panel, each color codes data from the same animal.

      1. SD- please add some quantification for sleep and EEG to show that the manipulation really caused sleep deprivation. To address the confound of forced movement and stress, it might be helpful to add quantification of movement compared to an undisturbed wakefulness.

      We have added related data (revised Fig. S3B), as suggested. Please see line 196-197.

      1. The DSP4 application should also be performed with NE measurements to verify the specificity of the measured NE signal as well as of the DSP4 toxin.

      Following your suggestion, we have added DSP4 data in revised Fig. S4B.

      1. Some suggested refined experiments for the b2AR KO are: a-A conditional b2AR KO in microglia, as cited in the work. b- Local application of a b2 blocker during SD. c- Imaging of NE dynamics in the b2 animals. If NE dynamics during natural sleep cycle are perturbed, then this suggests upstream mechanisms rather than direct microglia effects as suggested by the authors.

      We agree that the current study cannot pinpoint a direct effect of microglial β2AR. We have discussed this limitation in the revised manuscript. Please see lines 324-354.

      Minor:

      1. Typo on page 4 (microcopy instead of microscopy).

      It was corrected. Please see line 87.

      1. Typo page 11- 'and he largest changes in NE' - supposed to be 'the'.

      We have corrected these mistakes. Please see line 228.

      1. Fig. 4- there are several units missing in the figure in panel b: the top is Hz, but what does the color bar indicate exactly? 2 what? both for theta/delta and for NE.

      We have modified this figure and legend for clarity. Please see Fig. 4, page 37.

      2. Bottom of page 12- referring to figure 4 but talking about figure 5.

      The typo was corrected. Please see lines 265-266.

      References

      1. Aston-Jones G, Bloom FE. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci. 1, 876–886 (1981).

      2. Bellesi M, de Vivo L, Chini M, Gilli F, Tononi G, Cirelli C. Sleep loss promotes astrocytic phagocytosis and microglial activation in mouse cerebral cortex. J Neurosci. 37, 5263–5273 (2017).

      3. Brécier A, Borel M, Urbain N, Gentet LJ. Vigilance and behavioral state-dependent modulation of cortical neuronal activity throughout the sleep/wake cycle. J Neurosci. 42, 4852–66 (2022).

      4. Dworak M, McCarley RW, Kim T, Kalinchuk AV, Basheer R. Sleep and brain energy levels: ATP changes during sleep. J Neurosci. 30, 9007-16 (2010).

      5. Gyoneva S, Traynelis SF. Norepinephrine modulates the motility of resting and activated microglia via different adrenergic receptors. J Biol Chem. 288, 15291–15302 (2013).

      6. Kjaerby C, Andersen M, Hauglund N, Untiet V, Dall C, Sigurdsson B, Ding F, Feng J, Li Y, Weikop P, Hirase H, Nedergaard M. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci. 25, 1059–1070 (2022).

      7. Liu T, Lu J, Lukasiewicz K, Pan B, Zuo Y. Stress induces microglia-associated synaptic circuit alterations in the dorsomedial prefrontal cortex. Neurobiology of Stress. 15, 100342 (2021).

      8. Liu YU, Ying Y, Li Y, Eyo UB, Chen T, Zheng J, Umpierre AD, Zhu J, Bosco DB, Dong H, Wu LJ. Neuronal network activity controls microglial process surveillance in awake mice via norepinephrine signaling. Nat Neurosci. 22, 1771–1781 (2019).

      9. Nollet M, Wisden W, Franks NP. Sleep deprivation and stress: a reciprocal relationship. Interface Focus. 10, 20190092 (2020).

      10. Pandi-Perumal SR, Cardinali DP, Chrousos GP. 2007. Neuroimmunology of sleep. New York, NY: Springer.

      11. Peng W, Liu X, Ma G, Wu Z, Wang Z, Fei X, Qin M, Wang L, Li Y, Zhang S, Xu M. Adenosine-independent regulation of the sleep-wake cycle by astrocyte activity. Cell Discov. 9, 16 (2023).

      12. Peng W, Wu Z, Song K, Zhang S, Li Y, Xu M. Regulation of sleep homeostasis mediator adenosine by basal forebrain glutamatergic neurons. Science. 369, 6508 (2020).

      13. Rasmussen K, Morilak DA, Jacobs BL. Single unit activity of locus coeruleus neurons in the freely moving cat: I. During naturalistic behaviors and in response to simple and complex stimuli. Brain Research. 371, 324–334 (1986).

      14. Stowell RD, Sipe GO, Dawes RP, Batchelor HN, Lordy KA, Whitelaw BS, Stoessel MB, Bidlack JM, Brown E, Sur M, Majewska AK. Noradrenergic signaling in the wakeful state inhibits microglial surveillance and synaptic plasticity in the mouse visual cortex. Nat Neurosci. 22, 1782-1792 (2019).

      15. Umpierre AD, Bystrom LL, Ying Y, Liu YU, Worrell G, Wu LJ. Microglial calcium signaling is attuned to neuronal activity in awake mice. Elife. 27, e56502 (2020).

      16. Wang Z, Fei X, Liu X, Wang Y, Hu Y, Peng W, Wang YW, Zhang S, Xu M. REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Commun. 13, 6896 (2022).

      17. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 34, 11929–11947 (2014).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The Hedgehog (HH) protein family is important for embryonic development and adult tissue maintenance. Deregulation or even temporal imbalances in the activity of one of the main players in the HH field, sonic hedgehog (SHH), can lead to a variety of human diseases, ranging from congenital brain disorders to diverse forms of cancers. SHH activates the GLI family of transcription factors, yet the mechanisms underlying GLI activation remain poorly understood. Modification and activation of one of the main SHH signalling mediators, GLI2, depends on its localization to the tip of the primary cilium. In a previous study the lab had provided evidence that SHH activates GLI2 by stimulating its phosphorylation on conserved sites through Unc-51-like kinase 3 (ULK3) and another ULK family member, STK36 (Han et al., 2019). Recently, another ULK family member, ULK4, was identified as a modulator of the SHH pathway (Mecklenburg et al. 2021). However, the underlying mechanisms by which ULK4 enhances SHH signalling remained unknown. To address this question, the authors employed complex biochemistry-based approaches and localization studies in cell culture to examine the mode of ULK4 activity in the primary cilium in response to SHH. The study by Zhou et al. demonstrates that ULK4, in conjunction with STK36, promotes GLI2 phosphorylation and thereby SHH pathway activation. Further experiments were conducted to investigate how ULK4 interacts with SHH pathway components in the primary cilium. The authors show that ULK4 interacts with a complex formed between STK36 and GLI2 and hypothesize that ULK4 functions as a scaffold to facilitate STK36 and GLI2 interaction and thereby GLI2 phosphorylation by STK36. Furthermore, the authors provide evidence that ULK4 and STK36 co-localize with GLI2 at the ciliary tip of NIH 3T3 cells, and that ULK4 and STK36 depend on each other for their ciliary tip accumulation. 

      Overall, the described ULK4-mediated mechanism of SHH pathway modulation is based on detailed and rigorous Co-IP experiments and kinase assays as well as confocal imaging localization studies. The authors used various mutated and wild-type constructs of STK36 and ULK4 to decipher the mechanisms underlying GLI2 phosphorylation at the tip of the primary cilium. These novel results on SHH pathway activation add valuable insight into the complexity of SHH pathway regulation. The data also provide possible new strategies for interfering with SHH signalling which has implications in drug development (e.g., cancer drugs).

      However, it will be necessary to explore additional model systems, besides NIH3T3, HEK293 and MEF cell cultures, to conclude on the universality of the mechanisms described in this study. Ultimately, it needs to be addressed whether ULK4 modulates SHH pathway activity in vivo. Is there evidence that genetic ablation of ULK4 in animal models leads to less efficient SHH pathway induction? It also remains to be resolved how ULK3 and ULK4 act in distinct or common manners to promote SHH signalling. Another remaining question is, whether cell type- and tissue-specific features exist, that play a role in ULK3- versus ULK4-dependent SHH pathway modulation. In particular for the studies on ciliary tip localization of factors, relevant for SHH pathway transduction, a higher temporal resolution will be needed in the future as well as a deeper insight into tissue/ cell type-specific mechanisms. These caveats, mentioned here, don't have to be addressed in new experiments for the revision of this manuscript but could be discussed.

      We agree with the reviewer that it will be important to investigate the in vivo function of Ulk4 in Shh signaling, the relationship between Ulk3 and Ulk4/Stk36, and the possible cell type/tissue specificity of these two kinase systems. This will require generating single and double knockout mice and examining Hh-related phenotypes in different tissues and at different developmental stages. The precise mechanism by which Ulk4 and Stk36 are translocated to the ciliary tip is also an important and unresolved issue. We have included several paragraphs in the “discussion” section to address these outstanding questions for future study.

      Reviewer #2 (Public Review):

      The authors provide solid molecular and cellular evidence that ULK4 and STK36 not only interact, but that STK36 is targeted (transported?) to the cilium by ULK4. Their data helps generate a model for ULK4 acting as a scaffold for both STK36 and its substrate, Gli2, which appear to co-localise through mutual binding to ULK4. This makes sense, given the proposed role of most pseudokinases as non-catalytic signaling hubs. There is also an important mechanistic analysis performed, in which ULK4 phosphorylation in an acidic consensus by STK36 is demonstrated using IP'd STK36 or an inactive 'AA' mutant, which suggests this phosphorylation is direct.

      The major strength of the study is the well-executed combination of logical approaches taken, including expression of various deletion and mutation constructs and the careful (but not always quantified in immunoblot) effects of depleting and adding back various components in the context of both STK36 and ULK3, which broadens the potential impact of the work. The biochemical analysis of ULK4 phosphorylation appears to be solid, and the mutational study at a particular pair of phosphorylation sites upstream of an acidic residue (notably T2023) is further strong evidence of a functional interaction between ULK4/STK36. The possibility that ULK4 requires ATP binding for these mechanisms is not approached, though would provide significant insight: for example it would be useful to ask if Lys39 in ULK4 is involved in any of these processes, because this residue is likely important for shaping the ULK4 substrate-binding site as a consequence of ATP binding; this was originally shown in PMID 24107129 and discussed more recently in PMID: 33147475 in the context of the large amount of ULK4 proteomics data released.

      The reviewer raised an interesting question of whether ATP binding to the pseudokinase domain of Ulk4 might be required for its function, i.e., by regulating the interaction with its binding partners. In a recent study (Preuss et al. 2020; PMID: 33147475), the critical Lys39 for ATP binding was converted to Arg (KR mutation); however, unlike in most kinases, where the KR mutation affects ATP binding, the K39R mutation in the Ulk4 pseudokinase did not affect ATP binding, although it slightly increased ADP binding (PMID: 33147475). Another mutation made by Preuss et al. (PMID: 33147475), N239L, affected protein stability, making it impossible to determine whether this mutation affects ATP binding. Therefore, in the absence of a clear approach to perturb ATP binding without affecting the overall structure of Ulk4, it would be challenging to address whether ATP binding regulates the ability of Ulk4 to bind its substrates. Nevertheless, we discuss the possibility that ATP binding might regulate Ulk4/Stk36 interaction and Shh signaling.

      The discussion is excellent, and raises numerous important directions for future work in terms of potential transportation mechanisms of this complex. It also explains why the ULK4 pseudokinase domain is linked to an extended C-terminal region. Does AF2 predict any structural motifs in this region that might support binding to Gli2?

      The extended C-terminal domain of Ulk4 contains Arm/HEAT repeats (protein-protein interaction domains), which are predicted by AF2 to form alpha helices.

      A weakness in the study, which is most evident in Figure 1, where Ulk4 siRNA is performed in the NIH3T3 model (and effects on Shh targets and Gli2 phosphorylation assessed), is that we do not know if ULK4 protein is originally present in these cells in order to actually be depleted. Also, we are not informed if the ULK4 siRNA has an effect on the 'rescue' by HA-ULK4; perhaps the HA-ULK4 plasmid is RNAi resistant, or if not, this explains why phosphorylation of Gli2 never reaches zero? Given the important findings of this study, it would be useful for the authors to comment on this, and perhaps discuss if they have tried to evaluate endogenous levels of ULK4 (and Stk36) in these cells using antibody-based approaches, ideally in the presence and absence of Shh. The authors note early on the large number of binding partners identified for ULK4, and siRNA may unwittingly deplete some other proteins that could also be involved in ULK4 transport/stability in their cellular model.

      Due to the lack of reliable Ulk4 and Stk36 antibodies, we were unable to confirm knockdown efficiency by western blot analysis. Therefore, we relied on measuring Ulk4 and Stk36 mRNA expression by RT-qPCR to estimate the knockdown efficiency (Fig 1- figure supplement 1). We used mouse Ulk4 shRNA to carry out the knockdown experiments in NIH3T3 and MEF cells, while the human version of Ulk4 (hUlk4) was used for the rescue experiments (Fig 1- figure supplement 2; Fig. 8). We have confirmed that the mUlk4 shRNA targeting sequence is not conserved in hUlk4; therefore, the hUlk4 construct is RNAi resistant. The rescue experiments strongly argue that the effect of Ulk4 RNAi on Shh signaling is due to loss of endogenous Ulk4. This argument is further strengthened by the observations that mutations that affected Ulk4 and Stk36 ciliary tip localization also affected Shh signaling readouts such as Gli2 phosphorylation and Ptch1/Gli expression (Fig. 8).

      The sequence of ULK4 siRNAs is not included in the materials and methods as far as I can see.

      We have added the mouse Ulk4 RNAi target sequence in the revised version.

      Reviewer #3 (Public Review):

      In this manuscript, Zhou et al. demonstrate that the pseudokinase ULK4 has an important role in Hedgehog signaling by scaffolding the active kinase Stk36 and the transcription factor Gli2, enabling Gli2 to be phosphorylated and activated.

      Through nice biochemistry experiments, they show convincingly that the N-terminal pseudokinase domain of ULK4 binds Stk36 and the C-terminal Heat repeats bind Gli2.

      Lastly, they show that upon Sonic Hedgehog signaling, ULK4 localizes to the cilia and is needed to localize Stk36 and Gli2 for proper activation.

      This manuscript is very solid and methodically shows the role of ULK4 and STK36 throughout the whole paper, with well controlled experiments. The phosphomimetic and incapable mutations are very convincing as well. I think this manuscript is strong and stands as is, and there is no need for additional experiments.

      Overall, the strengths are the rigor of the methods, and the convincing case they bring for the formation of the ULK4-Gli2-Stk36 complex. There are no weaknesses noted. I think a little additional context for what is being observed in the immunofluorescence might benefit readers who are not familiar with these cell types and structures.

      We thank this reviewer for the positive comments.

      Recommendations For the Authors

      Reviewer #1 (Recommendations For The Authors):

      This elegant study has been thoroughly and thoughtfully designed and the dataset is solid. The biochemistry results are overall very convincing. Some data lack quantification and there needs to be more information on data analyses and statistics. The following suggestions and comments aim at strengthening the manuscript.

      1. Please provide quantification normalized to input for IP experiments (Figures 1 E - F; Figure 8 C). More information on data analyses and statistics should be provided and included as information in the figure legends.

      Thanks for the suggestions. We have done the quantification and statistical analyses for Figures 1E-G and Figure 8C as requested.

      1. Did the authors investigate whether overexpressing hULK4 in the control NIH3T3 cells leads to an increase in pS230/232 (related to Figure 1E)? This would nicely support the notion of a promoting effect of ULK4 on GLI2 phosphorylation.

      We did not. We speculated that overexpressing hULK4 may not significantly promote GLI2 phosphorylation because Ulk4 is a pseudokinase and endogenous Stk36 (the kinase partner of Ulk4) is limited.

      1. The CO-IP experiments to show GLI2 activation were performed in NIH3T3 cells, whereas HEK293 cells were used for the experiments shown in Figure 2. Is there a specific reason for switching between cell lines also for experiments shown in Figures 3 C- I? Did the authors repeat some of the key experiments in both cell lines?

      In mammalian cells, Shh-induced activation of GLI2 depends on primary cilia (Han et al., 2019). NIH3T3 cells form primary cilia but HEK293T cells do not. Therefore, we used NIH3T3 cells to examine the processes that are regulated by Shh treatment (e.g., the Shh-induced phosphorylation of GLI2 and STK36). The HEK293 cells were used to map the binding domains between ULK4 and STK36/GLI2/SUFU due to their high transfection efficiency.

      1. In Figure 2 D-E the authors nicely showed that hUlk4N-HA interacted with CFP-Stk36 but not with Myc-Gli2/Fg-Sufu whereas hUlk4C-HA formed a complex with Myc-Gli2/Fg-Sufu but not with CFP-Stk36. In Figure 4E the authors showed in their Co-IP experiments that Fg-Stk36 and Myc-Gli2 form a complex independent of SHH treatment. Did the authors see some pull down of Stk36, still in complex with Gli2, using hUlk4C IP and pull down of Gli2, still in complex with Stk36, using hUlk4N IP?

      We did not test that. As we have shown in Figures 4A and 4E, knockdown of endogenous ULK4 nearly abolished the interaction between Myc-Gli2 and Fg-Stk36, suggesting that Ulk4 is the major scaffold that brings Stk36 and Gli2 together, and that there is little if any direct interaction between Gli2 and Stk36.

      1. Another method to verify hULK4-Stk36-Gli2 complex formation (Figure 4) would be helpful. For example, proximity ligation assays, tripartite split GFP assays, or colocalization based on expansion STED immunofluorescence microscopy could be performed to temporally and spatially resolve localization of Ulk4, Stk36 and Gli2 upon SHH stimulation in the primary cilium

      Thanks for the suggestions. We think that our current study, using biochemical and cell biology approaches, has provided sufficient evidence that Ulk4, Stk36 and Gli2 form complexes. We will keep those more sophisticated methods in mind for our future endeavors.

      1. Please provide more representative images of Ulk4, Stk36 and Gli2 localization in NIH3T3 cells or lower magnification overview images showing more than one cell (Figure 5).

      We have provided more representative images in Figure 5- figure supplement 1A-F of the revised manuscript.

      1. Confirmation of the results shown in Figure 5 in a second cell line would strengthen the data.

      We have confirmed the results in MEFs (see Figure 5- figure supplement 1G-J)

      1. Did the authors add immunofluorescence for tubulin as a ciliary base marker to ensure correct assignment of ciliary tip versus ciliary base localization for quantification experiments (Figures 5 - 8)?

      It has been well documented that Gli2 accumulates at the ciliary tip in response to Shh treatment; therefore, we used Gli2 as a marker for the ciliary tip, where both Ulk4 and Stk36 also accumulated. γ tubulin staining could be another marker to distinguish the ciliary tip from the base; however, the antibody combination we have did not allow us to simultaneously stain γ tubulin and acetylated tubulin (Ac-Tub).

      1. SMO localization as a further readout of SHH pathway activation might be considered to be added for some of the key results (e.g., Figure 6). Is SMO trafficking affected after depletion or overexpression of ULK4?

      Due to the lack of a workable antibody to detect endogenous Smo in our hands, we did not determine whether the trafficking of SMO is affected after depletion or overexpression of ULK4. However, we noticed that a recent study reported that SHH-induced ciliary SMO accumulation was impaired in Ulk4 siRNA-treated cells (Mecklenburg et al. 2021). We include this information and its implications in the discussion section.

      1. Do the authors see ULK4 only at the ciliary tip after SHH stimulation or is there also a dynamic time-dependent localization along the ciliary shaft? The image in Figure 6E (dKO + Stk36 WT) seems to show ULK4 also in the shaft.

      Unlike Smo, which is evenly distributed along the axoneme of primary cilia, ULK4 mainly accumulates at ciliary tips upon Shh stimulation. Ulk4 is also located at low levels outside the cilia and sometimes in the ciliary shaft during its transit to the ciliary tip (e.g., see Figure 5- figure supplement 1F1-2; J1-2).

      1. Is the immunofluorescence signal for Ulk4 significantly reduced after shRNA treatment to deplete Ulk4 (Figure 6A)?

      We constructed a cell line that stably expressed ULK4 shRNA. The knockdown efficiency was determined by measuring Ulk4 mRNA expression (Fig 1_figure supplement 1). Because we were unable to obtain a reliable ULK4 antibody for immunostaining, we did not examine whether the ULK4 signal was depleted by Ulk4 shRNA.

      1. The labelled ciliary tip resembles in some cases images seen for ciliary abscission. The authors could use membrane/ciliary membrane markers to ensure "intraciliary" localization of the investigated factors.

      Thanks for the suggestion. We will try that in our future experiments.

      1. How many replicates were used in the three independent quantitative RT-PCR experiments (Figure 1 A-D)?

      We used 3 replicates in each independent quantitative RT-PCR assay.

      1. Please provide p values or statement on no significance for the comparison between Ulk3 single and Ulk3/Ulk4 double knockdown (Figure 1C) and between Stk36 single and Stk36/Ulk4 double knockdown (Figure 1D; Fig1_Figure Supplement 2).

      Thanks for the suggestion, we have added the p value or “ns” as asked.

      1. Figure legends in general are a bit short and could have some more detailed information.

      Thank you for the suggestion, we have revised the Figure legends as asked.

      1. What do the asterisks represent in Figure 4 C-D?

      Thanks for the suggestion. The asterisks in Figure 4C-D indicate full-length STK36 and the truncated STK36N and STK36C fragments. We have included this information in the figure legend.

      1. The authors state that a previous study described ULK4 as a genetic modifier for holoprosencephaly and that this raised the possibility that ULK4 may participate in HH signal transduction. Primary ciliary localization of ULK4 in mouse neuronal tissue and SHH pathway modulation by ULK4 in cell culture have been shown by Mecklenburg et al. 2021 before. Maybe the authors could rephrase their introduction and discussion accordingly.

      Thanks for the suggestion, we have changed the introduction and discussion accordingly.

      1. Overexpression studies in heterologous systems using tagged proteins can potentially have an influence on their subcellular localization and function. Please discuss this caveat.

      We have mentioned this caveat in the “discussion” section of the revised manuscript. However, we have tried to express the transgene at low levels using the lentiviral vector containing a weak promoter to ensure that the exogenously expressed proteins are still regulated by Hh signaling. We have also confirmed that the tagged Ulk4 and Stk36 can rescue the loss of endogenous genes.

      1. More details in the Methods section should be provided on the SHH induction in NIH3T3 cells, HEK293 cells and MEFs.

      We have revised the methods section on Shh induction.

      1. ULK4 is known to have at least three isoforms that exhibit varying abundance across developmental stages in mice and humans (Lang et al., 2014) (DOI:10.1242/jcs.137604). Can the authors speculate on potential common and distinct functions of the different ULK4 isoforms on SHH pathway modulation based on their present results?

      It is interesting that Ulk4 has multiple isoforms in both mouse and human. Several short isoforms in both mouse and human lack the pseudokinase domain while one short isoform in mouse lacks the C-terminal region essential for Ulk4 ciliary tip localization. We speculate that the C-terminally deleted isoform may not have a function in the Shh pathway based on our results shown in Fig. 7 and 8 but might still have functions in other cellular processes.

      Reviewer #2 (Recommendations For The Authors):

      The paper is well written, and clear throughout, with excellent (up-to-date) citations to the field.

      We thank reviewer #2 for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      My only quibble is that the immunofluorescence images are a little confusing, especially to people outside of the field. Please include an image of the whole field and improve the captions. Is that a single cell for each cilia? Why are there so few cilia? The DAPI makes it seem like What are we looking at? Are those multiple nuclei in Figure 6? They seem a little small if that's the 5 uM scale bar

      We provide uncropped images of Figure 5E to show the entire cells (below). We have added some context to improve the captions. Most mammalian cells, such as MEF and NIH3T3 cells, contain a single primary cilium; however, multiciliated cells do exist. The DAPI staining indicates the nuclei. The cells shown in Figure 6 have a single nucleus (the scale should be 2 µm). Due to the unevenness of DAPI signals in the nuclei, only the strong signals (puncta) were shown for individual nuclei.

      Author response image 1.

      One small typo: GLL2 instead of GLI2 on line 363

      Thanks, we have corrected this spelling mistake.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank and express our appreciation to each of the reviewers for their thorough critique of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. The analysis of whole study comes from only 4 cells from L2/3 of ferret visual cortex; however, it is well known that there is a high level of functional heterogeneity within the cortical neurons. Do those four neurons have similar or different response properties? If the four neurons are functionally different, the weak or no correlation may result from heterogenous distribution pattern of mitochondria associated with heterogenous functionality.

      This is an important consideration and often a limitation of CLEM studies. While cortical neurons do exhibit a high degree of functional heterogeneity (similar to spine activity), the 4 cells examined all had selective (OSI > 0.4) somatic responses to oriented gratings, although they differed in their exact orientation preference. Due to experimental limitations of recording from a large population of dendritic spines, we did not characterize other response properties for which their sensitivity might differ. We did not consider orientation preference a metric of study, but instead characterized the difference in preference from the somatic output, allowing comparisons across spines. In addition, our measurements were limited to proximal, basal dendrites rather than any location in the dendritic tree. Nonetheless, we attempted to address this concern by examining spines with functionally heterogeneous visual responses within single cells, as reported in our manuscript: mitochondrial volume within a 1 µm radius was correlated with difference in orientation preference relative to the soma across all 4 cells (mean r = 0.49 +/- 0.22 s.d.), suggesting that cell-to-cell variability has a minimal impact on our main conclusions.

      Even with our limited measurements, there is a large amount of functional heterogeneity in dendritic spine responses (Extended Data Figure 2, Scholl et al. 2021), far greater than the small differences in somatic responses of these 4 cells (Figure 3, Scholl et al. 2021). Moreover, the individual dendrites from these 4 cells exhibited substantial heterogeneity in the distribution of mitochondria. We cannot rule out whether heterogeneity at various scales may obscure certain relationships or result in the weak correlations we observed. We also note that future advancements in volume electron microscopy should allow for greater sample sizes that can better address the role of functional (and structural) heterogeneity. We have added text in the Discussion section about the potential structure-function relationships that might be obscured or revealed by neuron heterogeneity.

      1. The authors argued that "mitochondria are not necessarily positioned to support the energy needs of strong spines." However, the overall energy needs of a spine depend not only on the synaptic strength but also on the frequency of synaptic activity. Is there a correlation between the mitochondria volume around a spine and the overall activity of the spine? This data needs to be analyzed to confirm the distribution of mitochondria is independent of local energy needs.

      The reviewer is correct, but our experimental paradigm was not optimally designed to measure the ‘frequency’ of synaptic activity in vivo. This could have been accomplished with flashed gratings or, perhaps, presenting drifting gratings at different temporal frequencies. For spines tuned to higher temporal frequencies in V1, we may expect greater energy needs, although as the reviewer suggests, energy needs will depend on synapse (and bouton) size. Because we are not able to directly measure activity frequency as carefully or beautifully as can be done ex vivo or in nerve fibers, we do not feel confident in attempting such analysis in this study. Instead, based on previous studies, we assumed that larger synapses might be able to transmit higher frequencies, and thus have higher energy demands. We believe future in vivo studies will more directly measure synaptic frequency for comparison with mitochondria.

      We have added text in the Discussion about this caveat and potential future experiments.

      1. In the results section, the authors briefly mentioned that "We also considered other spine response properties related to tuning preference; specifically, orientation selectivity and response amplitude at the preferred orientation. For either metric, we observed no relationship to mitochondria within 1 μm radius (selectivity: 1 μm: r = -0.081, p = 0.269, n = 60; max response amplitude: 1 μm: r = -0.179, p = 0.078, n = 64) but did see a weak, significant relationship to both at a 5 μm radius (selectivity: r = 0.175, p = 0.027, n = 121; max response amplitude: r =-0.166, p = 0.030, n = 129)." Here only statistic results were given while the data were not presented in the figure illustration. Based on the methods and Figure 3B, it seems that the preferred orientations were calculated based on the vector summation. How did the authors calculate the "response amplitude at the preferred orientation"? This needs to be clarified. In addition, given the huge variety of orientation selectivity, using the response amplitude at the preferred orientation may not be the best parameter to correlate with the mitochondria volume which is indicative of energy needs. It might be necessary to include the baseline activity without visual stimulation and the average response for visual stimuli of different orientations in the analysis.

      We apologize for this oversight, as the details are present in our previous study (Scholl et al., 2021). Response amplitude and preferred orientation were calculated from a Gaussian curve fitting procedure with specific parameters describing those exact values (see Scholl et al. 2021 or Scholl et al. 2013). Only spines with selective responses (vector strength index > 0.1) and passing our SNR criterion were used for these analyses. We have now added this information to the Methods section and referred to it in the Results. With respect to the reviewer’s other concern, we also examined the average response amplitude (across all visual stimuli). There we found no relationship with the volume of mitochondria within 1 or 5 microns of a spine; however, because spines differ greatly in their selectivity (range = 0 – 0.8), the average response may not be an appropriate metric to compare across spines.
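      For readers unfamiliar with the selectivity criterion mentioned above, the following is a minimal sketch of one common way to compute a vector strength index and a vector-sum preferred orientation from trial-averaged responses. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the clipping of negative responses to zero, and the use of the vector sum (rather than the Gaussian fit referred to in the response) are all assumptions of this sketch.

```python
import cmath
import math


def vector_strength(responses, orientations_deg):
    """Return (index, preferred orientation in degrees) from a vector sum.

    Orientation is 180-degree periodic, so angles are doubled before summing.
    Negative responses are clipped to zero (an assumption of this sketch).
    """
    clipped = [max(float(r), 0.0) for r in responses]
    total = sum(clipped)
    if total == 0.0:
        # No response at all: index is zero and preference is undefined.
        return 0.0, float("nan")
    resultant = sum(
        r * cmath.exp(2j * math.radians(theta))
        for r, theta in zip(clipped, orientations_deg)
    ) / total
    # Halve the angle again to return to orientation space, wrapped to [0, 180).
    preferred = (math.degrees(cmath.phase(resultant)) / 2.0) % 180.0
    return abs(resultant), preferred


def is_selective(responses, orientations_deg, threshold=0.1):
    """Hypothetical helper: apply an index threshold such as the 0.1 quoted above."""
    index, _ = vector_strength(responses, orientations_deg)
    return index > threshold
```

      With this definition the index runs from 0 (equal responses to all orientations) to 1 (response to a single orientation only), so a threshold of 0.1 excludes only nearly untuned spines.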

      1. A continuation from the former point, do the spines with similar preferred orientation to the somatic Ca signal have similar Ca signal strength, orientation selectivity index and other characteristics to the spines with different preferred orientation? As shown in the examples (Figure 3B), the spine on the right with different orientation preference compared with its soma has considerably larger response in non-preferred orientation compared to the spine on the left. Thus, the overall activity of the spine on the right may be higher than the spine which has similar preferred orientation to the soma. The authors showed that a positive correlation between difference in orientation preference and mitochondria volume (Figure 3C). Could this be simply due to higher spine activity for non-preferred orientation or spontaneous activity? Thus, the mitochondria might be positioned to support spines with higher overall activity rather than diverse response property.

      The reviewer brings up an interesting consideration. We examined the response properties of spines co-tuned (∆θpref < 22.5 deg) and differentially-tuned (∆θpref > 67.5 deg) to the soma. The orientation selectivity was not different between the two groups (p = 0.12, Wilcoxon ranksum test), although there was a small trend towards co-tuned inputs being more selective. We found that calcium response amplitudes for the preferred stimulus were also not different (p = 0.58, Wilcoxon ranksum test). These analyses are now included as a sentence in the Results.

      We agree with the reviewer that higher spontaneous activity in non-preferred spines may help explain the mitochondrial relationship we observe. However, our current dataset does not have sufficiently long recordings to measure spontaneous synaptic activity. Further, when considering a non-anesthetized preparation, spontaneous activity is highly dependent on brain state and an animal’s self-driven brain activity, which all must be experimentally controlled or measured to accurately address this.

      1. In addition, the information about the orientation selectivity of the soma is also missing. Do the four cells shown here all have similar level of orientation selectivity? Or some have relative weak orientation selectivity in the soma?

      Yes, all 4 cells have a similar OSI (range = 0.4 – 0.57, mean = 0.46 +/- 0.08 s.d.). This has been added to the Results section.

      1. This study focused on only a fraction of spines that are (1) responsive (2) osi > 0.1. However, in theory energy consumption is also related to non-responsive spines and spines with weak orientation tuning. What is the percentage of tuned and untuned spines? What's the correlation of mitochondria volume and spine activity level for untuned spines? I also recommend including the non-responsive spines into the analysis. For example, for each mitochondrion calculate the averaged overall activity of spines within certain distance from the mitochondrion, including the non-responsive spines. I would predict there may be more active spines and higher overall spine activity of dendritic segments near a mitochondrion than segments far from a mitochondrion.

      A majority of spines were tuned for orientation (~91%), although we specifically chose to only analyze data from spines with verifiable, independent calcium events. All analyses except those involving measurements of orientation preference use all dendritic spines (i.e. tuned and untuned). We have clarified this in the Results.

      These other ‘silent’ (i.e. without resolvable visual activity) spines may significantly contribute to energy demands of a dendrite too, as our methods (GCaMP6s expression) likely only capture synaptic events driving Ca2+ influx through NMDA receptors or VGCCs. We expect that glutamate imaging (e.g. iGluSnFR) may open the door to additional analyses to fully characterize the functional relationship between spines and mitochondria.

      1. The correlation coefficient for mitochondria volume and difference in orientation preference is relatively low (r=0.3150). With such weak correlation, the explanatory power of this data is limited.

      We agree that while the correlation is significant, it is not particularly strong. To better represent the noise surrounding this measurement, we performed a bootstrap correlation analysis, sampling with replacement (1 micron: mean r = 0.31 +/- 0.11 s.e., 5 micron: mean r = 0.02 +/- 0.10 s.e.). We now include this in the Results.
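
      The bootstrap described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes paired per-spine measurements, uses Pearson's r for brevity (rank-transform the inputs first for Spearman's r), and the names `pearson_r`, `bootstrap_correlation`, `n_boot`, and `seed` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(x, y, n_boot=2000, seed=0):
    """Resample (x, y) pairs with replacement.

    Returns the mean bootstrap r and its standard error, taken as the
    standard deviation of the bootstrap distribution.
    """
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples with no variance
            rs.append(r)
    mean_r = sum(rs) / len(rs)
    se = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1))
    return mean_r, se
```

      Resampling pairs rather than each variable separately preserves the spine-by-spine pairing, so the spread of the resampled r values reflects the sampling uncertainty of the correlation itself.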

      1. Why do the numbers of spines in different figures vary? For example, n=60 for 1micron in Figure 3, 54 in Figure 3c, 31 in Figure 4b, 51 in Figure 4e and so on.

      We apologize for the lack of clarity. Each analysis presented different requirements of the data. For example, orientation preference was computed only for selective (OSI > 0.1) spines (Fig. 3c), but this requirement did not apply to comparisons with selectivity or response amplitude (Fig. 3d). Similarly, as stated in the Results and Methods, measurements of local heterogeneity require a minimum number of neighboring spines (n > 2), limiting the number of usable spines for analysis (Fig. 4). We have clarified this in the text.

      1. In Figure 6a, the sample sizes of mito+ spines and mito- spines are extremely unbalanced, which affects the stat power of the analysis. I recommend performing a randomization test.

      We thank the reviewer for this suggestion. We ran permutation tests to compare the similarity in mean values between equally sampled values from each distribution. These tests supported our original analysis and conclusions. We have added these tests to the Results.
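
      One common form of such a test is sketched below. It is an illustration under stated assumptions rather than the authors' exact procedure: it shuffles group labels over the pooled values and compares the absolute difference in means, and the names `permutation_test_mean_diff`, `n_perm`, and `seed` are hypothetical.

```python
import random


def permutation_test_mean_diff(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group sizes may be unbalanced; labels are shuffled over the pooled data
    and the p-value is the fraction of shuffles at least as extreme as the
    observed difference (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

      Because the null distribution is built from the data themselves, the test makes no distributional assumption and remains valid when one group is much smaller than the other, as with the mito+ and mito- spines.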

      1. Ca signals are approximations of electrical signals. How well are spinal calcium signals correlated to synaptic strength and local depolarization? This should be put into discussion.

      There is unlikely to be a simple, direct relationship between spine calcium signal and synaptic strength or membrane depolarization, and this has never been addressed in vivo. Koester and Johnston (2005) performed paired recordings in slice and showed that single presynaptic action potentials producing successful transmission generate widely different calcium amplitudes (Fig. 3). Another study from Sobczyk, Scheuss, and Svoboda (2005) used two-photon glutamate uncaging on single spines and showed that the evoked micro-EPSCs are uncorrelated with the spine calcium signal amplitude. We have added a note about this in the discussion.

      1. In Figure 4i, the negative correlation may depend on the 4 data points on the right side. How influential are those data points?

      Spearman’s correlation coefficient analysis is robust to outliers and it is highly unlikely these datapoints are critical with our sample size (n > 100 spines).

      1. Raw data of Ca responses were missing.

      Some data has been published with the parent publication (Scholl et al., 2021). As spine imaging data is difficult to obtain and highly unique, we prefer to provide raw data directly upon reasonable request of the corresponding author.

      1. What is the temporal frequency of the drifting grating? Was it fixed or the speed of the grating was fixed?

      This was fixed to 4 Hz and this is now included in the Methods.

      Reviewer #2 (Recommendations For The Authors):

      1. Most of the measurements were based on the distance from the base of the spine neck, and "only on spines with measurable mitochondrial volume at each radius" were analyzed. To better understand the causality, it may also be interesting to have an analysis based on the distance from mitochondria. Would the result be different if the measurements are not 1µm / 5µm from spine but 1µm / 5µm from mitochondria? (e.g. total spine volume in 1µm / 5µm from mitochondria).

      In fact, our first iteration of this study focused on exactly this metric: measuring the distance to nearest mitochondria. However, after lengthy discussions between the authors, we ultimately decided this metric was inferior to a volumetric one. Our decision was based on several factors: (1) distance to mitochondrion is ill defined (e.g. distance to a mitochondrion's center or nearest membrane edge?), (2) the total amount of mitochondrial volume within a dendritic shaft should allow the greatest amount of energetic support (e.g. more cristae for ATP production, greater capacity for calcium buffering), and (3) we would not account for the geometry of individual mitochondria or their placement near a spine (e.g. when 2 different mitochondria are next to the same spine). We have added further clarification of our reasoning to the Results.

      Nonetheless, we present the reviewer some of our original analyses correlating distance to mitochondria (from the base of the spine and including the spine neck length):

      Author response image 1.

      Here, we examined the relationship to spine head volume, spine-soma orientation preference difference, and the local orientation preference heterogeneity. No relationship showed any significant correlation. Again, this may not be surprising given the drawbacks of measuring ‘distance to mitochondria’.

      1. Is there a selection criterion for the spine for the analysis? Are filopodia spines excluded in the analysis?

      Spines were analyzed regardless of structural classification; however, they were only analyzed if they had a synaptic density with synaptic vesicle accumulation. In our dataset (including those visualized in vivo and reconstructed from the EM volume) we observed no filopodia.

      1. The result states that "56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm.". In other words, around 43% of spines had mitochondria within 1 μm. It would be interesting to show whether there is a correlation between mitochondria size and spine density.

      We agree that this is an interesting measurement. It has been reported that mitochondrial unit length along the dendrite co-varies with linear synapse density in the neocortical distal dendrites of mice (Turner et al., 2022). This was specifically true in distal portions of dendrites more than 60 µm from the soma, because mitochondria volume increases as a function of distance roughly up to this point, then remains relatively constant beyond this distance.

      To investigate this possibility, we calculated the local spine density around an individual spine and compared to the mitochondria volume within 1 or 5 µm. We found no evidence of a correlation between local spine density and the volume of mitochondria (1 µm: Spearman r = -0.07447, p = 0.2859; 5 µm: r = -0.04447, p = 0.3141). However, the majority of our measurements are more proximal than 60 µm (our median distance of all spines = 49.4 µm, max = 114 µm) and this may be one reason why we observe no correlation.

      1. In Figure 3B, the drifting grating directions are examined from 0 to 315 degrees in the experiment. However, in Figure 3C and 3D, the spine-soma difference of orientation preference was limited to 0 to 90 degree in the graph. Is the graph trimmed, or is there a cause that limits the spine-soma difference of orientation preference to 90?

      Ferret visual cortical neurons are highly sensitive to grating direction and the responses are fit by a double Gaussian curve which estimates the ‘orientation preference’ (0-180 deg). We then calculated the absolute difference in orientation preference and wrapped that value in circular space so the maximum difference possible is 90 deg (e.g. 135 deg -> 45 deg).
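
      The wrapping step can be written in a few lines. The sketch below assumes orientation preferences in degrees on a 180-degree cycle; the function name `orientation_difference` is hypothetical.

```python
def orientation_difference(theta_a, theta_b):
    """Absolute difference between two orientations (deg), wrapped to [0, 90].

    Orientation is periodic over 180 deg, so a raw difference d and
    180 - d describe the same angular separation; the smaller is returned.
    """
    diff = abs(theta_a - theta_b) % 180.0
    return min(diff, 180.0 - diff)
```

      For example, `orientation_difference(135, 0)` gives 45, matching the 135 deg -> 45 deg case mentioned above.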

      1. In Figure 4D-F, how is the temporal correlation of calcium activity determined? Is it based on stimulated activity or basal activity? A brief explanation may be helpful to the readers. Also, scale bars could be added to Fig 4D.

      Temporal correlation is computed as the signal correlation between 2 spines over the entire imaging session at that field of view. Specifically, we measured the Pearson correlation between each spine’s ∆F/F trace. To measure the local spatiotemporal correlation, we computed correlations between all neighboring spines within 5 microns and took the average of those values. We have clarified this in the Results section.
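
      A minimal sketch of this computation follows. It assumes each spine has a ∆F/F trace of equal length and a 3D position in microns, and it uses straight-line (Euclidean) distance to find neighbors, which may differ from how the authors measured distance along the dendrite. The names `pearson_r`, `local_correlation`, and `radius` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length traces (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def local_correlation(traces, positions, index, radius=5.0):
    """Mean Pearson r between spine `index` and every other spine within `radius` microns.

    Returns nan when the spine has no neighbors inside the radius.
    """
    rs = []
    for j, pos in enumerate(positions):
        if j != index and math.dist(pos, positions[index]) <= radius:
            rs.append(pearson_r(traces[index], traces[j]))
    return sum(rs) / len(rs) if rs else float("nan")
```

      Averaging pairwise correlations over neighbors yields one value per spine, which can then be compared against the mitochondrial volume near that spine.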

      1. Figure 3C and Figure 4D displayed a significant correlation in 1µm range and such correlation drastically diminished once the criterion changed to 5µm range. It would be interesting to include the criterion of intermediate ranges. It would be interesting to see if there is a trend or tendency or if there is a "cut-off" limit.

      We agree with the reviewer that the drastic change in the correlations between 1 and 5 µm range was surprising to see. While these volumetric measurements are time consuming, we returned to our data and measured an intermediate point of 3 µm. Investigating relationships reported in our study, we found no significant trends for spine-soma similarity (Spearman’s r = -0.011, p = 0.54) or local heterogeneity (Spearman’s r = 0.11, p = 0.23). This suggests that a potential ‘critical distance’ might be less than 3 µm; however, far more additional measurements and analyses would be needed to attempt to identify exactly what this distance is.

      1. In Figure 5, it is shown that spines having mitochondrion in the head or neck are larger. However, only 10 spines are found with mitochondria inside. In the current dataset, are mitochondria abundantly found in large spines? Further analysis or justification would be informative to address this.

      In our dataset, mitochondria were found in ~5% of all spines. Spines with mitochondria have a median volume of approximately 0.6 µm3, roughly twice as large as those without mitochondria, as the reviewer suggests. In the entire population of spines without mitochondria, a volume of 0.6 µm3 represents roughly the 82nd percentile. In other words, of the total population of 157 spines without mitochondria, only 29 had equal or greater volume than the median spine with a mitochondrion. We believe this trend is clearly shown in Figure 5A and is supported by our analysis, including new permutation tests suggested by Reviewer 1.

      Reviewer #3 (Recommendations For The Authors):

      1. The authors state that their unsupervised method "quickly and accurately identified mitochondria," but the methods section only says that segmentations were proofread. Was every segmentation examined and judged to be accurate, or was only a subset of the 324 mitochondria checked?

      After deep learning-based extraction, each mitochondrion segmentation was manually proofread. For each dendrite segment, this was ~10-20 mitochondria, so it did not take long to manually inspect and edit each mitochondrion segmentation.

      1. The EM image of the mitochondrion in the spine head in Figure 2C is low resolution and does not apply to the bulk of the data. Images more representative of the analyzed data should be added to supplement the cartoons.

      Our primary rationale for including this specific image was to show that the mitochondria located within spines are small, round, and to include a view of the synapse as well as the mitochondrion. We now include enlarged and additional EM images in Figure 1C.

      1. The majority of spines did not have any mitochondria within a 1 micron radius and were excluded from the correlation analyses, so most of the conclusions are based on a minority of spines. It would be interesting to see comparisons between spines with and without nearby mitochondria. Correlations between the absolute distance to any mitochondrion, synapse size, and mismatch to soma orientation would be especially interesting.

      The reviewer brings up a good point. It is true that many spines were excluded from our analysis based on the fact that they did not have nearby mitochondria within 1 or 5 µm (56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm). We compared the distributions of synapse size, mismatch to soma, and orientation selectivity of two groups of spines – those with at least some mitochondria within 1 µm (n = 65) versus spines without any mitochondria within 5 µm (n = 19).

      We found no difference in the distributions between spine volume (1 µm: median = 0.29 µm3, IQR = 0.41 µm3; no mitochondria within 5 µm: median = 0.40 µm3, IQR = 0.37 µm3; p = 0.67) or PSD area (1 µm: median = 0.26 µm2, IQR = 0.33 µm2; no mitochondria within 5 µm: median = 0.31 µm2, IQR = 0.36 µm2; p = 0.49). For functional measures, we also saw no difference in orientation selectivity (1 µm: median = 0.29, IQR = 0.28; no mitochondria within 5 µm: median = 0.28, IQR = 0.15; p = 0.74) or mismatch to soma orientation (1 µm: median = 0.54 deg, IQR = 0.86 deg; no mitochondria within 5 µm: median = 0.46 deg, IQR = 0.47 deg; p = 0.75). We now include these analyses in the Results.

      We also looked at the absolute distances to mitochondria and did not find any significant relationships to spine head volume, spine-soma orientation preference difference, or the local orientation preference heterogeneity (see our response to reviewer #2 for more information).

      1. In Figure 1A the mitochondria appear to be taking up a substantial fraction of the dendritic shaft diameter, even for distal dendrites. It would be useful to know the absolute diameter of the dendrites and mitochondria, given that this is not rodent data and there is no reference for either in the ferret.

      We agree with the reviewer’s point, although we would like to remind the reviewer that these are basal dendrites of layer 2/3 cells. Basal dendrites tend to be thinner than apical branches. Interestingly, in some cases, the dendrite even swells to accommodate a mitochondrion. We did not incorporate this measurement in our study because it is not trivial; dendrite diameter is variable and dendrites are not perfect cylinders. Although we did not make precise measurements across our dendrites, the diameter is comparable to what has been seen in mouse cortex (Turner et al., 2022), roughly 500-1000 nm, but as small as 100 nm at some pinch points. In terms of mitochondria, many were roughly spherical or oblong, therefore the maximum diameters we report are roughly similar to, if not a bit larger than, those of the cross-sectional diameter.

      1. As a rule, PSD area is correlated with spine volume, which makes the observation that spines with mitochondria have larger volume but not PSD area surprising. With n=10 it is difficult to draw conclusions, but it would be interesting to know the PSD area-to-volume ratio of other spines of the same volume and synapse size.

      We were also somewhat surprised to see this, but exactly as the reviewer mentioned, we believe it to be a limitation of the sample size. The difference in volume was large enough to be detected despite a small sample size. We saw a trend towards larger synapses when spines have mitochondria (the median was approximately 60% larger), and we would expect with a larger comparison that PSD area would be significantly greater in spines with mitochondria.

      We calculated the PSD area-to-spine head volume ratio for spines with or without mitochondria. Spines with mitochondria had a significantly lower ratio compared to those without (Mann-Whitney test, p = 0.0056; mito+ = 0.53, n = 10; mito- = 0.78, n = 157). As the reviewer mentions, it is somewhat difficult to draw a conclusion from this, but it appears that the PSD does not scale with the increased spine head size.

      Author response image 2.

      The only way to definitively address this is to increase the sample size, which is becoming easier to achieve with the progression of volume EM imaging and analysis techniques in recent times. We look forward to addressing this in the future.

      1. Nothing is made of the significant fact that these data come from the visual system of a carnivore, not a mouse. Consideration of differences in visual physiology between rodents and carnivores would be worthwhile to put the function of these dendrites in context.

      We thank the reviewer for this consideration and have added text to the Discussion.

    1. Author Response

      Reviewer #2 (Public Review):

      Manassaro et al. present an extensive three-session study in which they aimed to change defensive responses (skin conductance; SCR) to an aversively conditioned stimulus by targeting medial prefrontal cortex (their words) using repetitive TMS prior to retrieval. They report that stimulating mPFC using TMS abolishes SCR responses to the conditioned stimulus, and that this effect is specific for the stimulated region and the specific CS-US association, given that SCR responses to a different modality US are not changed.

      I like how the authors have clearly attempted to control for several potential confounds by including multiple stimulation sites, measured SCR responses to several unconditioned stimuli, and applied the experiment in multiple contexts. However, several conceptual and practical issues remain that I think limit the value of potential conclusions drawn from this work.

      The first issue that I have with this study concerns the relationship between the TMS manipulation and the theoretical background the authors present in their rationale. In the introduction the authors sketch that what they call 'mPFC' is involved in regulation of threat responses. They make a convincing case; however, almost all of the evidence they present concerns the ventromedial part of the prefrontal cortex (refs 18-25). The authors then mention that no one has ever studied the effects of 'mPFC'-TMS on threat memories. That is not surprising given that stimulating vmPFC with TMS is very difficult, if not impossible. Simulation of the electrical field that develops as a consequence of the authors' manipulation (using the same TMS coil and positioning the authors use) shows that vmPFC (or mPFC for that matter) is not stimulated. The authors then continue in the methods section stating that the region they aimed for was BA10. This region they presumably do stimulate; however, that does not follow logically from their argument. BA10 is anatomically, cytoarchitectonically and functionally a wholly different area than vmPFC and I wonder if their rationale would hold given that they stimulate BA10.

      We would like to thank the Reviewer for highlighting this very important point. The Reviewer is right in stating that the Brodmann area 10 (BA 10) is anatomically, cytoarchitectonically, and functionally distinct from the ventromedial PFC. As we reported in the Methods section, the coil placement over the frontopolar midline electrode (Fpz) according to the international 10‒20 EEG coordinate system directly focused the stimulation over the medial portion of the BA 10. In the literature, the aPFC is also known as the “frontopolar cortex” or the “rostral frontal cortex” and encompasses the most anterior portion of the prefrontal cortex, which corresponds to the BA 10. In line with this observation, we have replaced “medial prefrontal cortex” (mPFC) with “medial anterior prefrontal cortex” (aPFC) throughout the manuscript. We have also revised the theoretical background and the rationale in the Introduction section by mentioning several studies that: i) Reported the involvement of the aPFC in emotional down-regulation (Volman et al., 2013; Koch et al., 2018; Bramson et al., 2020). ii) Traced anatomical connections between the medial/lateral aPFC and the amygdala (Peng et al., 2018; Folloni et al., 2019; Bramson et al., 2020). iii) Detected functional connections between the aPFC and the vmPFC during fear down-regulation (Klumpers et al., 2010). iv) Found hypoactivation, reduced connectivity, and altered thickness of aPFC in PTSD patients (Lanius et al., 2005; Morey et al., 2008; Sadeh et al., 2015; Sadeh et al., 2016). v) Revealed that strong activation of the aPFC may promote a higher resilience against PTSD onset (Kaldewaij et al., 2021) and that enhanced aPFC activity and potentiated aPFC-vmPFC connectivity is detectable after effective therapy in PTSD patients (Fonzo et al., 2017). Furthermore, we discussed our results in light of this evidence in the Discussion section. We sincerely thank the Reviewer for this key improvement to our study.

      The second concern I have is that although I think the authors should be praised for including both sham and active control regions, the controls might not be optimally chosen to control for the potential confounds of their condition of interest (mPFC-TMS). Namely, TMS on the forehead can be unpleasant, if not painful, whereas sham-TMS or TMS applied to the back of the head or even over dlPFC is not (or less so at the very least). Given that the SCR results after mPFC TMS show exactly the same temporal pattern as the sham-TMS but with a lower starting point, one could wonder whether a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations. A control region that would have been more obvious to take is the lateral part of BA10, by moving the TMS coil several centimeters to the left or right, circumventing all things potentially called medial but giving similar unpleasant sensations (pain etc).

      We would also like to thank the Reviewer for bringing to light this issue and allowing us to strengthen our results. The Reviewer is right in pointing out that rTMS application over the forehead can be subjectively perceived as unpleasant, relative to other head coordinates or sham stimulation. The question of whether an unpleasant stimulation prior to the retrieval might provoke habituation to discomfort sensations and lead to weaker SCRs in the consequent CS presentations is valid and reasonable. We also thank the Reviewer for advising us to stimulate the lateral part of BA 10 as an active control site. However, given the potential involvement of the lateral BA 10 in the fear network (see previous point) and the potential risks due to the anatomical proximity of lateral BA 10 with the temporal lobe, we decided to adopt an alternative approach to investigate whether “a painful stimulation prior to the retrieval might have already caused habituation to painful stimulation observed in SCR in consequent CS presentations”. We repeated the entire experiment in one further group (ctrl discomfort, n = 10) by replacing the rTMS procedure with a 10-min discomfort-inducing procedure over the same site of the forehead (Fpz) to mimic the rTMS-evoked unpleasant sensations in the absence of neural stimulation effects (see the new version of the Methods section). The electrical stimulation intensity was individually calibrated through a staircase procedure (0 = no discomfort; 10 = high discomfort). The shock amplitude was set at the current level corresponding to the mean rating of ‘4’ on the subjective scale because, in the new experiments that we performed targeting the aPFC with rTMS (n = 9), we collected participants’ rTMS-induced discomfort ratings obtaining a mean rating of 3.833 ± 0.589 SEM on the same scale. We found CS-evoked SCR levels not significantly different to those of the sham group during the test session as well as during the follow-up session, suggesting that the discomfort experienced during the rTMS procedure did not contribute to the reduction of electrodermal responses observed in the aPFC group. We reported the results of this experiment in the Results section and Figure 2-figure supplement 2.

      My final concern is that the main analyses are performed on single trials of SCR responses, which is a relatively noisy measure to use on single trials. This is also done in relatively small groups (n=21). I would have liked to see both the raw or at least averaged timeseries SCR data plotted, and a rationale explaining how the authors decided on the current sample sizes; if that was based on a power analysis, one must have expected quite strong effects.

      Following the Reviewer’s suggestion, we decided to remove the analysis on single trials, and we apologize for not including SCR timeseries. To quantify the amount of effect induced by the rTMS protocol, we have now added within-group comparisons (through 2 × 2 mixed ANOVAs) that show, for each group, the amount of change in CS-evoked SCRs from the conditioning phase to the test phase, as well as from the conditioning phase to the follow-up phase. Furthermore, to directly and simply depict these changes, in addition to dot plots, we have also represented them with line charts (Figs. 2C, 2H, 4C, 4H, 5C, 5H). To estimate the sample size, we had previously performed a power analysis through G*Power 3.1.9.2 and it had resulted in n = 21 per experimental group. However, by correcting data pre-processing procedures (in accordance with Reviewer 1), we obtained data that were not normally distributed. Thus, we decided to enlarge our sample size by re-performing a power analysis (with the new suggested statistical analyses) and then repeating the experiments. For the main statistics, i.e. mixed ANOVA (within-between interaction) with two groups and two measurements, with the following input parameters: α equal to 0.05, power (1-β) equal to 0.95, and a hypothesized effect size (f) equal to 0.25, the new estimated sample size resulted in n = 30 per experimental group.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the Authors implement a delayed feedback control method and use it for the first time in biological neuronal networks. They extend a well-established computational theory and expand it into the biological realm. With this, they obtain novel evidence, never considered before, that showcases the difference between simulated neuronal networks and biological ones. Furthermore, they optimize the DFC method to achieve optimal results in the control of cell excitability in the context of biological neuronal networks, taking advantage of a closed-loop stimulation setup that, by itself, is not trivial to build and operate and that will certainly have a positive impact on the fields of cellular and network electrophysiology.

      Regarding the results, it would be very constructive if the Authors could share the code for the quasi-real-time interface with the Multichannel Systems software (current and older hardware versions), as this represents likely a bottleneck preventing more researchers to implement such an experimental paradigm.

      On the data focusing on the effects of the DFC algorithms on neuronal behavior, the evidence is very compelling, although more care should be devoted to the statistical analyses, since some of the applied statistical tests are not appropriate. In a more biological sense, further discussion and clarification of the experimental details would improve this manuscript, making it more accessible and clearer for researchers across disciplines (i.e., ranging from computational to experimental Neuroscience) and increasing the impact of this research.

      In summary, this work represents a necessary bridge between recent advances in computational neuroscience and the biological implementation of neuronal control mechanisms.

      Regarding sharing the control code, our application for closed-loop stimulation using aDFC, DFC and Poisson is now available on GitHub (https://github.com/NCN-Lab/aDFC). This was, in fact, our initial intention following the reviewing process. With this application, the user can run the developed algorithms with the MEA2100-256 System from Multi Channel Systems MCS GmbH.

      The same applies to the data: the dataset with the spike data from all experiments is now publicly available on Zenodo at https://doi.org/10.5281/zenodo.10138446.

      Regarding the improvements in the statistical analysis, the tests are now performed following Reviewer #1's suggestions. It is important to emphasize that this did not change the results or conclusions of the work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the Editor and Reviewers for their thorough assessment of the manuscript as well as their constructive critiques. We have collated below the public review and recommendations from each Reviewer as well as our responses to them.

      eLife assessment

      This study by Verdikt et al. provided solid evidence demonstrating the potential impacts of Δ9-tetrahydrocannabinol (Δ9-THC) on early embryonic development using mouse embryonic stem cells (mESCs) and in vitro differentiation. Their results revealed that Δ9-THC enhanced mESCs proliferation and metabolic adaptation, possibly persisting through differentiation to Primordial Germ Cell-Like Cells (PGCLCs), though the evidence supporting this persistence was incomplete. Although the study is important, it was limited by being conducted solely in vitro and lacking parallel human model experiments.

      Reviewer #1 (Public Review):

      The authors investigated the metabolic effects of ∆9-THC, the main psychoactive component of cannabis, on early mouse embryonic cell types. They found that ∆9-THC increases proliferation in female mouse embryonic stem cells (mESCs) and upregulates glycolysis. Additionally, primordial germ cell-like cells (PGCLCs) differentiated from ∆9-THC-exposed cells also show alterations to their metabolism. The study is valuable because it shows that physiologically relevant ∆9-THC concentrations have metabolic effects on cell types from the early embryo, which may cause developmental effects. However, the claim of "metabolic memory" is not justified by the current data, since the effects on PGCLCs could potentially be due to ∆9-THC persisting in the cultured cells over the course of the experiment, even after the growth medium without ∆9-THC was added.

      The study shows that ∆9-THC increases the proliferation rate of mESCs but not mEpiLCs, without substantially affecting cell viability, except at the highest dose of 100 µM which shows toxicity (Figure 1). Treatment of mESCs with rimonabant (a CB1 receptor antagonist) blocks the effect of 100 nM ∆9-THC on cell proliferation, showing that the proliferative effect is mediated by CB1 receptor signaling. Similarly, treatment with 2-deoxyglucose, a glycolysis inhibitor, also blocks this proliferative effect (Figure 4G-H). Therefore, the effect of ∆9-THC depends on both CB1 signaling and glycolysis. This set of experiments strengthens the conclusions of the study by helping to elucidate the mechanism of the effects of ∆9-THC.

      Although several experiments independently showed a metabolic effect of ∆9-THC treatment, this effect was not dose-dependent over the range of concentrations tested (10 nM and above). Given that metabolic effects were observed even at 10 nM ∆9-THC (see for example Figure 1C and 3B), the authors should test lower concentrations to determine the dose-dependence and EC50 of this effect. The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).

      The study also profiles the transcriptome and metabolome of cells exposed to 100 nM ∆9-THC. Although the transcriptomic changes are modest overall, there is upregulation of anabolic genes, consistent with the increased proliferation rate in mESCs. Metabolomic profiling revealed a broad upregulation of metabolites in mESCs treated with 100 nM ∆9-THC.

      Additionally, the study shows that ∆9-THC can influence germ cell specification. mESCs were differentiated to mEpiLCs in the presence or absence of ∆9-THC, and the mEpiLCs were subsequently differentiated to mPGCLCs. mPGCLC induction efficiency was tracked using a BV:SC dual fluorescent reporter. ∆9-THC treated cells had a moderate increase in the double positive mPGCLC population and a decrease in the double negative population. A cell tracking dye showed that mPGCLCs differentiated from ∆9-THC treated cells had undergone more divisions on average. As with the mESCs, these mPGCLCs also had altered gene expression and metabolism, consistent with an increased proliferation rate.

      My main criticism is that the current experimental setup does not distinguish between "metabolic memory" vs. carryover of THC (or its metabolites) causing metabolic effects. The authors assume that their PGCLC induction was performed "in the absence of continuous exposure" but this assumption may not be justified. ∆9-THC might persist in the cells since it is highly hydrophobic. In order to rule out the persistence of ∆9-THC as an explanation of the effects seen in PGCLCs, the authors should measure concentrations of ∆9-THC and THC metabolites over time during the course of their PGCLC induction experiment. This could be done by mass spectrometry. This is particularly important because 10 nM of ∆9-THC was shown to have metabolic effects (Figure 1C, 3B, etc.). Since the EpiLCs were treated with 100 nM, if even 10% of the ∆9-THC remained, this could account for the metabolic effects. If the authors want to prove "metabolic memory", they need to show that the concentration of ∆9-THC is below the minimum dose required for metabolic effects.
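
      The carryover argument above is simple arithmetic, sketched here in Python. The 100 nM treatment dose and the 10 nM effective dose are taken from the review text; the function name and the list of retained fractions are hypothetical and serve only as an illustration, not as measurements.

```python
def residual_concentration_nm(treatment_nm, fraction_retained):
    """Concentration left if a given fraction of the treatment dose persists."""
    return treatment_nm * fraction_retained


TREATMENT_NM = 100.0         # dose applied to EpiLCs (from the review text)
LOWEST_EFFECTIVE_NM = 10.0   # lowest dose with reported metabolic effects (from the review text)

# Hypothetical retained fractions, for illustration only; these are not measurements.
for fraction in (0.01, 0.05, 0.10, 0.20):
    residual = residual_concentration_nm(TREATMENT_NM, fraction)
    could_explain = residual >= LOWEST_EFFECTIVE_NM
    print(f"{fraction:.0%} retained -> {residual:.0f} nM; "
          f"at or above lowest effective dose: {could_explain}")
```

      Under these assumptions, a measured residual concentration is informative only when it is compared with the lowest dose that still produces an effect, which is why the reviewer asks for both quantities.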

      Overall, this study is promising but needs some additional work in order to justify its conclusions. The developmental effects of ∆9-THC exposure are important for society to understand, and the results of this study are significant for public health.

      Reviewer #1 (Recommendations For The Authors):

      This has the potential to be a good study, but it's currently missing two key experiments:

      What is the minimum dose of ∆9-THC required to see metabolic effects?

      We would like to thank Reviewer 1 for their insightful comments. We have included exposures to lower doses of ∆9-THC in Supplementary Figure 1. Our data show that ∆9-THC induces mESC proliferation from 1nM onwards. However, when ESCs and EpiLCs were exposed to 1nM of ∆9-THC, no significant change in mPGCLC induction was observed (updated Figure 6B). Of note, in their public review, Reviewer 1 mentioned that “The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).” According to the literature, stimulation of non-cannabinoid receptors and ion channels (including GPR18, GPR55, TRPVs, etc.) occurs at 40nM-10µM of ∆9-THC (Banister et al., 2019). We therefore expect that at the lower nanomolar range tested, CB1 is the main receptor stimulated by ∆9-THC, as we showed for the 100nM dose in our rimonabant experiments (Fig. 2).

      Is the residual THC concentration during the PGCLC induction below this minimum dose? Even if the effects are due to residual ∆9-THC, this would not undermine the overall study. There would simply be a different interpretation of the results.

      This experiment was particularly important to distinguish between a “true” ∆9-THC metabolic memory and residual ∆9-THC left over during PGCLC differentiation. Our mass spectrometry quantification revealed that no significant ∆9-THC could be detected in day 5 embryoid bodies compared to treated EpiLCs prior to differentiation (Supplementary Figure 13). These results support the existence of ∆9-THC metabolic memory across differentiation.

      You also do not mention whether you tested your cells for mycoplasma. This is important since mycoplasma contamination is a common problem that can cause artifactual results. Please test your cells and report the results.

      All cells tested negative for mycoplasma by a PCR test (ATCC® ISO 9001:2008 and ISO/IEC 17025:2005 quality standards). This information has been added to the Materials and Methods section.

      Minor points:

      1. I don't think it's correct to say that cannabis is the most commonly used psychoactive drug. Alcohol and nicotine are more commonly used. See: https://nida.nih.gov/research-topics/alcohol and https://www.cancer.gov/publications/dictionaries/cancer-terms/def/psychoactive-substance I looked at the UN drugs report [ref 1] and alcohol or nicotine were not included on that list of drugs, so the UN may use a different definition. This doesn't affect the importance or conclusions of this study, but the wording should be changed.

      We agree and are now following the WHO description of cannabis (https://www.who.int/teams/mental-health-and-substance-use/alcohol-drugs-and-addictive-behaviours/drugs-psychoactive/cannabis) by referring to it as the “most widely used illicit drug in the world”. (Line 44).

      1. It would be informative to use your RNA-seq data to examine the expression of receptors for ∆9-THC such as CB1, CB2, and GPR55. CB1 might be the main one, but I am curious to see if others are present.

      We have explored the protein expression of several cannabinoid receptors, including CB2, GPR18, GPR55 and TRPV1 (Banister et al., 2019). These proteins, except TRPV1, were expressed at low levels in mouse embryonic stem cells compared to the positive control (mouse brain extract, see Author response image 1). Furthermore, our experiment with Rimonabant showed that the proliferative effects of ∆9-THC are mediated through CB1.

      Author response image 1.

      Cannabinoid receptors and non-cannabinoid receptors protein expression in mouse embryonic stem cells.

      1. Make sure to report exact p-values. You usually do this, but there are a few places where it says p<0.0001. Also, report whether T-tests assumed equal variance (Student's) or unequal variance (Welch's). [In general, it's better to use unequal variance, unless there is good reason to assume equal variance.]

      Prism, which was used for statistical analyses, only reports p-values to four decimal places. For all p-values reported as p<0.0001, the exact values were calculated in Excel using the “=T.DIST.2T(t, df)” function, with the t value and the number of degrees of freedom computed by Prism as inputs. Homoscedasticity was confirmed for all statistical analyses in Prism.
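
      For readers without Excel, the same quantity can be reproduced independently. Excel's T.DIST.2T returns the two-tailed tail probability of Student's t distribution, which equals the regularized incomplete beta function I_x(df/2, 1/2) evaluated at x = df/(df + t²). The following is a minimal pure-Python sketch under that identity; the function names are hypothetical and this is not the authors' code.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_t_pvalue(t, df):
    """Two-tailed p-value P(|T| > |t|) for Student's t with df degrees of freedom."""
    x = df / (df + t * t)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)


# Example call with arbitrary inputs (not values from the study).
print(two_tailed_t_pvalue(10.0, 10))
```

      Printing the result with a format such as `f"{p:.3e}"` gives the exact value in scientific notation, which is one way to report p-values below the four-decimal display limit.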

      1. Figure 2A: An uncropped gel image should be provided as supplementary data. Additionally, show positive and negative controls (from cells known to either express CB1 or not express CB1)

      The uncropped gel image is presented in Author response image 2. The antibody was validated on mouse brain extracts as a positive control as shown in Figure 1.

      Author response image 2.

      Uncropped gel corresponding to Fig. 2A where an anti-CB1 antibody was used.

      1. Figure 6B: Please show a representative gating scheme for flow cytometry (including controls) as supplementary data. Also, was a live/dead stain used? What controls were used for compensation? These details should be reported.

      The gating strategy is presented in Supplementary Figure 11. The Materials and Methods section has also been expanded.

      1. As far as I can tell, you only used female mESCs. It would be good to test the effects on male mESCs as well since these have some differences due to differences in X-linked gene expression (female mESCs have two active X chromosomes). I understand that you might not have a male BV:SC reporter line, so it would be acceptable to omit the mPGCLC experiments on male cells.

      We have tested the 10nM-100µM dose range in the male R8 mESCs (Supplementary Figure 3). Results similar to those with the female H18 cells were observed. Accordingly, PGCLC induction was increased when R8 ESCs + EpiLCs were exposed to 100nM of ∆9-THC (Supplementary Figure 12). This is in line with the impact of ∆9-THC on fundamentally conserved metabolic pathways across species and sex, although it should be noted that one representative model of each sex is not sufficient to exclude sex-specific effects.

      Reviewer #2 (Public Review):

      In the study conducted by Verdikt et al, the authors employed mouse Embryonic Stem Cells (ESCs) and in vitro differentiation techniques to demonstrate that exposure to cannabis, specifically Δ9-tetrahydrocannabinol (Δ9-THC), could potentially influence early embryonic development. Δ9-THC was found to augment the proliferation of naïve mouse ESCs, but not formative Epiblast-like Cells (EpiLCs). This enhanced proliferation relies on binding to the CB1 receptor. Moreover, Δ9-THC exposure was noted to boost glycolytic rates and anabolic capabilities in mESCs. The metabolic adaptations brought on by Δ9-THC exposure persisted during differentiation into Primordial Germ Cell-Like Cells (PGCLCs), even when direct exposure ceased, and correlated with a shift in their transcriptional profile. This study provides the first comprehensive molecular assessment of the effects of Δ9-THC exposure on mouse ESCs and their early derivatives. The manuscript underscores the potential ramifications of cannabis exposure on early embryonic development and pluripotent stem cells. However, it is important to note the limitations of this study: firstly, all experiments were conducted in vitro, and secondly, the study lacks analogous experiments in human models.

      Reviewer #2 (Recommendations For The Authors):

      1. EpiLCs, characterized as formative pluripotent stem cells rather than primed ones, are a transient population during ESC differentiation. The authors should consider using EpiSCs and/or formative-like PSCs (Yu et al., Cell Stem Cell, 2021; Kinoshita et al., Cell Stem Cell, 2021), and amend their references to EpiLCs as "formative".

      Indeed, EpiLCs are a transient pluripotent stem cell population that is “functionally distinct from both naïve ESCs and EpiSCs” and “enriched in formative phase cells related to pre-streak epiblast” (Kinoshita et al., Cell Stem Cell, 2021). Here, we used the differentiation system developed by M. Saitou and colleagues to derive PGCLCs (Hayashi et al, 2011). Since EpiSCs are refractory to PGCLCs induction (Hayashi et al, 2011), we used the germline-competent EpiLCs and took advantage of a well-established differentiation system to derive mouse PGCLCs. Most authors, however, agree that in terms of epigenetic and metabolic profiles, mouse EpiLCs represent a primed pluripotent state. We have added that PGCs arise in vivo “from formative pluripotent cells in the epiblast” on lines 85-86.

      1. Does the administration of Δ9-THC, at concentrations from 10nM to 1uM, alter the cell cycle profiles of ESCs?

      The proliferation of ESCs was associated with changes in the cell cycle, as presented in the new Supplementary Figure 2, which we discuss in lines 118-123.

      1. Could Δ9-THC treatment influence the differentiation dynamics from ESCs to EpiLCs?

      No significant changes were observed in the pluripotency markers associated with ESCs and EpiLCs (Supplementary Figure 9). We have added this information in lines 277-279.

      1. The authors should consider developing knockout models of cannabinoid receptors in ESCs and EpiLCs (or EpiSCs and formative-like PSCs) for control purposes.

      This is an excellent suggestion. Due to time and resource constraints, however, we focused our mechanistic investigation of the role of CB1 on the use of rimonabant which revealed a reversal of Δ9-THC-induced proliferation at 100nM.

      1. Lines 134-136: "Importantly, SR141716 pre-treatment, while not affecting cell viability, led to a reduced cell count compared to the control, indicating a fundamental role for CB1 in promoting proliferation." Regarding Figure 2D, does the Rimonabant "+" in the "mock" group represent treatment with Rimonabant only? If that's the case, there appears to be no difference from the Rimonabant "-" mock. The authors should present results for Rimonabant-only treatment.

      To be able to compare the effects +/- Rimonabant and as stated in the figure legend, each condition was normalized to its own control (mock with, or without Rimonabant). Author response image 3 is the unnormalized data showing the same effects of Δ9-THC and Rimonabant on cell number.

      Author response image 3.

      Unnormalized data corresponding to Figure 2D.
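
      The per-condition normalization described in this response can be sketched as follows. The cell counts are invented for illustration only and are not data from the study; the function and dictionary names are likewise hypothetical.

```python
def normalize_to_own_control(counts, control_key="mock"):
    """Express each condition as a ratio to the mock of the same group."""
    control = counts[control_key]
    return {condition: value / control for condition, value in counts.items()}


# Hypothetical cell counts, invented for illustration only; not data from the study.
without_rimonabant = {"mock": 200.0, "THC": 300.0}
with_rimonabant = {"mock": 100.0, "THC": 100.0}

print(normalize_to_own_control(without_rimonabant))
print(normalize_to_own_control(with_rimonabant))
```

      By construction, both mock values become 1 after this normalization, so any effect of Rimonabant alone on cell number is visible only in the unnormalized counts.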

      1. In Figure 3, both ESCs and EpiLCs show a significant decrease in oxygen consumption and glycolysis at a 10uM concentration. Do these conditions slow cell growth? BrdU incorporation experiments (Figure 1) seem to contradict this. With compromised bioenergetics at this concentration, the authors should discuss why cell growth appears unaffected.

      Indeed, we believe that cell growth is progressively restricted upon increasing doses of ∆9-THC (see Supplementary Figure 2). In addition, oxygen consumption and glycolysis can be decoupled from cellular proliferation, especially considering the short time frames we are working with (44-48h).

      1. Beyond Δ9-THC exposure prior to PGCLCs induction, it would be also interesting to explore the effects of Δ9-THC on PGCLCs during their differentiation.

      We agree with the Reviewer. Our aim was to study whether exposure prior to differentiation could have an impact, and if so, what are the mediators of this impact. Full exposure during differentiation is another exposure paradigm that is relevant but would not have allowed us to show the metabolic memory of ∆9-THC exposure. Future work, however, will be dedicated to analyzing the effect of continuous exposure through differentiation.

      1. As PGC differentiation involves global epigenetic changes, it would be interesting to investigate how Δ9-THC treatment at the ESCs/EpiLCs stage may influence PGCLCs' transcriptomes.

      We also agree with the Reviewer. While this paper was not primarily focused on Δ9-THC’s epigenetic effects, we have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10 and discussed in lines 301-316.

      1. Lines 407-408: The authors should exercise caution when suggesting "potentially adverse consequences" based solely on moderate changes in PGCLCs transcriptomes.

      We agree and have modified the sentence as follows: “Our results thus show that exposure to Δ9-THC prior to specification affects embryonic germ cells’ transcriptome and metabolome. This in turn could have adverse consequences on cell-cell adhesion with an impact on PGC normal development in vivo.“

      1. Investigating the possible impacts of Δ9-THC exposure on cultured mouse blastocysts, implantation, post-implantation development, and fertility could yield intriguing findings.

      We thank the Reviewer for this comment. We have amended our discussion to include these points in the last paragraph.

      1. Given that naïve human PSCs and human PGCLCs differentiation protocols have been established, the authors should consider carrying out parallel experiments in human models.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5), showing that Δ9-THC alters the cell number and general metabolism of these cells. We present these results in light of the differences in metabolism between mouse and human embryonic stem cells on lines 135-141 and 185-188. Implications of these results are discussed in lines 474-486.

      Reviewer #3 (Public Review):

      Verdikt et al. focused on the influence of Δ9-THC, the most abundant phytocannabinoid, on early embryonic processes. The authors chose an in vitro differentiation system as a model and compared the proliferation rate, metabolic status, and transcriptional level in ESCs exposed to Δ9-THC. They also evaluated the change of metabolism and transcriptome in PGCLCs derived from Δ9-THC-exposed cells. None of the methods in this paper involve the differentiation of ESCs to lineage-specific cells, so the results cannot demonstrate the impact of Δ9-THC on preimplantation developmental stages. In brief, the authors want to explore the impact of Δ9-THC on preimplantation developmental stages, but they only detected changes in ESCs exposed to Δ9-THC and in PGCLCs derived from them, which provides a molecular characterization of the impact of Δ9-THC exposure on ESCs and PGCLCs.

      Reviewer #3 (Recommendations For The Authors):

      1. To demonstrate the impact of Δ9-THC on preimplantation developmental stages, ESCs are an appropriate system. They have the ability to differentiate into three lineage-specific cell types. The authors should perform differentiation experiments under Δ9-THC exposure and assess the influence of Δ9-THC on the differentiation capacity of ESCs, rather than only differentiation to PGCLCs.

      We apologize for the lack of clarity in our introduction. We specifically looked at the developmental trajectory of PGCs because of the sensitivity of these cells to environmental insults and their potential contribution to transgenerational inheritance. We have expanded on these points in our introduction and discussion sections (lines 89-91 and 474-486). Because our data shows the relevance of Δ9-THC-mediated metabolic rewiring in ESCs subsisting across differentiation, we agree that differentiation towards other systems (neuroprogenitors, for instance) would yield interesting data, albeit beyond the scope of the present study.

      1. Epigenetics are important to mammalian development. The authors only detect the change after Δ9-THC-exposure on the transcriptome level. How about methylation landscape changes in the Δ9-THC-exposure ESCs?

      We have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10, discussed in lines 301-316. While indeed the changes in DNA methylation profiles appear relevant in the context of Δ9-THC exposure (because of Tet2 increased expression in EpiLCs), we highlight that other epigenetic marks (histone acetylation, methylation or ubiquitination) might be relevant for future studies.

      1. In the abstract, the authors claimed that "the results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on preimplantation developmental stages." But they do not show whether the Δ9-THC affects the fetus through the maternal-fetal interface.

      We have addressed the need for increased clarity and have modified the sentence as follows: “These results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on early stages of the germline development.”

      1. To explore the impact of cannabis on pregnant women, human ESCs may be a more appropriate system, owing to the differences in pluripotency between human ESCs and mouse ESCs.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5). These preliminary results show that Δ9-THC exposure negatively impacts the cell number and general metabolism of hESCs. With the existence of differentiation systems for hPGCLCs, future studies will need to assess whether Δ9-THC-mediated metabolic remodelling is also carried through differentiation in human systems. We discuss these points in the last paragraph of our discussion section.

      1. All the experiments are performed in vitro, and the authors should validate their results in vivo, at least in a Δ9-THC-exposed pregnant mouse model.

      Our work is the first of its kind to show that exposure to a drug of abuse can alter the normal development of the embryonic germline. We agree with the Reviewer that to demonstrate transgenerational inheritance of the effects reported here, future experiments in an in vivo mouse model should be conducted. The metabolic remodeling observed upon cannabis exposure could also be directly studied in a human context, although these experiments would be beyond the scope of the present study. For instance, changes in glycolysis may be detected in pregnant women using cannabis, or directly measured in follicular fluid in a similar manner as done by Fuchs-Weizman and colleagues (Fuchs-Weizman et al., 2021). We hope that our work can provide the foundation to inform such in vivo studies.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Grove and colleagues analyzes the role of TEAD1 transcription factors in all events regulating PNS myelin formation, maintenance and regeneration. Throughout the manuscript, the authors compare the results obtained to those they previously described in YAP/TAZ double knockout mice. A strength of the manuscript is the combined in vivo analysis of mutants constitutively lacking TEAD1 expression in myelinating Schwann cells (P0Cre//TEAD1f/f mice: cKO) and mutants in which TEAD1 expression can be ablated after tamoxifen-mediated recombination in myelinating Schwann cells (PlpCreER//TEAD1f/f mice: iKO). Using this approach the authors were able to assess the role of TEAD1 in all aspects related to PNS myelin: formation as well as maintenance and remyelination after injury. By exploiting these models, they were able to define the role of TEAD1 in regulating Schwann cell proliferation as well as in the cholesterol biosynthetic pathway. Collectively, their data indicate that TEAD1 has a composite role in PNS myelination, being required for developmental myelination but dispensable for myelin maintenance. Further, they also describe a role for TEAD1 in promoting PNS remyelination after an injury event.

      Despite these strengths, there are some weaknesses that should be addressed by the authors:

      1) The manuscript would benefit from better and more detailed analysis of the role of the other TEAD transcription factors, as they are likely redundant in function to TEAD1. For example, since in cKO mice some fibers can escape the sorting defect and eventually myelinate, albeit at a lower level, could they determine whether TEAD2-4 transcription factors might compensate for TEAD1 absence in this setting?

      We speculate that other TEADs, most likely both TEAD2 and TEAD3, compensate for TEAD1 in myelinating some developing axons. We also speculate that TEAD4 counteracts TEAD1, resulting in excessive proliferation of Schwann cells in Tead1 cKO. Unfortunately, because, unlike for TEAD1, floxed/congenic alleles and IHC-compatible antibodies are not yet available for TEAD2-4, it is difficult to determine their roles. We attempted to knock down TEAD2-4 by injecting AAV-shRNAs into the sciatic nerves of WT and Tead1 iKO, but this intervention was not successful. Our future studies will determine compensatory and/or opposing roles of other TEADs during development and homeostasis and after nerve injury.

      2) A striking result of the study is the morphological defects observed in the process of axonal sorting and in the Remak fibers formation of TEAD1 cKO mice. To explain the sorting defect, the authors correctly analyze Schwann cell proliferation. However, since axonal sorting is mediated by the interaction between the extracellular matrix and intracellular cytoskeleton rearrangement, they should address also these two aspects. As per the Remak bundles and the poly-axonal myelination they observe, it is difficult to reconcile this "abnormal" myelination with the fact that TEAD1 cKO mice have a very severe myelinating phenotype, which is persistent in adulthood.

      It is noteworthy that we found radial sorting to be delayed, but not blocked, in Tead1 cKO, as we had previously reported for Yap/Taz cDKO mice in our earlier publication (Grove et al., eLife 2017). The primary reason that myelin development fails in Schwann cells lacking YAP/TAZ (or TEAD1 in the present report) is that they do not initiate myelination of sorted axons, not that radial sorting is defective. We showed that radial sorting was delayed in Schwann cells lacking YAP/TAZ because of their late S phase entry (Figure 4 in Grove et al., eLife 2017). In addition, our earlier report demonstrated that the key laminin receptor, integrin α6, is strongly downregulated but axons are nevertheless sorted out by Schwann cells in Yap/Taz cDKO (Figure 4-figure supplement 2 in Grove et al., eLife 2017). Our current view, therefore, is that extracellular matrix may contribute to reducing Schwann cell proliferation (Berti et al., 2011; Pellegatta et al., 2013; Yu, Feltri, Wrabetz, Strickland, & Chen, 2005), which helps to delay radial sorting, but that it is not required for Schwann cells lacking YAP/TAZ (or TEAD1) to sort axons (see the author response #2 in Grove et al., eLife 2017). Based on this information, we disagree with the reviewer that it is essential for us to address the role of extracellular matrix in delaying radial sorting in Tead1 cKO.

      Regarding Remak bundles, ‘thinly’ myelinated Remak bundles are only ‘occasionally’ observed in Tead1 cKO mice. Given that some large axons are still myelinated in Tead1 cKO mice, likely due to compensation by other TEADs, we speculate that Remak bundles are occasionally myelinated by other TEADs in Tead1 cKO. We have clarified our description and expanded our discussion of TEAD1 regulation of Remak bundles, including abnormal polyaxonal myelination.

      3) In the analyses of the cholesterol biosynthetic pathway, TEAD1 seems to be only partly involved. Again, which is the role of any of the other TEADs?

      Examining cholesterol biosynthesis pathways (SREBP1 and 2) and their target enzymes (SCD1, HMGCR, FDPS, IDI1) in Tead1 cKO and Yap/Taz cDKO, we showed that TEAD1 is required for upregulating FDPS and IDI1. These data suggest that TEAD1 plays a major role in mediating YAP/TAZ-driven cholesterol synthesis by upregulating FDPS and IDI1. It is also important to note that FDPS and IDI1 levels are reduced in Tead1 cKO as ‘greatly’ as in Yap/Taz cDKO (Figure 5). We therefore speculate that other TEADs compensate for TEAD1 modestly, if at all, in upregulating FDPS and IDI1. We do not rule out the possibility, however, that other TEADs fully compensate for TEAD1 in ‘maintaining’ cholesterol synthesis in adult Schwann cells. We will address these important questions in the future when the key resources mentioned above become available to study TEAD2-4.

      4) Why do cKO mice die before P60?

      In accordance with IACUC guidelines, we humanely euthanized Tead1 cKO mice before P60 because, like Yap/Taz cKO mice, they develop severe peripheral neuropathy.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for the positive evaluation of our manuscript. We have closely examined the issues raised, and below we offer a point-by-point response to each comment. In the revised manuscript below, all the introduced changes are marked with red font.

      1. There may be a general typo concerning micromolar and millimolar…

      Response 1: The reviewer is correct. During the reformatting of the manuscript, the units used to indicate TPEN concentrations, which should always be µM, were switched to mM in some portions of the text. We have corrected those mistakes.

      1. In Figure 1C/Lines 150-152, the authors use DTPA and EDTA as extracellular chelators for zinc… Was the amount of zinc in the media measured and determined to be below the amount of chelator used? Additionally, these chelators are not specific for zinc, but can bind other divalent cations including calcium. Even though zinc binds more tightly than calcium to these chelators, by mass action calcium and magnesium ions may outcompete DTPA and EDTA, leaving zinc availability unperturbed. How do the authors take these interactions into account to determine that chelation of extracellular zinc has no effect on intracellular calcium oscillations? The best way to test this is to use zinc responsive fluorescent probes in a sample of the calcium- and magnesium-replete medium and see if the addition of the DTPA or EDTA alters zinc fluorescence in the cuvette.

      Response 2: We tested several conditions to determine the effect of chelators on the zinc concentration of the monitoring media using commercially available Zn2+ probes. The fluorescent zinc probe FluoZin3 added extracellularly shows high fluorescence, consistent with trace amounts of zinc and possibly non-specific binding of other cations.

      Further, the media tested was replete with the concentrations of Ca2+ and Mg2+ in TL-HEPES. To establish if the non-permeable external chelators we used could bind external Zn2+ despite the high concentrations of Ca2+ and Mg2+, we followed the reviewer’s suggestion of adding the chelators to the complete media in the presence of FluoZin3. The addition of EDTA caused a protracted, ~5 min, but significant decrease in FluoZin3’s fluorescence, suggesting it is effective at removing external Zn2+ despite the presence of other divalent cations (Author response image 1A). We used a second approach where we added the chelator in the presence of nominal concentrations of Ca2+ and Mg2+ to increase the chelators’ chances to find and chelate Zn2+ (Author response image 1B). Then, we injected mPlcζ mRNA, which initiated persistent but low-frequency oscillations, as expected due to the lack of external Ca2+. Remarkably, upon restoring it, the responses became of high frequency, and upon increasing Mg2+, they acquired the regular pattern, consistent with Mg2+’s inhibition of channels that mediate Ca2+ influx. These results show that the chelation of extracellular zinc does not replicate TPEN’s effect, which suggests that TPEN’s abrupt inhibitory effect on Ca2+ oscillations is most likely due to the chelation of internal Zn2+.

      Author response image 1.

      Cell-impermeable chelators effectively reduce Zn2+ levels in external media but do not prevent initiation or continuation of Ca2+ oscillations. (A) A representative trace of FluoZin3 fluorescence in replete monitoring media (TL-HEPES). The media was supplemented with cell-impermeable FluoZin-3, and after initiation of monitoring, the addition of EDTA (100 μM) occurred at the designated point (triangle). (B) The left black trace represents Ca2+ oscillations initiation by injection of mPlcζ mRNA (0.01 μg/μl). The oscillations were monitored in Ca2+ and Mg2+-free media and in the presence of EDTA (110 μM) to chelate residual divalent cations derived from the water source or reagents used to make the media. The right red trace represents the initiation of oscillations as above, but after a period indicated by the black and green bars, Ca2+ and Mg2+ were sequentially added back.
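
The reviewer's mass-action concern can be checked with a rough equilibrium estimate. The sketch below is not part of the authors' analysis. It assumes simple 1:1 metal-EDTA complexes, ignores pH-dependent protonation and the FluoZin-3 probe itself, and uses approximate textbook stability constants. The Ca2+ (2 mM), Mg2+ (0.5 mM), and trace Zn2+ (1 μM) totals are illustrative assumptions; only the 100 μM EDTA value comes from the figure legend.

```python
import math

def free_ligand(l_total, metals, iters=200):
    """Solve for the free chelator concentration [L] (in M) by bisection in log space.

    metals maps a name to (total concentration in M, 1:1 stability constant K in 1/M).
    Mass balance: L_total = [L] * (1 + sum_i K_i * M_i / (1 + K_i * [L])).
    """
    def ligand_accounted_for(l_free):
        return l_free * (1.0 + sum(k * m / (1.0 + k * l_free) for m, k in metals.values()))

    lo, hi = 1e-30, l_total
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since [L] spans many decades
        if ligand_accounted_for(mid) > l_total:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

def fraction_bound(metal, l_total, metals):
    """Fraction of the named metal that is complexed by the chelator at equilibrium."""
    l_free = free_ligand(l_total, metals)
    _, k = metals[metal]
    return k * l_free / (1.0 + k * l_free)

# Assumed totals (M) and approximate log K values for EDTA; illustrative only.
metals = {
    "Zn": (1e-6, 10 ** 16.5),
    "Ca": (2e-3, 10 ** 10.7),
    "Mg": (0.5e-3, 10 ** 8.8),
}
edta_total = 100e-6  # 100 uM, as in panel A of the legend

zn_bound = fraction_bound("Zn", edta_total, metals)
ca_bound = fraction_bound("Ca", edta_total, metals)
print(f"Zn bound: {zn_bound:.5f}, Ca bound: {ca_bound:.3f}")
```

Under these assumptions most of the EDTA ends up on Ca2+, yet nearly all of the trace Zn2+ is still complexed, because the Zn2+ constant is several orders of magnitude larger. This is in line with the FluoZin3 decrease reported above, although the estimate says nothing about the ~5 min kinetics.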

      Notably, low EDTA concentrations (10 µM) have been used to enhance in vitro culture conditions of mammalian embryos. In fact, EDTA is the key ingredient to overcome the two-cell block that initially prevented the in vitro development of zygotes from inbred strains. It is unknown how EDTA mediates this effect, which is detectable in Ca2+ and Mg2+ replete media and is only effective when placed extracellularly, but it has been attributed to its ability to chelate toxic metals introduced as impurities by other media components; one study demonstrated that the Zn2+ present in the oil used to overlay the culture medium micro drops was the target (Erbach et al., Human Reproduction, 1995, 10, 3248-54). We included some of these points in the revised version of the manuscript and added this figure as Supplementary Figure 1.

      1. The reviewer noted that while dKO eggs showed reduced labile zinc levels, the amount of total zinc is not determined. Further, the response to thapsigargin in dKO eggs didn’t phenocopy the profile in eggs treated with TPEN. The reviewer argued that without further experimentation, such as comparing polar body extrusion and egg activation rate between WT and dKO, it seems to be a stretch to state that these eggs are zinc deficient.

      Response 3: We agree that the statement, ‘zinc deficient,’ is an overstatement without determining the total zinc levels and associated phenotypes. Therefore, in the revised version of the manuscript, we referred to dKO-derived eggs and embryos as “low-level labile Zn2+ eggs”. Our follow-up studies show that eggs from dKO females seem to undergo egg activation events, such as the timing and rate of second polar body extrusion and pronuclear formation, with a similar dynamic to WT females. Hence, we estimate that the labile Zn2+ levels in dKO eggs are not as low as those of WT eggs treated with TPEN. Consequently, these intermediate zinc levels may have subtle effects, such as changing the Thapsigargin-induced Ca2+ release through the IP3R1 without causing widespread inhibition of cellular events observed after TPEN. We would argue that this approach is significant because it can distinguish how the different cellular events and proteins and enzymes have distinct affinities or zinc requirements and, in this case, start uncovering the channel(s) present in oocytes and eggs that may contribute to regulating zinc homeostasis.

      1. The reviewer pointed out that since zinc is not redox active, it is unclear how zinc could be modifying cysteine residues of IP3R1. The reviewer suggested the possibility that excess zinc is binding to the cysteines and preventing their oxidation, leading to the inhibition of the IP3R1 by blocking the channel, thereby preventing calcium release.

      Response 4: The reviewer correctly points out that the mechanism(s) whereby excess Zn2+ modifies the IP3R1 function is undetermined in our study. Further, our description of ‘modifying’ is ambiguous and could be misinterpreted. Data in the literature, some of which we cite in the manuscript, show that “oxidation of cysteine residues enhances receptor’s sensitivity to ligands in various cell types”. Zn2+ preferentially binds to reduced cysteine residues, and thus, we agree with the reviewer's suggestion that “excess zinc may occupy reduced cysteine residues, preventing their oxidation required to sensitize the receptor”. As noted by the reviewer, we cannot rule out that it might be directly blocking the IP3R1 channel. We have modified the corresponding paragraphs in the Discussion.

      1. Lines 80 and 411: three other reports demonstrate zinc reallocation to the egg shell or ejection as the zinc spark. Zebrafish: Converse et al. in Sci. Reports 10, 15673 (2020); X. laevis: Seeler et al. in Nature Chem. 13, 683-691 (2021); C. elegans: Mendoza et al. in Biology of Reproduction 107(2):406-418 (2022).

      Response 5: Thank you for pointing this out, and we have added these references.

      1. Line 129, when discussing that Zn2+ concentrations are reduced after TPEN as visualized by FluoZin-3, the authors should cite the article in which FluoZin-3 was first reported and this result was demonstrated initially: "Detection and Imaging of Zinc Secretion from Pancreatic β-Cells Using a New Fluorescent Zinc Indicator" by Gee et al. J. Am. Chem. Soc 124, 5, 776-778.

      Response 6: Thank you for pointing this out, and we have added this reference.

      1. In Figure 1E/Table 1 the authors evaluated if TPEN supplementation affects meiosis and pronuclear formation; however, the timing of TPEN treatment is unclear. When was TPEN introduced? Were the eggs left in the same media containing TPEN following fertilization, or were they transferred to different media?

      Response 7: Thank you for pointing this out, and we have noted the time of the addition in the figure and text.

      1. Line 1011 and 1012, ZnTP should be ZnPT.

      Response 8: Thank you for pointing this out, which is now corrected.

      Reviewer #2:

      1. The reviewer raises the question of whether a more complex relationship could exist between the levels of zinc in MII eggs by indicating, “a more active relationship such that zinc efflux associated with each calcium spike could be necessary for terminating the Ca spike by depleting cytoplasmic zinc.” The reviewer also states, “Perhaps, rather than simply a permissive role, the normal Zn fluxes during activation may be acutely changing IP3-R gating sensitivity.”

      Response 1: We agree that the demonstration that TPEN dose-dependently delays and consistently terminates ongoing Ca2+ rises perhaps reflects a more nuanced relationship between cytoplasmic labile zinc concentrations, Ca2+ oscillations, and IP3R1 function. Uncovering the precise nature of this relationship would require additional studies, such as determining the impact of TPEN on IP3 binding to its cognate receptor, regulation of channel gating, and more in-depth functional-structural experiments. However, these studies will demand time and complex experimental design and are beyond the scope of the current work. Nevertheless, they are excellent suggestions for future studies.

      We would argue against the reviewer’s suggestion that “zinc sparks directly contribute to shaping the oscillations.” Zn2+ released during the sparks is not labile, but Zn2+ bound to cortical granules-resident proteins, most of which are inaccessible to the cytosol and hence to IP3R1s and should not perturb its function. We observed (data not shown) that the levels of cytosolic labile Zn2+, as assessed with FluoZin3, remained steady for over three hours of Plcζ mRNA-initiated oscillations. Further, because the Zn2+ sparks cease after the third or fourth Ca2+ rise, it would mean, at the very least, that this mechanism only operates on the first few responses. Thus, while the change of cytosolic Ca2+ concentrations triggers the Zn2+ sparks, we argue that the opposite influence is unlikely to hold true.

      1. The reviewer also pointed out that the role of Trpv3 and Trpm7 in Zn2+ homeostasis seems to be minor and that the effects of genetic deletion of those channels are not as clear as those obtained by TPEN. Given that dKO eggs make it to the MII and release more but not less calcium upon thapsigargin than control despite the lowered labile Zn2+ level, the reviewer speculated that the loss of those channels changes calcium gating independent of Zn2+ concentration.

      Response 2: TRPV3, TRPM7, and Cav3.2 are the three channels identified to permeate Ca2+ during oocyte maturation and egg activation in mice. We and other groups have observed that in oocytes and eggs, these channels partly compensate for the absence of each other because the deletion of these channels individually has a limited effect on Ca2+ oscillations and fertility. Thus, in the case of oocytes from Trpv3 and Trpm7 dKO animals, the other plasma membrane channel(s), most likely Cav3.2, is plausibly compensating, and its enhanced function underlies the increased Ca2+ response to Thapsigargin.

      Nevertheless, the slower time to peak and the less steep slope of the Thapsigargin-induced rise suggest a negative impact of the dKO environment on IP3R1’s ability to mediate Ca2+ release. Based on the rest of the results in the manuscript, we attribute this change to the lower levels of labile Zn2+ in dKO eggs.

      1. Lastly, the reviewer noted the upregulation of the Fura-2AM following addition of ZnPT. The reviewer indicated that 0.05 uM ZnPT might not increase intracellular Zn2+ to change Fura-2 fluorescence, but it might be sufficient Zn2+ to enter the cell and keep the IP3R1 channels open causing a sustained rise in cytoplasmic calcium and preventing oscillations. Further, if this interpretation holds true, the inhibitory effects of high Zn2+ on IP3R1’s gating shown in figure 7 would be precluded.

      Response 3: We acknowledge that the increased levels of Fura-2 fluorescence following the addition of ZnPT could be due to the increased Zn2+ levels acting on IP3R1, increasing its open probability, and elevating cytosolic Ca2+ levels. We have added this consideration to the discussion. Nevertheless, our evidence suggests that this is unlikely because, as shown in Figure 6 H, I, the ER-Ca2+ levels as assessed by D1ER recordings did not change following the addition of ZnPT, whereas Rhod-2 fluorescence did, suggesting that the two events are seemingly uncoupled. Further, constant leak from the ER and extended high cytosolic Ca2+ would lead to egg activation or cell death, neither of which was observed.

      Reviewer #3:

      The reviewer noted that the present study deepened the understanding of the role of zinc in regulating calcium channels and stores at fertilization beyond the previously known Zn2+ requirement in oocyte maturation and the cell cycle progression. We appreciate these comments.

      1. Fig. 1. The reviewer wondered why we selected 10 μM TPEN for most of the experiments in the manuscript. The reviewer noted this concentration stopped the Ca2+ oscillations in just half of the eggs after ICSI.

      Response 1: We used 10-μM TPEN throughout the study because it blocked ~50% of the oscillations induced by a robust trigger of Ca2+ responses such as ICSI and reduced the frequency in the remaining eggs. This concentration of TPEN abrogates and prevents the responses to milder stimuli, such as acetylcholine and SrCl2. Importantly, thimerosal and Plcζ mRNA overcome the inhibition by 10-μM but not 50-μM TPEN. However, 50-μM TPEN inactivates Emi2, a Zn2+-dependent enzyme, causing parthenogenetic activation and cell cycle progression, and we wanted to avoid this confounding factor. Therefore, we determined that 10 μM is a “threshold” concentration and selected it for the remaining studies. We also reasoned that it would allow the detection of more subtle effects of reducing the levels of labile zinc, causing a milder inhibition of IP3R1 sensitivity and a progressive delay or modification of the responses to other agonists rather than fully abrogating them, which is the case with higher concentrations.

      1. Line 131 - no concentration of TPEN stated? Or 'the addition of different concentrations of TPEN'?

      Response 2: We have corrected this. We have now added 50-100 µM concentrations.

      1. Line 146 - instead of TPEN, all TPEN concentrations?

      Response 3: We have added these corrections, as at the concentrations we tested here, 5μM TPEN and above, all caused a reduction in the baseline of Fura-2 fluorescence.

      1. Line 1046 - 'We submit'? Propose?

      Response 4: We have replaced the word ‘submit’ with ‘propose’. Thank you for the suggestion.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, the authors discover that postsynaptic mitochondria in C. elegans govern glutamate receptor trafficking dynamics. The core results are two-fold. For one, they find that loss or inhibition of mcu-1 - the C. elegans mitochondrial calcium uniporter - increases GLR-1 glutamate receptor accumulation at the postsynaptic dendritic sites and enhances its trafficking dynamics. The authors hypothesize that this effect on glutamate receptors may have something to do with mitochondrial ROS production. This is because ROS is a by-product of normal oxidative phosphorylation, downstream of calcium import. Indeed, the generation of artificially high amounts of mitochondrial ROS has the opposite effect of mcu-1 loss: decreased glutamate receptor subunit accumulation. Collectively, the results support the idea that mitochondrial function can control receptor dynamics at synaptic sites. This is interesting because tight control of synaptic function likely combines several mitochondrial functions: energy production, calcium buffering, and (here) ROS signaling.

      STRENGTHS

      • The C. elegans genetic model is a strength because the authors are able to make refined conclusions by classical loss-of-function mutants (e.g., mcu-1) along with an impressive cytological toolkit to examine GLR-1 dynamics.

      • The use of pharmacology as a second means to test those genetic conclusions is a strength.

      • The authors' careful reagent verification of reporters (Ca2+, ROS, etc.) is a strength.

      • The ability to link fundamental mitochondrial processes to GLR-1 exocytosis will expand how the field thinks about mitochondrial synapse function.

      WEAKNESSES

      For the most part, the data in the paper support the conclusions, and the authors were careful to try experiments in multiple ways. But please see below:

      • (Main Point) The data are good, but they fall short of mechanism (e.g., Line 322). Figure 6 is accurate as drawn. But calcium and ROS are not abstract signals. They are likely exerting affirmative actions on specific targets. The Discussion does acknowledge this in terms of ROS and it speculates on possible targets.

      We thank the reviewer for their analytical review of our manuscript. We agree that all molecular players involved in the proposed mechanism were not identified by the data presented, so we modified the text to remove overstatements. We also agree that Ca2+ and ROS signaling is not abstract. Rather, there are specific and diverse targets of both Ca2+ and ROS signaling. Follow-up experiments are underway to identify and provide evidence for the necessity of potential ROS/Ca2+ targets in this proposed mechanism. For the current manuscript, we have modified our verbiage in an attempt to not mislead or overstate what our results suggest (e.g., changes/additions to the beginning of the ‘Discussion’, lines 365-377 and 385-388) and updated the illustration of the proposed model to include dashed lines that, as mentioned in the figure legend, indicate indirect action by ROS and Ca2+ (see revised Figure 7).

      The general idea seems to be that mitochondria import calcium through MCU-1 (and interacting factors). As a result, oxidative phosphorylation successfully occurs and mitochondrial ROS is a signaling by-product that signals glutamate receptors not to undergo exocytosis. But there are other interpretations of what might happen in between. In fact, if OXPHOS is disrupted, it is known that this can generate a lot more mitochondrial ROS than the normal by-product levels.

      We do agree that an alternative explanation could be that genetic or pharmacological inhibition of mitochondrial Ca2+ uptake disrupts oxidative phosphorylation, and as a result, inefficiencies or uncoupling in the electron transport chain would lead to an even greater increase in mitochondrial ROS production. Although oxidative phosphorylation was not directly measured, one of our post hoc analyses of GLR-1 transport suggests ATP levels are comparable between controls, mcu-1 mutants, and with Ru360 treatment: the velocity of GLR-1 transport is unchanged between these experimental groups. The processivity of molecular motors (which dictates transport velocity) is highly sensitive to relative ATP abundance. Thus, if ATP levels were dramatically decreased in mcu-1 mutants or following Ru360 treatment, then one would expect a detectable change in GLR-1 transport velocities, but we observed no change (see revised Figure S2E and related discussion at lines 183-190). Although these results do not directly indicate whether ATP production is altered with loss or inhibition of MCU-1, they do suggest that basal ATP levels remain sufficient to support the metabolic demands of GLR-1 transport.

      This reviewer wonders if excess ROS would cause an extreme response. Or alternatively, if scavenging ROS via pharmacological scavengers or SOD expression would reverse the effects.

      These are good points, and we have previously published experiments that address each of them. First, we have seen that globally increasing ROS with various concentrations of H2O2 within the physiological range (<100 nM) decreased GLR-1 transport to a similar extent (PMID: 32847966) indicating that there is not a dose-dependent decrease in GLR-1 transport. We have also assessed GLR-1 transport after treatment with concentrations of H2O2 well above the physiological range (e.g., 500 nM), but these high concentrations obliterated all GLR-1 transport. Contrary to what one may expect, we showed that decreasing ROS via pharmacological or genetic means (probably below physiological range) decreased GLR-1 transport (PMID: 35622512) via a Ca2+ independent mechanism. In other words, ROS scavenging did not have the opposite effect on GLR-1 transport, but we have not combined ROS scavenging with optical induction of ROS production (e.g., via KillerRed) nor have we assessed the potential influence of ROS scavenging on synaptic recruitment. Although we agree that these are important follow-up experiments, they will require a more sensitive ROS indicator because current genetically encoded in vivo ROS sensors cannot detect decreases in ROS levels below the physiological range (< 10 nM) (PMID: 31586057).

      Small Points

      • 33.3 mHz - just making sure, do the authors mean once every 30 seconds? That would be more straightforward.

      Yes, we do mean a 1-second pulse of light every 30 seconds. We have clarified this in the manuscript text (line 115).
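
As a quick arithmetic check of the reviewer's reading, a repetition frequency converts to an interval by taking its reciprocal. The helper names below are hypothetical and only illustrate the conversion; they are not taken from the manuscript.

```python
def period_seconds(frequency_mhz: float) -> float:
    """Interval between pulse onsets, in seconds, for a frequency given in millihertz."""
    return 1.0 / (frequency_mhz * 1e-3)

def duty_cycle(pulse_s: float, frequency_mhz: float) -> float:
    """Fraction of each cycle during which the light is on."""
    return pulse_s / period_seconds(frequency_mhz)

interval = period_seconds(33.3)      # about 30 s between pulse onsets
light_on = duty_cycle(1.0, 33.3)     # a 1-second pulse is on for about 3.3% of each cycle
print(f"{interval:.2f} s between pulses, duty cycle {light_on:.3f}")
```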

      • Figure 2 is confusing. The text says that the mcu-1 mutants have a GLR-1::GFP FRAP rate that is comparable to controls (Lines 165-167). But Figure 2E suggests that it is markedly less, which is the opposite result of the slight increase in rate resulting from Ru360 treatment. And is the explanation why the GLR-1::GFP results differ from the SEP::GLR-1 results a difference between total GFP vs. surface GFP?

      The confusion is due to an incorrect statement in the results text. We have corrected this error and appreciate the reviewer for bringing it to our attention (lines 173-174).

      • I could not watch Video 2 (not sure if it is the file or just the copy I downloaded).

      We thank the reviewer for bringing this to our attention and we believe we have remedied the issue.

      • It is good that the authors tried both optical stimulation and mechanical stimulation (dropping culture plates to stimulate the worms, Figure 3). Why was the mechanical stimulation set aside for further tests in the paper?

      Mechanical stimulation consisted of dropping culture plates containing 2-3 C. elegans onto a lab bench every 30 seconds for 5 or 10 minutes. This mechanical stimulation paradigm was technically cumbersome and was less effective at inducing changes in mito-roGFP fluorescence than optical stimulation. This is likely due to habituation to the mechanical stimulus, which has been well characterized in C. elegans. The optical stimulation was therefore used as it is a more reliable and repeatable method for stimulating the AVA neuron.

      • Does this process affect all kinds of transport, or is it just the glutamate receptors? Was anything else examined?

      Transport of other proteins has not been examined in the context of mitoROS signaling. Our attempts at visualizing and quantifying the transport, synaptic delivery and exocytosis of other synaptic proteins in vivo have proven to be more technically challenging, likely due to relatively lower expression in the C. elegans neurons suitable for transport analysis.

      Reviewer #3 (Public Review):

      Reactive oxygen species (ROS) have been previously shown to regulate glutamate receptor phosphorylation, long-distance transport, and delivery of glutamate receptors to synapses, however, the source of ROS is unclear. In this study, the authors test if mitochondria act as a signaling hub and produce ROS in response to neuronal activity in order to regulate glutamate receptor trafficking. The authors use a variety of optogenetic tools including the calcium reporter mitoGCaMP and the ROS reporter mito-roGFP to monitor changes in calcium and ROS, respectively, in mitochondria after activating neurons with ChRimson in the genetic model organism C. elegans. Repeated stimulation of interneurons called AVA with ChRimson leads to increased calcium uptake into mitochondria in dendrites and increased mitochondrial ROS production. The mitochondrial calcium uniporter mcu-1 is required for these effects because mcu-1 genetic loss of function or treatment with Ru360, a drug that inhibits mcu-1, inhibits the uptake of calcium into mitochondria and ROS production after neuronal activation. Mcu-1 genetic loss of function is correlated with an increase in exocytosis of glutamate receptors but a decrease in glutamate receptor transport and delivery to dendrites. This study suggests that mitochondria monitor neuronal activity by taking up calcium and downregulating glutamate receptor trafficking via ROS, as a means to negatively regulate excitatory synapse function.

      Strengths

      -The use of multiple optogenetic tools and approaches to monitor mitochondrial calcium, reactive oxygen species, and glutamate receptor trafficking in live organisms.

      -Identifying a novel signaling role for dendritic mitochondria which is to monitor neuronal activity (via calcium uptake into mitochondria) and generate a signal (reactive oxygen species) that regulates glutamate receptors at synapses.

      Weaknesses

      -Although the use of KillerRed to generate ROS downstream of mcu-1 is a clever approach, the fact that activation of KillerRed results in reduced GLR-1 exocytosis, delivery, and transport raises the concern that KillerRed is generating a high level of ROS that might be toxic to cellular processes. Experiments showing that other cellular processes are not affected by KillerRed activation and testing if reduced ROS production mimics the effects of blocking mcu-1 would strengthen the conclusions in this study.

      We thank the reviewer for their careful analyses of our findings. It is plausible that KillerRed could cause toxic levels of ROS; in fact, it was originally used to instigate oxidative stress-induced apoptosis to achieve cell-specific ablation. These cell ablation protocols required 20+ minutes of KillerRed activation with substantially higher levels of irradiation (e.g., 3.8 mW/mm2 [PMID: 24209746] vs. our light dosage of 25 µW/mm2). Additionally, our transgenic C. elegans strains expressing KillerRed were designed to have a relatively low KillerRed expression and were screened for low expression based on KillerRed’s fluorescence. Using these strains, we were able to minimally activate KillerRed in the AVA neuron, resulting in ROS elevations at mitochondria that were comparable to neuronal activity-induced increases in mitochondrial ROS as measured by mito-roGFP. Specifically, we found that 10 minutes of mechano-stimulation and 5 minutes of ChRimson stimulation increased the fluorescence ratio (Fratio) of mito-roGFP nearly two-fold (Figure 4A-B and 4C-E). A 15-second pulse of light focused on a small region activating mitoKR in the AVA neurite also caused a similar two-fold increase in the mito-roGFP Fratio (Figure 4C-E), comparable to what neuronal activity induced. Our 5-minute global KillerRed activation less effectively increased the mito-roGFP Fratio at mitochondria in the AVA neurite compared to neuronal activity (revised Figure 4B and 4H) but was sufficient in decreasing GLR-1 transport (revised Figure 5G-H). So, we decided to do all experiments with 5 minutes of global KillerRed activation since lower activation levels of KillerRed were more likely to achieve non-toxic, signaling levels of ROS. Since we strongly agree that this data is important for tool validation, we have reorganized the manuscript such that these data are now a primary figure (see revised Figure 4 and new results sub-section starting at line 252).

      Additionally, we added supplemental transport velocity data. This data shows that local photoactivation as well as whole-cell activation of KillerRed does not alter transport velocity of GLR-1 vesicles within the neurite (revised Figure S4A and S4B and lines 272-276 and 287-289), which would be the case if ATP, microtubules, or actin dynamics were affected. This supports that our local and whole-cell activation protocol does not cause toxic levels of ROS production.

      Lastly, the reviewer questions whether decreasing ROS alters GLR-1 transport, synaptic delivery and exocytosis in a similar fashion to loss or inhibition of mcu-1, and if so, would further support the proposed mechanism. We have decreased ROS via genetic (catalase overexpression) and pharmacological (using the mitochondria-targeted antioxidant MitoTEMPO) means and seen that diminished ROS levels decrease GLR-1 transport albeit to a lesser degree than that caused by loss/inhibition of mcu-1 (PMID: 35622512). To determine if decreased GLR-1 transport during diminished ROS levels involves mcu-1, we would need to assess GLR-1 transport in mcu-1 mutants while ROS is decreased (e.g., using MitoTEMPO treatment) to see if their combined effect phenocopies the effect of mcu-1(lf) or decreased ROS alone. However, as mentioned previously, we are unable to measure ROS levels below the sensitivity of roGFP but within physiological range so we cannot currently calibrate or validate our methods for scavenging ROS in vivo. This is why we have not yet analyzed synaptic delivery or exocytosis rates of GLR-1 in the context of decreased ROS, but these would be interesting follow-up experiments that may further support our model once more sensitive ROS sensors are available.

      Reviewer #4 (Public Review):

      Using optogenetic stimulation, the authors presented compelling evidence that neuronal activity increases mitochondrial calcium levels, facilitated by the mitochondrial uniporter MCU-1. Through ratiometric measurements, they showed that mitochondrial ROS levels also increase due to neuronal activity via MCU-1. Subsequent FRAP studies were employed to investigate the trafficking of the AMPA receptor, GLR-1. By integrating genetic and pharmacological methodologies, the recovery rate of GLR-1 was assessed. The authors concluded that increased mitochondrial ROS due to neuronal activity reduces the trafficking and exocytosis of AMPA receptors. They proposed that mitochondrial ROS serves as a homeostatic mechanism regulating AMPA receptor trafficking and abundance, thus maintaining synaptic strength. This research is crucial as it provides a direct link between mitochondrial signaling and AMPA receptor trafficking.

      However, there are several significant concerns regarding the methodologies and quantifications employed in this manuscript. The authors utilized GLR-SEP to label surface AMPA receptors and relied on the "FRAP rate" as an indicator of the exocytosis rate. The absence of direct visualization of exocytosis using GLR-SEP, and the lack of direct measurements of exocytosis events, casts doubt on the conclusions about ROS's impact on AMPA receptor exocytosis. Furthermore, the "FRAP rate" determined in this study is a combination of recovery rates (incorporating both endosomal trafficking and diffusion) with the mobile fractions of AMPA receptors, potentially weakening interpretations of the findings. A more comprehensive discussion addressing the conflicting effects of MCU-1 and ROS on GLR-GFP FRAP recovery and dendritic trafficking would enable readers to grasp the intricate roles of mitochondrial calcium and ROS in modulating synaptic receptors.

      We appreciate the reviewer’s attention to detail while reviewing our article. Their major concern about directly visualizing exocytosis events is valid since changes in exocytosis and endocytosis would dictate the amount of SEP::GLR-1 at the synaptic membrane. However, streaming imaging of SEP in vivo is technically difficult, shows only a few exocytosis events, and provides short “snapshots” (1-2 minutes; longer streaming imaging causes photobleaching and phototoxicity) which must be extrapolated to longer time frames. Our 16-minute SEP::GLR-1 FRAP protocol allows us to capture all plasma membrane recruitment and quantify the relative balance between exo- and endocytosis. It also allows for longer observational periods during which we can detect changes in GLR-1 recruitment to and retention at the synaptic membrane in genetic mutants and with drug treatments. In addition, our photobleaching approach involves photobleaching a ~40-60 µm region proximally and distally to the imaging region, which limits the influence of receptor diffusion on the FRAP rate. The reviewer makes a valid point that receptor endocytosis rates would also influence the SEP::GLR-1 FRAP rate. We have now changed the text in the results and discussion to include this information (lines 155-161, and changing “exocytosis” to “synaptic recruitment” throughout the manuscript when discussing SEP::GLR-1 FRAP results [e.g., at lines 169, 208, and 321]).
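
The reviewer's distinction between a recovery rate and a mobile fraction can be illustrated with a generic single-exponential FRAP model. This is a textbook-style sketch, not the authors' quantification. The model form, the function names, and the synthetic numbers (pre-bleach level 1.0, post-bleach level 0.1, time constant 120 s, 16-minute window) are all assumptions chosen for illustration.

```python
import math

def recovery(t, f_post, f_plateau, tau):
    """Single-exponential FRAP recovery: fluorescence at time t (s) after bleaching."""
    return f_post + (f_plateau - f_post) * (1.0 - math.exp(-t / tau))

def mobile_fraction(f_pre, f_post, f_plateau):
    """Share of the bleached signal that eventually recovers."""
    return (f_plateau - f_post) / (f_pre - f_post)

def fit_tau(times, trace, f_post, f_plateau):
    """Estimate tau from a through-origin least-squares fit of ln(unrecovered fraction) vs t."""
    num = den = 0.0
    for t, f in zip(times, trace):
        unrecovered = 1.0 - (f - f_post) / (f_plateau - f_post)
        if t > 0 and unrecovered > 1e-6:   # skip points where the log is undefined or noisy
            num += t * math.log(unrecovered)
            den += t * t
    return -den / num

times = list(range(0, 961, 30))            # 16 minutes sampled every 30 s (synthetic)
f_pre, f_post, tau_true = 1.0, 0.1, 120.0

# Two synthetic traces with the same time constant but different plateaus.
high = [recovery(t, f_post, 0.64, tau_true) for t in times]
low = [recovery(t, f_post, 0.37, tau_true) for t in times]

mf_high = mobile_fraction(f_pre, f_post, 0.64)   # 0.6
mf_low = mobile_fraction(f_pre, f_post, 0.37)    # 0.3
tau_high = fit_tau(times, high, f_post, 0.64)
tau_low = fit_tau(times, low, f_post, 0.37)

# Early recovery per second differs two-fold even though tau is identical.
early_slope_high = (high[1] - high[0]) / (times[1] - times[0])
early_slope_low = (low[1] - low[0]) / (times[1] - times[0])
```

In this toy case, a metric based on the early slope or on total recovered signal would report a two-fold difference even though the kinetics are the same, which is why reporting the time constant and the mobile fraction separately can make FRAP results easier to interpret.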

    1. Author Response

      The following is the authors’ response to the original reviews.

      necessary clarifications on some of the reviewers' suggestions.

      Reviewer #1 (Public Review):

      Weaknesses:

      • This is a pilot study with only 24 cases and 24 controls. Because the human microbiota entails individual variability, this work should be confirmed with a higher sample size to achieve enough statistical power.

      Thank you for your suggestion. Unlike 16S rRNA data, which are highly sparse, metagenomic data are denser. Based on previous research, the sample size used here broadly meets the requirements of the analyses presented. Nevertheless, your suggestion is valuable: a larger sample size would allow more in-depth analysis. Owing to practical constraints, it is difficult for us to increase the sample size further in this study.

      • The authors do not report here the use of blank controls. The use of this type of control is important to "subtract" the potential background from plasticware, buffer or reagents from the real signal. Lack of controls may lead to microbiome artefacts in the results. This can be seen in the results presented where the authors report some bacterial contaminants (Agrobacterium tumefaciens, Aequorivita lutea, Chitinophagaceae, Marinobacter vinifirmus, etc.) as part of the most common bacteria found in cervical samples.

      Thank you for your suggestion. Applying blank controls in low-biomass areas can effectively avoid contamination caused by the environment or kits. This opinion is consistent with that published by Raphael Eisenhofer et al. in Trends in Microbiology. When designing this study, we considered that it described a biomass-rich site and that the abundance of dominant species was much higher than that of the possible 'kitome', so we did not set a blank control. In addition, the main object of discussion in this study is high-abundance species, and the species filtering threshold for some analyses was raised to 50%. Therefore, we believe that the absence of the blank control has little effect on the conclusions of this study. However, your opinion is spot on: failure to set up a negative control will affect our future research on rare species. We will add a description to the Limitations section of the Discussion.

      • Samples used for this study were collected from the cervix. Why not collect samples from the uterine cavity and isthmocele fluid (for cases)? In their previous paper using samples from the same research protocol (IRB no. 2019ZSLYEC-005S) they used endometrial tissue from the patients, so access to the uterine cavity was guaranteed.

      Thank you for your suggestion. In Author response image 1 we show the approximate location of our cervical swab sampling. There are two main reasons for choosing cervical swabs:

      1) The adsorption of swabs allows us to obtain sufficient nucleic acid for high-depth sequencing, while the isthmocele fluid varies greatly among patients, which will introduce unnecessary batch effects.

      2) Since the female reproductive tract is a continuous whole, our sampling location is close to the lesion in the cervix, which can be effectively studied. On the other hand, the microbial biomass of the endometrium is probably two orders of magnitude lower than that of the cervix, and it is difficult to avoid contamination of the lower genital tract when sampling.

      Based on the above reasons, we selected cervical swabs for our microbial data.

      Author response image 1.

      • Through the use of shotgun genomics, results from all the genomes of the organisms present in the sample are obtained. However, the authors have only used the metagenomic data to infer the taxonomical annotation of fungi and bacteria.

      Thank you for your suggestion. The advantage of metagenomics is that it can obtain all the nucleic acid information of the entire environment. However, in the study of the female reproductive tract, the databases of viruses and archaea are still immature. To ensure the accuracy of the results, we did not analyze them. We look forward to the emergence of mature databases in the future.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting to use another series of functional data coming from the metagenomic analyses (not only taxonomic) to expand and reinforce the results presented.

      Thank you for your suggestion. We have dissected the functional data of microbiota in the article.

      • The authors have previously published the 16S rRNA sequencing and transcriptomic analysis of the same set of patients. It would be nice to see the integration of all the datasets produced.

      Thank you for your suggestion. There is no doubt that integrating all the data would give more multidimensional results. In our previous study we focused on microbe-host interactions. However, there is an unanswered question: What are the characteristics of the regulatory network within microbiota? Therefore, we answered this question in this study, exploring the complex interaction processes within microbial communities. In addition to direct effects, interactions between microbiota may also occur through special metabolite experiments. Therefore, we introduced the analysis of the untargeted metabolome. However, 16S rRNA can only provide bacterial information, so we did not integrate the data. In addition, the transcriptome provides host information and is not the focus of this study. However, your suggestion is very valuable, and we will integrate all the data in the next study on the exploration of treatment methods.

      Reviewer #2 (Public Review):

      Weaknesses: Methodological descriptions are minimal.

      Some examples:

      • The CON group (line 147) has not been defined. I suppose it is the control group.

      • There are no statistics related to shotgun sequencing. How many reads have been sequenced? How many have been removed from the host? How many are left to study bacteria and fungi? Are these reads proportional among the 48 samples? If not, what method has been used to normalise the data?

      • ggClusterNet has numerous algorithms to better display the modules of the microbiome network. Which one has been used?
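      The reviewer's question about normalising reads across the 48 samples can be made concrete with a minimal sketch. It assumes total-sum scaling (relative abundance), which is only one common option and is not stated in the source to be the method the authors used; the function name, taxon names, and counts are hypothetical.

```python
def relative_abundance(counts):
    """Scale per-taxon read counts so that the sample sums to 1."""
    total = sum(counts.values())
    if total == 0:
        # No reads assigned: return zeros rather than dividing by zero.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: reads / total for taxon, reads in counts.items()}


# Hypothetical samples with the same composition but different sequencing depth.
deep_sample = {"taxon_a": 8000, "taxon_b": 2000}
shallow_sample = {"taxon_a": 80, "taxon_b": 20}

deep_profile = relative_abundance(deep_sample)
shallow_profile = relative_abundance(shallow_sample)
```

      After scaling, the two hypothetical samples have the same profile despite a 100-fold difference in depth, which is the property the reviewer's question is probing. Scaling does not address zeros in shallow samples, which the reviewer raises separately.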

      Thank you for your suggestion. We have added details to the method.

      Reviewer #2 (Recommendations For The Authors):

      I think the author should take into account the points described in the "Weaknesses" section. The lack of detail extends to almost all the analyses that have been included in the manuscript. Although the results are sound, I think it is important to understand what has been analysed and how it has been analysed. It is important that all work is reproducible and this requires vital information.

      For example, what parameters have been used for bowtie2? Has a local analysis been used, or end-to-end? Some parameters like --very-sensitive are important for this kind of analysis. You can also use specific programs like kneaddata.

      The Raw data preprocessing section should be more detailed.

      The same applies to the "Taxa and functional annotation" section: how have the data been normalised? Has any Zero-Inflated Gamma probabilistic model algorithm been taken into account? How were the 0 (no species detected) in the shallow samples treated?

      Which algorithms have been used for LEfSe? Kruskal-Wallis -> (Wilcoxon) -> LDA?

      Which p-value has been used as the cut-off? Has this p-value been corrected for multiple testing?
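      To illustrate what a correction for multiple testing involves, here is a minimal sketch of the Benjamini-Hochberg procedure. This is an assumption for illustration only: the source does not say which correction, if any, the authors applied, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four taxa.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

      A cut-off (for example 0.05) would then be applied to the adjusted values rather than to the raw ones.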

      • Information on ggClusterNet should be included and explained.

      The first section of the results and Table 1 should be in the Materials and Methods.

      Thank you for your suggestion. We have added details to the method.

      In the fungi section, it is mentioned that 431 species have been found. They should be included in a supplementary table.

      How many bacteria were found? Please include them also in a supplementary table.

      Thank you for your suggestion. We have added the corresponding table.

      Reviewer #3 (Public Review):

      Major

      1. Smoke or drink conditions, as well as diseases like hypertension and diabetes are important factors that could influence the metabolism of the host, thus the authors should add them in the exclusion criteria in the Methods.

      We thank reviewer #3 for the professional comments. We have made the corresponding additions in the Methods section. We also followed this standard when recruiting subjects.

      2. The sample size of this study is not large enough to draw a convincing conclusion.

      Thank you for your suggestion. Unlike 16S rRNA data, which are highly sparse, metagenomic data are denser. Based on previous research, the sample size used here can basically meet the requirements. However, your suggestion is very valuable: increasing the sample size would allow a better in-depth analysis. Due to objective limitations, it is difficult for us to increase the sample size further in this study.

      Reviewer #3 (Recommendations For The Authors):

      Please recruit more samples.

      In addition, there are many formatting and grammatical mistakes in the manuscript.

      Minor

      1. In Line 24-25 of the "Composition and characteristics of fungal communities", the format of "Goyaglycoside A and Janthitrem E." shouldn't be italic.

      2. In Line 126 of the "Metabolite detection using liquid chromatography (LC) and mass spectrometry (MS)", the "10 ul" should be changed to "Ten ul". Beginning with arabic numerals in a sentence should be avoided.

      3. In Line 170 of the "Composition and characteristics of bacterial communities", the "162 differential species" should be "One hundred and sixty-two differential species".

      4. In Line 187 of the "Composition and characteristics of fungal communities", the "42 differential" should be "Forty-two differential".

      Thanks to reviewer #3 for professional comments. We have completely revised the language of the article.

    1. Author Response

      Reviewer #1 (Public Review):

      Payne et al. have investigated the neural basis of VOR adaptation with the goal of constraining sites and mechanisms of plasticity supporting cerebellar learning. This has been an area of intense debate for decades; previous competing models have argued extensively about the sites of plasticity and the strength of eye velocity feedback/ efference copy signals to Purkinje cells has been central to the debate. This paper nicely explores the consequences of varying the strength of this feedback and in so doing, provides a potential explanation for why Purkinje cell responses during VOR cancellation could exhibit stronger responses following learning, despite net depression of the strength of their vestibular inputs. In that sense it provides some reconciliation of existing models. The work appears to be well done and the paper is well written. The manuscript could be improved and the significance of the work clarified and enhanced by contextualizing the work more appropriately within the existing literature in this area.

      We thank the reviewer for the nice summary of this work’s contribution to the long-standing debate regarding sites and mechanisms of plasticity underlying cerebellar learning.

      We have revised the manuscript to address several key points raised by the reviewer. We now emphasize that the main evidence for weak feedback arises from interpreting our model in the context of the existing experimental evidence for plasticity rules in the cerebellar cortex, and we have clarified the commonalities and differences from the Miles-Lisberger model. Several missing references are now included. Additionally, we clarify the comparison of our model to data after learning, and explain how altered signaling through the visual pathways drives paradoxical changes in neural activity without requiring plasticity in the visual pathways. We hope that these changes better situate the work to be interpreted appropriately in the context of the existing literature.

      Reviewer #2 (Public Review):

      Payne et al. use a computational approach to predict the sites and directions of plasticity within the vestibular cerebellum that explain an unresolved controversy regarding the basis of VOR learning. Specifically, the conclusion by Miles and Lisberger (1981) that vestibular inputs onto Purkinje cells (PCs) must potentiate, rather than depress (as in the Marr/Albus/Ito model), following gain-increase learning because when the VOR is cancelled, PC firing increases rather than decreases. Payne et al. provide a novel model solution that recapitulates the results of Miles and Lisberger but, paradoxically, uses plasticity in the cerebellar cortex that weakens PC output rather than strengthens it. However, the model only succeeds when efference copy feedback to the cerebellar cortex is relatively weak thereby allowing a second feedback pathway to drive PC activity during VOR cancellation to counteract the learned change in gain. Because the model is biologically constrained, the findings are well supported. This work will likely benefit the field by providing a number of potentially experimentally testable conclusions. The findings will be of interest to a wider audience if the results can be extrapolated to other cerebellar-dependent learning behaviors rather than just VOR gain-increase learning. Overall, the manuscript is very well written with clearly delineated results and conclusions.

      We appreciate the reviewer’s comments that the model is well-constrained and provides a solution to the long-standing debate surrounding sites and directions of plasticity underlying VOR learning.

      The reviewer raises an important question: do our results generalize across the cerebellum? We note first that we are studying the cerebellum to illustrate a core problem in modeling systems throughout the brain, namely, how to disambiguate plasticity in the face of ubiquitous feedback loops, both within the brain and between the brain and the environment. Within the cerebellum, we focused on VOR learning due to the wealth of experimental data available. While the specific effect of feedback strength on plasticity will depend on the details of the relevant cerebellar circuit, our general approach can be applied to other areas, given sufficient data, in order to determine how plasticity is distributed in the face of potential feedback loops. Importantly, error-driven LTD of the parallel fiber-Purkinje cell synapse is a fundamental hypothesized mechanism for cerebellar learning which has been generally accepted elsewhere in the cerebellum, but was called into question for VOR learning in the flocculus by the Miles-Lisberger model. Thus, our study of VOR learning has broad implications for reconciling plasticity mechanisms across the cerebellum.

      We also note that, even within the VOR circuit, the direction of plasticity and the relative dependence on plasticity at each site may depend on the timescale of learning. On longer timescales, there is thought to be consolidation of learning from a cerebellar cortical site to a brainstem site. Such consolidation from a faster-learning site to a slower-learning site is known as systems consolidation and has been shown theoretically to mitigate the ‘plasticity-stability dilemma’ of having fast learning without over-writing longer-term learning. Our model is compatible with both error-driven plasticity in the cerebellar cortex and a site of plasticity in the brainstem, with brainstem plasticity potentially mediating consolidation of earlier learned changes in the cerebellar cortex. We have now updated the text significantly to discuss the broader implications of the results and to address the reviewer’s specific comments.

      Reviewer #3 (Public Review):

      Summary: In this study, the authors attempt to determine what is the role (and strength) of feedback in a closed-loop (cerebellar) system.

      Strengths:

      1) By combining extensive data fitting of cerebellar experimental observations this study provides deep insights into existing questions and more broadly on the role of feedback and what are the limitations when inferring feedback in (plastic) neural circuits.

      2) Another strength of this study is the gradual build-up of evidence by using models of different complexities to help build the argument that weak feedback is sufficient to explain experimental observations.

      3) The paper is well-written and structured.

      Weaknesses:

      1) In principle feedback can (i) drive dynamics or/and (ii) drive learning directly. Throughout the paper, the authors refer to only the first case (i.e. dynamics). However, the role of feedback in learning is already implicitly assumed by the authors when jointly fitting the model before and after learning. Note that the general conclusion that feedback (in general) is weak may apply to the first view (i.e. dynamics), but not the second. Given that a key conclusion of the paper is that no feedback is sufficient to explain the data, this suggests that feedback may instead be used for learning/plasticity.

      We fully agree with the reviewer that our conclusions do not preclude an important role for many other types of feedback, including as an instructive signal for learning. Instead of explicitly considering feedback for learning in our model, we consider static snapshots before and after learning to infer plasticity, while remaining agnostic to the neural algorithm used to achieve such plasticity. A widely held hypothesis is that motor error signals carried by climbing fibers instruct LTD at co-active parallel fiber inputs to Purkinje cells; this is indeed a form of feedback, operating on a slower timescale than “feedback for dynamics.” This “feedback for learning” is not modeled here but is fully consistent with our results, as discussed in a new paragraph of our Discussion (end of Section 3.4.1 “Pathways undergoing plasticity”).

      2) There are some potential limitations of the conclusions drawn due to the model inference methods used. The methods used (fmincon) can easily get stuck in local minima and more importantly they do not provide an overview of the likelihood of parameters given the data. A few studies have now shown that it is important to apply more powerful inference techniques both to infer plasticity (Bykowska et al. Frontiers 2019) and neural dynamics (Gonçalves et al. eLife 2020). As highlighted by Costa et al. Frontiers 2013 using more standard fitting methods can lead to misleading interpretations. Given the large range of experimental data used to constrain the model, this may not be an issue, but it is not explicitly shown.

      The reviewer correctly points out that we used a deterministic model-fitting procedure. To address this concern, we complemented the full dynamic model with a simple analytic model (Figure 5) for which we could fully derive the cost function landscape and analytically show that there is a line of parameters corresponding to a perfect degeneracy in the model. Thus, the challenge in the model we analyze is that there are too many solutions, rather than it being difficult to find a solution. Given this degeneracy, we chose to fix the level of efference copy feedback and then find the (now non-degenerate) solutions, and to then compare these different solutions with regard to their implications for the correlated strengths and changes in strengths of different pathways. We have edited the relevant section of the Discussion for clarity on this topic, and have added references to the additional strategies for model inference mentioned above, in Section 3.3 “Relation to other sloppy models”.
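      The idea of a line of parameters that fit equally well can be illustrated with a toy steady-state loop. This is a sketch under stated assumptions and is not the analytic model of Figure 5: it assumes a single feedforward weight w and a single positive feedback gain k, so that the closed-loop gain is w / (1 - k). All names are hypothetical.

```python
def closed_loop_gain(w, k):
    """Steady-state gain of a toy loop: feedforward weight w, positive feedback gain k."""
    if k >= 1.0:
        raise ValueError("feedback gain k must be below 1 for a stable steady state")
    return w / (1.0 - k)


def degenerate_weights(target_gain, feedback_gains):
    """For each feedback gain k, return the (w, k) pair that reproduces target_gain."""
    return [(target_gain * (1.0 - k), k) for k in feedback_gains]


# Every pair lies on the line w = target_gain * (1 - k) and yields the same output gain.
pairs = degenerate_weights(2.0, [0.0, 0.5, 0.9])
gains = [closed_loop_gain(w, k) for w, k in pairs]
```

      In this toy case, observing only the output gain cannot distinguish weak feedback with a large feedforward weight from strong feedback with a small one. Fixing k, as the response describes doing for the efference copy feedback, removes the ambiguity.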

      3) There is some lack of clarity on how the feedback pathways as currently presented should be interpreted in the brain.

      We interpret this comment as referring to the questions of (1) whether our model includes a pathway for learning through feedback, (2) what is the anatomical implementation of the efference copy feedback pathway and visual pathways, and (3) how should the positive weights on the efference copy feedback pathway k_PE be interpreted. We address these below.

      (1) Feedback for learning was discussed in point 1 above.

      (2) Anatomical implementation of efference copy pathway: We have edited the Discussion to clarify that there is anatomical evidence for efference copy input to the cerebellum, but that a key aspect of ‘feedback’ is that activity functionally loops back onto itself. Instead, neurons carrying eye movement commands (such as in the vestibular nucleus) could send signals to the cerebellum, without receiving output from the same cerebellar neurons – this would correspond to a ‘spiraling’ pathway that does not form a closed feedback loop (Figure 8). Thus we argue that the existence of the gross anatomical pathways does not necessitate a role for strong, functional, efference copy feedback (Discussion, Section 3.1, lines 481-491).

      Anatomical implementation of visual pathway: The visual feedback pathways considered here are those that would receive visual motion information from the environment. This visual feedback is itself changed by eye movements, thus providing a net overall negative feedback loop that helps to stabilize gaze. This pathway has been proposed to involve cortical regions such as MST (discussed in Materials and Methods, Model Implementation, lines 769-774).

      (3) Interpretation of positive feedback loop: In our model, the efference copy feedback filter, k_PE, has positive weight. This corresponds to the positive net sign of the Purkinje cell to brainstem to Purkinje cell feedback loop. Specifically, the Purkinje cell to brainstem pathway is inhibitory (because Purkinje cells are inhibitory), the brainstem to eye velocity command pathway is inhibitory (to achieve counter-rotation of the eyes in response to head turns), and the feedback of this eye velocity command back to Purkinje cells (k_PE) is positive. Thus this loop in our model represents positive feedback. This is now clarified in Materials and Methods, Model Implementation, line 748.
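      The sign argument above can be checked by multiplying the signs of the three stages. The sketch below only encodes the signs stated in the response; the function and variable names are hypothetical.

```python
from math import prod


def loop_sign(pathway_signs):
    """Net sign of a loop: the product of the signs of its pathways."""
    return prod(pathway_signs)


# Signs as described in the response above.
purkinje_to_brainstem = -1      # Purkinje cells are inhibitory
brainstem_to_eye_command = -1   # counter-rotation of the eyes
eye_command_to_purkinje = +1    # efference copy feedback weight k_PE

net = loop_sign([
    purkinje_to_brainstem,
    brainstem_to_eye_command,
    eye_command_to_purkinje,
])
```

      Two inhibitory stages followed by a positive weight give a net sign of +1, that is, positive feedback.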

      4) The functional benefits of having (or not) feedback could be better discussed (related to point 1 above).

      Related to point 1 above, it is certainly the case that feedback is necessary for learning. We do not explicitly model the climbing fiber feedback thought to be involved in learning/plasticity of the parallel fiber pathway.

      We instead focus on the role of efference copy feedback, and how it functionally impacts the required sites and signs of plasticity in the circuit. As shown in the paper, if the efference copy pathway is strong, then this is most consistent with learned changes in eye movements being driven primarily by plasticity in the brainstem pathway (as in the Miles-Lisberger hypothesis), whereas if the efference copy pathway is weak, then this is most consistent with learned changes in eye movements being driven by net depression in the parallel fiber to Purkinje cell pathway (as in the classic Marr-Albus-Ito model and as suggested by most cellular and molecular studies of parallel fiber-Purkinje cell plasticity), in addition to a role of plasticity in the brainstem pathway. We also note that, in the ‘Strong Feedback’ model, the feedback is so strong that the system is on the brink of instability – this has been argued to have the functional benefit of providing ‘inertia’ to eye movements that could help to maintain eye movements during smooth pursuit when a target goes behind an occluder, but it also has the disadvantage of placing the system at a level of positive feedback near the brink of instability. We also note that the visual feedback pathway through the environment, emphasized in this work, serves as a negative feedback loop that reduces deviations between the eye and target velocity. We have extensively re-written the first section of the Discussion (Section 3.1), in order to more clearly lay out the implications of each model for circuit plasticity and feedback.

      5) Some of the key conclusions of the work are not described in the abstract, namely that feedback is weak in the cerebellar system.

      Thank you for raising this point. We have added this key conclusion to the end of the abstract: “Our results address a long-standing debate regarding cerebellum-dependent motor learning, suggesting a reconciliation in which error-driven plasticity of synaptic inputs to Purkinje cells is compatible with seemingly oppositely directed changes in Purkinje cell activity. More broadly, the results demonstrate how learning-related changes in neural activity can appear to contradict the sign of the underlying plasticity when either internal feedback or feedback through the environment is present.”

      Claims:

      The argument is well-built throughout the paper, but there are some potential caveats with the general interpretation (see weaknesses).

      Impact:

      This work has the potential to bring important messages on how best to interpret and infer the role of feedback in neural systems. For the field of the cerebellum, it also proposes solutions to long-standing problems.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Cyclic Nucleotide Binding (CNB) domains are pervasive structural components involved in signaling pathways across eukaryotes and prokaryotes. Despite their similar structures, CNB domains exhibit distinct ligand-sensing capabilities. The manuscript offers a thorough and convincing investigation that clarifies numerous puzzling aspects of nucleotide binding in Trypanosoma.

      Strengths:

      One of the strengths of this study is its multifaceted methodology, which includes a range of techniques including crystallography, ITC (Isothermal Titration Calorimetry), fluorimetry, CD (Circular Dichroism) spectroscopy, mass spectrometry, and computational analysis. This interdisciplinary approach not only enhances the depth of the investigation but also offers a robust cross-validation of the results.

      Weaknesses:

      None noticed.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript clearly shows that Trypanosoma PKA is controlled by nucleoside analogues rather than cyclic nucleotides, which are the primary allosteric effectors of human PKA and PKG. The authors demonstrate that the inosine, guanosine, and adenosine nucleosides bind with high affinity and activate PKA in the tropical pathogens T. brucei, T. cruzi and Leishmania. The underlying determinants of nucleoside binding and selectivity are dissected by solving the crystal structure of T. cruzi PKAR(200-503) and T. brucei PKAR(199-499) bound to inosine at 1.4 Å and 2.1 Å resolution and through comparative mutational analyses. Of particular interest is the identification of a minimal subset of 2-3 residues that controls nucleoside vs. cyclic nucleotide specificity.

      Strengths:

      The significance of this study lies not only in the structure-activity relationships revealed for important targets in several parasite pathogens but also in the understanding of CNB's evolutionary role.

      Weaknesses:

      The main missing piece is the model for activation of the kinetoplastid PKA which remains speculative in the absence of a structure for the trypanosomatid PKA holoenzyme complex. However, this appears to be beyond the scope of this manuscript, which is already quite dense.

      We fully agree that insight into the activation mechanism and its possible deviation from the mammalian paradigm requires a holoenzyme structure revealing the details of R-C interaction. We have attempted Cryo-EM from LEXSY-produced holoenzyme, yet upscaling the purification procedures described in this manuscript has repeatedly failed in spite of numerous protocol changes and optimizations. Much more work is required to achieve this.

      Reviewer #2 (Recommendations For The Authors):

      Some minor points to consider for enhancing the impact of this interesting manuscript:

      1) The nucleoside affinities measured are mainly for the regulatory subunits unbound to the kinase domain. How would nucleoside affinities change when the regulatory subunits are bound to the kinase domain, which is presumably the case under resting conditions? An estimation of this change in affinity is important because it more closely relates to the variations in cellular nucleoside concentrations needed for activation.

      This is an important question and we have given an indirect answer in the manuscript, though not very explicitly. The EC50 values for kinase activation of the purified holoenzyme complexes are very similar or almost identical to the Kd values measured by ITC with free regulatory subunits. By inference, the binding Kd for the holoenzyme and for the free R-subunit cannot be very different. In addition, we have recently determined the EC50 for PKA activation in vivo in trypanosomes using a bioluminescence complementation reporter assay. The values fit perfectly to the values obtained with purified holoenzyme (Wu et al. in preparation). A sentence in Results (lines 201-203) has been added.

      2) The authors should point out that a major implication of nucleoside vs. cyclic nucleotide activation is in terms of signal termination. If phosphodiesterases (PDEs) are responsible for cAMP/cGMP signal termination, what terminates nucleoside-dependent signaling? Although the answer to this question may not be known at this stage, it is important to highlight this critical implication of the authors' study.

      The mechanism of signal termination is indeed unknown so far. We speculate that some enzymes of the purine salvage pathways are differentially localized in subcellular compartments and thereby able to establish microdomains that enable nucleoside signaling. In addition, PKA subunit phosphorylations/dephosphorylations and/or protein turnover may also regulate signal termination. As an example, free PKAC1 is rapidly degraded upon depletion of the PKAR subunit by RNAi. We have now mentioned signal termination in Discussion and have revised the last part of Discussion (lines 567-602). A possible approach to monitor compartmentalized signaling would be using the FluoSTEPs technology (Tenner et al., Sci. Adv. 2021; 7: eabe4091), but adapting this to the trypanosome system will not be a short-term task.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Assessment note: “Whereas the results and interpretations are generally solid, the mechanistic aspect of the work and conclusions put forth rely heavily on in vitro studies performed in cultured L6 myocytes, which are highly glycolytic and generally not viewed as a good model for studying muscle metabolism and insulin action.”

      While we acknowledge that in vitro models may not fully recapitulate the complexity of in vivo systems, we believe L6 myotubes are appropriate for studying the mechanisms underlying muscle metabolism and insulin action. L6 myotubes possess many important characteristics relevant to our research, including high insulin sensitivity and a similar mitochondrial respiration sensitivity compared to primary muscle fibres. Furthermore, several studies have demonstrated the utility of L6 myotubes as a model for studying insulin sensitivity and metabolism, including our own previous work (PMID: 19805130, 31693893, 19915010) and work of others (PMID: 12086937, 29486284, 15193147).

      Importantly, our observations from the L6 myotube model are supported by in vivo data from both mice and humans. Chow-fed (Figure 3J, K) and high-fat-fed mice (new data - Supplementary Figure 4 H-I) demonstrated a reduction in mitochondrial ceramide and an increase in CoQ9. Muscle biopsies from humans showed a strong negative correlation between mitochondrial C18:0 ceramide levels and insulin sensitivity (PMID: 29415895). Further, complex I and IV abundance was strongly correlated with both muscle insulin sensitivity and mitochondrial ceramide (CerC18:0) (Figure 6E, F). This is consistent with our observations in L6 myotubes (Figure 6H, I). These findings support the relevance of our in vitro results to in vivo muscle metabolism.

      Points from reviewer 1

      1. Although the authors' results suggest that higher mitochondrial ceramide levels suppress cellular insulin sensitivity, they rely solely on a partial inhibition (i.e., 30%) of insulin-stimulated GLUT4-HA translocation in L6 myocytes. It would be critical to examine how much the increased mitochondrial ceramide would inhibit insulin-induced glucose uptake in myocytes using radiolabeled deoxy-glucose. Another important question to be addressed is whether glycogen synthesis is affected in myocytes under these experimental conditions. Results demonstrating reductions in insulin-stimulated glucose transport and glycogen synthesis in myocytes with dysfunctional mitochondria due to ceramide accumulation would further support the authors' claim.

      Response: We have now conducted additional experiments focusing on glycogen synthesis as a readout of insulin sensitivity, as it offers an orthogonal method for assessing GLUT4 translocation and glucose uptake. L6 myotubes overexpressing the mitochondrial-targeted ASAH1 construct (as described in Fig. 3) were challenged with palmitate, and insulin-stimulated glycogen synthesis was measured using 14C radiolabeled glucose. As shown below, palmitate suppressed insulin-induced glycogen synthesis, which was effectively prevented by overexpression of ASAH1 (N = 5, * p<0.05), supporting our previous observation using GLUT4 translocation as a readout of insulin sensitivity (Fig. 3). These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism.

      These data have now been added to Supplementary Figure 4K and the results modified as follows:

      “...For this reason, several in vitro models have been employed involving incubation of insulin sensitive cell types with lipids such as palmitate to mimic lipotoxicity in vivo. In this study we have used cell surface GLUT4-HA abundance as the main readout of insulin response...”

      “Notably, mtASAH1 overexpression protected cells from palmitate-induced insulin resistance without affecting basal insulin sensitivity (Fig. 3E). Similar results were observed using insulin-induced glycogen synthesis as an orthogonal technique for Glut4 translocation. These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism (Sup. Fig. 5K). Importantly, mtASAH1 overexpression did not rescue insulin sensitivity in cells depleted…”

      Author response image 1.

      Additionally, the following text was added to the method section:

      “L6 myotubes overexpressing ASAH were grown and differentiated in 12-well plates, as described in the Cell lines section, and stimulated for 16 h with palmitate-BSA or EtOH-BSA, as detailed in the Induction of insulin resistance section.

      On day seven of differentiation, myotubes were serum starved in DMEM for 3.5 h. After incubation for 1 h at 37 °C with 2 µCi/ml D-[U-14C]-glucose in the presence or absence of 100 nM insulin, glycogen synthesis assay was performed, as previously described (Zarini S. et al., J Lipid Res, 63(10): 100270, 2022).”

      2. In addition, it would be critical to assess whether the increased mitochondrial ceramide and consequent lowering of energy levels affect all exocytic pathways in L6 myoblasts or just the GLUT4 trafficking. Is the secretory pathway also disrupted under these conditions?

      Response: The reviewer raises an interesting point that bears on the next phase of this work: identifying how ceramide-induced mitochondrial dysfunction drives insulin resistance. Examining energy deficiency in more detail, as well as general trafficking, is part of ongoing work, but given the complexity of this question, it is beyond the scope of the current study.

      Points from reviewer 2

      1. The mechanistic aspect of the work and conclusions put forth rely heavily on studies performed in cultured myocytes, which are highly glycolytic and generally viewed as a poor model for studying muscle metabolism and insulin action. Nonetheless, the findings provide a strong rationale for moving this line of investigation into mouse gain/loss of function models.

      Response: We acknowledge that in vitro models may not fully mimic in vivo complexity as described above in the response to the “Assessment note”. We have now added to the Discussion:

      “In this study, we mainly utilised L6-myotubes, which share many important characteristics with primary muscle fibres. Both types of cells exhibit high sensitivity to insulin and respond similarly to maximal doses of insulin, with GLUT4 translocation stimulated between 2 to 4 times over basal levels in response to 100 nM insulin (as shown in Fig. 1-4 and (46,47)). Additionally, mitochondrial respiration in L6-myotubes has a similar sensitivity to mitochondrial poisons, as observed in primary muscle fibres (as shown in Fig. 5 (48)). Finally, inhibiting ceramide production increases CoQ levels in both L6-myotubes and adult muscle tissue (as shown in Fig. 2-3). Therefore, L6-myotubes possess the necessary metabolic features to investigate the role of mitochondria in insulin resistance, and this relationship is likely applicable to primary muscle fibres”.

      2. One caveat of the approach taken is that exposure of cells to palmitate alone is not reflective of in vivo physiology. It would be interesting to know if similar effects on CoQ are observed when cells are exposed to a more physiological mixture of fatty acids that includes a high ratio of palmitate, but better mimics in vivo nutrition.

      Response: We appreciate the reviewer's comment. Previously, we reported that mitochondrial CoQ depletion occurs in skeletal muscle after 14 and 42 days of HFHSD feeding, coinciding with the onset of insulin resistance (PMID: 29402381, see figure below).

      Author response image 2.

      These data demonstrate that our in vitro model recapitulates the loss of CoQ in insulin resistance observed in muscle tissue in response to a more physiological mixture of fatty acids. Further, it has been reported that different fatty acids can induce insulin resistance via different mechanisms (PMID: 20609972), which would complicate interpretation of the data. Saturated fatty acids such as palmitate increase ceramides in cell lines and humans, but unsaturated FAs generally do not (PMID: 10446195, 14592453, 34704121). As such, we conclude that palmitate is a cleaner model for studying the effects of ceramide on skeletal muscle function.

      We have added to the discussion:

      “…These findings align with our earlier observations demonstrating that mice exposed to HFHSD exhibit mitochondrial CoQ depletion in skeletal muscle (Fazakerley et al. 2018).”

      3. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. This could be more representative of toxicity rather than pathophysiology. It would be helpful to know if these same effects are observed with other manipulations that lower CoQ to a similar degree. If not, the discrepancies should be discussed.

      Response: As the reviewer suggests, many of these lipids can cause cell death (toxicity) if the dose is too high. We have previously found that low levels (0.15 mM) of palmitate were sufficient to trigger insulin resistance without any signs of toxicity (Hoehn K, PNAS, PMID: 19805130). Using a similar approach, we show that mitochondrial membrane potential is maintained in SMPD5-overexpressing cells (Sup. Fig. 2J and Author response image 3). Given that toxicity is associated with a loss of mitochondrial membrane potential (e.g., 50 µM Saclac; right-hand panel), these data suggest SMPD5 overexpression is not causing overt toxicity.

      Author response image 3.

      Furthermore, we conducted an overrepresentation analysis of molecular processes within our proteomic data from SMPD5-overexpressing cells. As depicted below, no signs of cell toxicity were observed in our model at the protein level. These data are now available in Supplementary Table 1.

      Author response table 1.

      Our results are therefore consistent with a pathological condition induced by elevated levels of ceramides independently of cellular toxicity. The following text has been added to the discussion: “...downregulation of the respirasome induced by ceramides may lead to CoQ depletion.

      Despite the significant impact of ceramide on mitochondrial respiration, we did not observe any indications of cell damage in any of the treatments, suggesting that our models are not explained by toxicity and increased cell death (Sup. Fig. 2H & J).”

      4. The conclusions could be strengthened by more extensive studies in mice to assess the interplay between mitochondrial ceramides, CoQ depletion and ETC/mitochondrial dysfunction in the context of a standard diet versus HF diet-induced insulin resistance. Does P053 affect mitochondrial ceramide, ETC protein abundance, mitochondrial function, and muscle insulin sensitivity in the predicted directions?

      Response: We agree with the referee about the importance of performing in vivo studies to corroborate our in vitro data. We have now conducted extensive new studies in mouse skeletal muscle using targeted metabolomic and lipidomic analyses to investigate the impact of ceramide depletion on CoQ levels in HF-fed mice. Mice were fed a HF diet with or without the administration of P053 (a selective inhibitor of CerS1) for 5 weeks. As illustrated in the figures below, the administration of P053 led to a reduction in ceramide levels (left panel), an increase in CoQ levels (middle panel) and a negative correlation between these molecules (right panel), which is consistent with our in vitro findings.

      Author response image 4.

      Additional suggestions:

      1. Figure 1: How does increased mitochondrial ceramide affect fatty acid oxidation (FAO) in L6-myocytes? As the accumulation of mitochondrial ceramide inhibits respirasome and mitochondrial activity in vitro, can reduced FAO in vivo, due to high mitochondrial ceramide, account for ectopic lipid deposition in skeletal muscle of obese subjects?

      Response: We thank the reviewer for raising this intriguing point. We would like to emphasise that Complex II activity is vital for fatty acid oxidation. As shown in Fig. 5H, our results indicate that Complex II-mediated respiration specifically was diminished in cells with SMPD5 overexpression, suggesting that ceramides hinder the mitochondria's capability to oxidise lipids. We agree that this mechanism may play a role in the ectopic lipid accumulation seen in individuals with obesity.

      We have added the following text to the discussion:

      “...the mitochondria to switch between different energy substrates depending on fuel availability, named “metabolic Inflexibility”...this mechanism may potentially play a role in the ectopic lipid accumulation seen in individuals with obesity, a condition linked with cardio-metabolic disease.”

      2. Figure 2: Although the authors show that mtSMPD5 overexpression does not affect ceramide abundance in whole cell lysate, it would be critical to examine the abundance of this lipid in other cellular membranes and organelles, particularly plasma membrane. What is the effect of mtSMPD5 overexpression on plasma membrane lipid composition? Does that affect fusion of GLUT4-containing vesicles with the plasma membrane, possibly due to depletion of v-SNARE or t-SNARE?

      Response: While we acknowledge the importance of this point, we strongly feel that measuring lipids in purified membranes has its limitations because it is impossible to purify specific membranes without contamination from other kinds of membranes. For example, we have performed proteomics on purified plasma membranes from different cell types, and we always observe considerable mitochondrial contamination of these membranes (e.g. PMID: 21928809). This was the main factor that led us to use the mitochondrial targeting approach.

      Nevertheless, we do acknowledge the possibility that ceramides produced in the mitochondria in SMPD5 cells could leak out of mitochondria into other membranes, and that this could influence other aspects of GLUT4 trafficking and insulin action. However, we believe that the studies using mito-targeted ASAH mitigate this problem. Thus, we have now included a statement in the revised manuscript as follows: “It is also possible that ceramides generated within mitochondria in SMPD5 cells leak out from the mitochondria into other membranes (e.g. PM and Glut4 vesicles) affecting other aspects of Glut4 trafficking and insulin action. However, the observation that ASAH1 overexpression reversed IR without affecting whole cell ceramides argues against this possibility.”

      3. Figure 4: One critical piece of information missing is the effect (if any) of mitochondrial ceramide accumulation on the mRNAs encoding the ETC components affected by this lipid. Although the ETC protein's lower stability may account for the effect of increased ceramide, transcriptional inhibition can't be ruled out without checking the mRNA expression levels for these ETC components.

      Response: To address this point, we have quantified the mRNA abundance of nine complex I subunits that exhibit downregulation in our proteomic dataset subsequent to mtSMPD5 overexpression (as depicted in Figure 4G).

      Induction of mtSMPD5 expression with doxycycline (below, left-hand panel) had no effect on the mRNA levels of the Complex I subunits (below, right-hand panel). This is consistent with our initial hypothesis that the reduction in electron transport chain (ETC) components caused by heightened ceramide levels arises primarily from alterations in protein stability rather than gene expression. While we acknowledge the possibility that certain subunits might be regulated at the transcriptional level, the absence of mRNA downregulation across our data strongly suggests that at least a portion of the observed protein depletion is attributable to diminished protein stability. We have incorporated this dataset into Supplementary Figure 6J and added the following text to the results:

      Author response image 5.

      “Importantly, CI downregulation was not associated with reduction in gene expression as shown in Sup. Fig. 6J.”

      Additionally, we have added the following text to discussion:

      “In addition, the absence of mRNA downregulation in mtSMPD5 overexpressing cells strongly suggests that at least a portion of the observed protein depletion within CI is attributed to diminished protein stability.”

      4. Figure 3: The authors state that neither palmitate nor mtASAH1 overexpression affected insulin-dependent Akt phosphorylation. However, the results in Figure 3F-G do not support this conclusion, as the overexpression of mtASAH1 does enhance the insulin-stimulated AKT (thr-308) phosphorylation. They need to clarify this issue.

      Response: We have now analysed these data in a manner that preserves the control variance, consistent with the other figures in the manuscript and there is no significant change in Akt phosphorylation in ASAH over-expressing cells.

      Author response image 6.

      5. Figure S2: A functional assessment of mitochondrial function in HeLa cells would be helpful to validate the small effect of Saclac treatment on CI NDUFB8.

      Response: Mitochondrial respiration was measured in cells treated with Saclac (2 µM and 10 µM) for 24 hours. As shown below, in HeLa cells we did not detect any mitochondrial respiratory impairment at low doses of Saclac, only at high doses. This suggests that the minor effect of Saclac on CI NDUFB8 is insufficient to alter mitochondrial function.

      Author response image 7.

      Reviewer #2 (Recommendations For The Authors):

      Additional questions and comments for consideration:

      1. The working model links ceramide-induced CoQ depletion to a reduction in ETC proteins and accompanying deficits in OxPhos capacity. The idea that mitochondrial dysfunction necessarily precedes and causes insulin resistance has been heavily debated for years because many animal and human studies have found no overt changes in ETC proteins and/or mitochondrial respiratory capacity during the early phases of insulin resistance. How do the investigators reconcile their work in the context of this controversy?

      Response: We now acknowledge this controversy more clearly in our revised manuscript (page 21), as follows: “We present evidence that mitochondrial dysfunction precedes insulin resistance. However, previous studies have failed to observe changes in mitochondrial morphology, respiration or ETC components during early stages of insulin resistance (72). However, in many cases such studies fail to document changes in insulin-dependent glucose metabolism in the same tissue as was used for assessment of mitochondrial function. This is crucial because we and others do not observe impaired insulin action in all muscles from high fat fed mice for example. In addition, surrogate measures such as insulin-stimulated Akt phosphorylation may not accurately reflect tissue specific insulin action as demonstrated in figure 1C. Thus, further work is required to clarify some of these inconsistencies.”

      2. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. Is this representative of pathophysiology or toxicity?

      Response: We believe we have addressed this in point 3 above (Principal comments, reviewer 2, point 3).

      3. How did this affect other mitochondrial lipids (e.g. cardiolipin)?

      Response: As shown in Supplementary Figure 3, SMPD5 overexpression did not affect other lipid species such as cardiolipin (D-J). We have added to the results:

      “Importantly, mtSMPD5 overexpression did not affect ceramide abundance in the whole cell lysate nor other lipid species inside mitochondria such as cardiolipin, cholesterol and DAGs (Sup. Fig. 3 A, D-J)”

      4. Are these severe effects rescued by CoQ supplementation?

      Response: We have performed additional experiments to address this point. Mitochondrial ceramide accumulation induced by palmitate was not reversed by CoQ supplementation (Figure 1F). We have added to the results:

      “Addition of CoQ9 had no effect on control cells but overcame insulin resistance in palmitate treated cells (Fig. 1A). Notably, the protective effect of CoQ9 appears to be downstream of ceramide accumulation, as it had no impact on palmitate-induced ceramide accumulation (Fig. 1E-F). Strikingly, both myriocin and CoQ9…”

      Additionally, we assessed mitochondrial respiration by Seahorse analysis in cells with SMPD5 overexpression, treated with or without CoQ supplementation. Our results, depicted below, indicate that CoQ supplementation reversed the ceramide-induced decrease in basal and ATP-linked mitochondrial respiration. We have modified Fig. 5.

      Author response image 8.

      We have added to results:

      “Respiration was assessed in intact mtSMPD5-L6 myotubes treated with CoQ9 by Seahorse extracellular flux analysis. mtSMPD5 overexpression decreased basal and ATP-linked mitochondrial respiration (Fig. 5 A, B &C), as well as maximal, proton-leak and non-mitochondrial respiration (Fig. 5 A, D, E & F) suggesting that mitochondrial ceramides induce a generalised attenuation in mitochondrial function. Interestingly, CoQ9 supplementation partially recovered basal and ATP-linked mitochondrial respiration, suggesting that part of the mitochondrial defects are induced by CoQ9 depletion. The attenuation in mitochondrial respiration is consistent with a depletion of the ETC subunits observed in our proteomic dataset (Fig. 4)...”

      5. Are these same effects observed with other manipulations that lower CoQ to a similar degree?

      Response: As mentioned in point 5 (additional suggestions from Reviewer 1), we measured mitochondrial respiration in HeLa cells treated with Saclac (2 µM and 10 µM) for 24 hours. Our findings showed no signs of mitochondrial respiratory impairment at low doses of Saclac in HeLa cells, despite CoQ depletion at this dose (Sup. Fig. 2C). We believe that this difference could reflect differing sensitivity of mitochondrial respiration/ETC abundance to ceramide-induced CoQ depletion across cell lines. Alternatively, it is possible that reduced mitochondrial respiration is secondary to other mitochondrial/cellular defects such as mitochondrial fragmentation or deficient nutrient transport into mitochondria.

      Author response image 9.

      6. The mitochondrial concentrations of CoQ required to maintain insulin sensitivity in L6 myocytes seem to vary from experiment to experiment. Is it the absolute concentration that matters and/or the change relative to a baseline condition?

      Response: This is an excellent observation. The findings indicate that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to the discussion: “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, however the levels in palmitate treated mtASAH1 cells remained similar to control untreated cells (Fig. 3I). This suggests that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance.”

      7. Considering that CoQ has been shown to have antioxidant properties, does the rescue observed after a 16 h treatment require the prolonged exposure, or alternatively, are similar effects observed during short-term exposures (~1-2 h), which might imply a different or additional mechanism?

      Response: This is an excellent point that we have long considered. The difficulty is how to address the question definitively, and we are concerned that the experiment suggested by the referee will not generate definitive data. A major issue is that CoQ has low solubility and needs to reach the right compartment. As such, if short-term treatment (as suggested) does not rescue, it would be difficult to draw any definite conclusions, as this might simply be because insufficient CoQ is delivered to mitochondria. Conversely, if short-term treatment does rescue, this could be either because CoQ does get into mitochondria and regulates the ETC or because of its general antioxidant function. So, even if we observe a rescue after 1 hour of incubation with CoQ, it will not clarify whether this is due to the antioxidant effect or simply because 1 hour is adequate to boost mitoCoQ levels. Thus, in our view, this experiment might not get us any closer to the answer. Nevertheless, we do feel this is an important point, and we have added the following statement to our revised manuscript to acknowledge it: “Because CoQ can accumulate in various intracellular compartments, it's important to consider that its impact on insulin resistance might be due to its overall antioxidant properties rather than being limited to a mitochondrial effect.”

      8. In Figure 1, CoQ depletion due to 4NB treatment resulted in increased ceramide levels. Could this be due to impaired palmitate oxidation leading to rerouting of intracellular palmitate to the ceramide pathway? This could be tested using stable isotope tracers.

      Response: We have added the statement below to the manuscript to address this point. While this would be an interesting experiment to perform, we feel it is somewhat outside the main focus of this study.

      “One possibility is that CoQ directly controls ceramide turnover (35). An alternate possibility is that CoQ inside mitochondria is necessary for fatty acid oxidation (12) and CoQ depletion triggers lipid overload in the cytoplasm promoting ceramide production (36). Future studies are required to determine how CoQ depletion promotes Cer accumulation. Regardless, these data indicate that ceramide and CoQ have a central role in regulating cellular insulin sensitivity.”

      9. To a similar point, it would be helpful to know if the C2 ceramide analog is sufficient to cause elevated mito-ceramide and/or CoQ depletion. If not, the results might imply mitochondrial uptake of palmitate is required.

      Response: We feel this point is analogous to Point 7 above in that this experiment is not definitive enough to make any clear conclusions as it may or may not work for many different reasons. For example, C2 ceramide may not work simply because it has the wrong chain length.

      Moreover, C2 ceramide has effects that clearly differ from those observed with palmitate, most notably its inhibitory effect on Akt signalling. For these reasons we do not agree with the logic of this experiment.

      We have mentioned in the results section:

      “Based on these data we surmise that C2-ceramide does not faithfully recapitulate physiological insulin resistance, in contrast to that seen with incubation with palmitate”.

      10. Likewise, does inhibition of CPT1 ameliorate or exacerbate palmitate-induced insulin resistance?

      Response: This experiment has been performed by a number of different labs. For instance, muscle-specific CPT1 overexpression is protective against high-fat-diet-induced insulin resistance in mice (Bruce C, PMID: 19073774), CPT1 overexpression protects L6E9 muscle cells from fatty acid-induced insulin resistance (Sebastian D, PMID: 17062841) and increased beta-oxidation in muscle cells enhances insulin-stimulated glucose metabolism and is protective against lipid-induced insulin resistance (Perdomo G, PMID: 15105415). We have now cited all of these studies in the discussion of our revised manuscript: “In fact, increased fatty acid oxidation is protective against insulin resistance in several model organisms (37–39).”

      11. Does the addition of palmitate to the cells treated with mtSMPD5 further reduce CoQ9 (Figure 2I and 2J)?

      Response: This intriguing observation, highlighted by the referee, prompted us to conduct additional experiments to investigate the effects of palmitate and SMPD5 overexpression on Coenzyme Q (CoQ) levels in L6 myotubes. As demonstrated in the figures presented below, palmitate and SMPD5 overexpression each independently depleted CoQ9, with no additive effect, suggesting that they share a common pathway driving CoQ9 deficiency. One plausible hypothesis is that ceramides trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane, likely the pool associated with Complex I (CI) in the Electron Transport Chain (ETC). This hypothesis is supported by previous studies indicating that approximately 25-35% of CoQ binds to CI (PMID: 33722627) and by our data demonstrating that ceramide induces a selective depletion of CI in L6 myotubes (Fig. 4).

      We have added this result to Fig. 2I in the main section.

      Author response image 10.

      We have added to the result section:

      “Mitochondrial CoQ levels were depleted in both palmitate-treated and mtSMPD5-overexpressing cells without any additive effects. This suggests that these strategies to increase ceramides share a common mechanism for inducing CoQ depletion in L6 myotubes (Fig. 2I).”

      We have added to the discussion section:

      “...These are known to form supercomplexes or respirasomes where ~25 - 35 % of CoQ is localised in mammals (58,16).…The observation that both palmitate and SMPD5 overexpression trigger CoQ depletion without additive effects support the notion that ceramides may trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane.”

      12. Some of the cell-based experiments appear to be underpowered and therefore confidence in the interpretations might benefit from additional repeats. For example, in Figure 3i, it appears that palmitate still causes a substantial reduction of CoQ in the cells treated with mtASAH1, even though mito-ceramide levels are restored to baseline. Please specify if these and other results are representative of multiple cell culture experiments or a single experiment.

      Response: All data were derived from a minimum of 3-4 independent experiments from at least two separate cultures of L6 cells. Separate batches of drug treatments were prepared for each experiment. In a previous study (Krycer, PMID: 31744882), we compared metabolic parameters between batches of cells differentiated at different times (i.e. at least weeks apart) and found variations of <20% for insulin-stimulated glucose oxidation. With an expected variation of 20% and a type I error rate of 0.05, this is sufficient to detect a 40% difference with a power of 0.8. As the reviewer has indicated, this is likely underpowered in situations where variance is unexpectedly high or where a small difference needs to be detected.
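
      The sample-size reasoning above can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a two-sided, two-sample comparison with equal group sizes, treats the quoted 20% as a standard deviation relative to the control mean, and uses the normal approximation, so an exact t-test calculation would call for somewhat more replicates. The function name `n_per_group` is hypothetical and not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    sd and delta must share units (here, fractions of the control mean).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Assumed inputs: 20% SD, 40% difference, alpha = 0.05, power = 0.8
replicates = n_per_group(sd=0.20, delta=0.40)
print(replicates)
```

      Under these assumptions the approximation gives about four replicates per group. Halving the detectable difference to 20% raises the requirement roughly fourfold, which illustrates why small effects would be underpowered at this sample size.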

      Regarding Fig. 3, the reviewer raises an interesting point. As discussed in point 6, the fact that palmitate still appears to cause a depletion of CoQ in mtASAH1 cells likely indicates that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to the discussion:

      “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, but this effect was less pronounced in the mtASAH1 cell line (Fig. 3I). Our results suggest that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance”

      13. The color scheme of 2E is inconsistent with other panels in the figure.

      Response: Corrected

      14. It would be helpful if the axis labels for CoQ graphs were labeled as "Mito-CoQ" for clarity.

      Response: Corrected

    1. Author Response

      The following is the authors’ response to the previous reviews

      We appreciate the positive comments from the editors and reviewers. The following are point-by-point responses to the questions and comments of the reviewers:

      Reviewer #1 (Public Review):

      In this study, Jiamin Lin et al. investigated the potential positive feedback loop between ZEB2 and ACSL4, which regulates lipid metabolism and breast cancer metastasis. They reported a correlation between high expression of ZEB2 and ACSL4 and poor survival of breast cancer patients, and showed that depletion of ZEB2 or ACSL4 significantly reduced lipid droplets abundance and cell migration in vitro. The authors also claimed that ZEB2 activated ACSL4 expression by directly binding to its promoter, while ACSL4 in turn stabilized ZEB2 by blocking its ubiquitination. While the topic is interesting, there are several concerns with the study:

      1. My concern regarding the absence of appropriate thresholds or False Discovery Rate (FDR) adjustments for the RNA-seq analysis has not been addressed, leading to incorrect thresholds and erroneous identification of significant signals.

      Response: We thank the reviewer for the concern about the RNA-seq analysis. The RNA-seq data were analyzed using the Benjamini-Hochberg approach for controlling the false discovery rate. The RNA-seq bioinformatic analysis proceeded as follows. Raw data in FASTQ format were first processed with in-house Perl scripts. In this step, clean data were obtained by removing reads containing adapters, reads containing N bases, and low-quality reads from the raw data. All downstream analyses were based on the high-quality clean data. An index of the reference genome was built using Hisat2 v2.0.5, and paired-end clean reads were aligned to the reference genome using Hisat2 v2.0.5. FeatureCounts v1.5.0-p3 was used to count the reads mapped to each gene, and the FPKM of each gene was then calculated from the gene length and the read count mapped to that gene. Differential expression analysis of two conditions/groups was performed using the DESeq2 R package (1.20.0). The resulting P-values were adjusted using the Benjamini-Hochberg approach for controlling the false discovery rate. Genes with an adjusted P-value <0.05 according to DESeq2 were designated as differentially expressed.
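
      For readers unfamiliar with the adjustment step named above, the following is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. It is not the DESeq2 implementation, which may also apply steps such as independent filtering before adjustment. The function name `bh_adjust` and the p-values are made up for illustration and are not data from the study.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical genes
pvals = [0.001, 0.02, 0.03, 0.8]
padj = bh_adjust(pvals)
# Apply the adjusted P-value < 0.05 threshold described in the response
significant = [p < 0.05 for p in padj]
print(padj, significant)
```

      In this toy example the first three genes pass the adjusted threshold and the fourth does not. The point of the adjustment is that the cutoff is applied to the corrected values, not to the raw P-values.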

      2. In Figure 3B and C, it appears that the knockdown efficiency of ACSL4 is inadequate in these cells, which contradicts the Western blot results presented in Figure 2F.

      Response: We thank the reviewer for the concern. In Figures 3B and 3C we used shRNA for the knockdown experiment, whereas in Figure 2F we used siRNA, so the knockdown efficiencies differed.

      3. Regarding Figure 6, the discovery of consensus binding sequences (CACCT) for ZEB2 alone is insufficient evidence to support the direct binding of ZEB2 to the ACSL4 promoter.

      Response: We thank the reviewer for the concern. We performed chromatin immunoprecipitation (ChIP), which examines the direct interaction between DNA and protein, to test whether ZEB2 directly binds to the ACSL4 promoter. The results showed that primer set 1, which covered -184 to -295 of the ACSL4 promoter region, exhibited apparent ZEB2 binding (Fig. 6F). Moreover, the mutant sequence (AAAA) of the ACSL4 promoter showed significantly decreased luciferase activity (Fig. 7H). Together, this evidence suggests that ZEB2 binds directly to the consensus sequence of the ACSL4 promoter.

      1. For Figure 7E, there are multiple bands present, and it appears that ZEB2-HA has been cropped, which should ideally be presented with unaltered raw data. Please provide the uncropped raw data.

      Response: We thank the reviewer for the concern. The raw data of the figure 7E ZEB2-HA is shown in Author response image 1:

      Author response image 1.

      1. In Figure 7C, the author claimed to have used 293T cells for the ubiquitin assay, which are not breast cancer cells. Moreover, the efficiency of over-expression differs between ZEB2 and ACSL4 in 293T cell lines. Performing the experiment in an unrelated cell line to justify an important interaction is not acceptable.

      Response: We thank the reviewer for the concern. We also performed the ubiquitination assay in MDA-MB-231 cells (Fig 7D; Author response image 2). The results confirm that knockdown of ACSL4 markedly enhanced the ubiquitination of ZEB2. We also performed the IP experiment in MDA-MB-231 cells (Fig 7F; Author response image 3). The results confirmed the interaction between ZEB2 and ACSL4:

      Author response image 2.

      Author response image 3.

      Reviewer #2 (Public Review):

      In this study, the authors validated a positive feedback loop between ZEB2 and ACSL4 in breast cancer, which regulates lipid metabolism to promote metastasis.

      Overall, the study is original, well structured, and easy to read.

      We appreciate the positive comments from the reviewer.

      Reviewer #3 (Public Review):

      The manuscript by Lin et al. reveals a novel positive regulatory loop between ZEB2 and ACSL4, which promotes lipid droplets storage to meet the energy needs of breast cancer metastasis.

      We appreciate the positive comments from the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      I still have some points that should be addressed by the Authors:

      The interaction between ACSL4 and ZEB2 is still not convincing, because the cellular localizations of ACSL4 and ZEB2 differ. The authors should consider using the Duolink experiment to determine more accurately where these two proteins interact in cells.

      Response: We appreciate the reviewer’s suggestion. We performed a GST pull-down assay to examine whether ZEB2 and ACSL4 form a complex. The GST pull-down assay confirmed the interaction of ZEB2 and ACSL4 (Supplementary Fig. S10). We also performed an immunofluorescence assay and found that ZEB2 co-localized with ACSL4 in certain regions of the cytoplasm, as shown in Author response image 5 (Supplementary Fig. S11):

      Author response image 4.

      Author response image 5.

      In Figure S4, the authors showed both "shACSL4" and "siACSL4", which is a description error.

      Response: We thank the reviewer for pointing out the mistake. We have corrected "siACSL4" to "shACSL4".

      Author response image 6.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is improved.

      We appreciate the positive comments from the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors attempt to describe alterations in gene expression, protein expression, and protein phosphorylation as a consequence of chronic adenylyl cyclase 8 overexpression in a mouse model. This model is claimed to have resilience to cardiac stress.

      Major strengths of the study include 1) the large dataset generated which will have utility for further scientific inquiry for the authors and others in the field, 2) the innovative approach of using cross-analyses linking transcriptomic data to proteomic and phosphoproteomic data. One weakness is the lack of a focused question and clear relevance to human disease. These are all critical biological pathways that the authors are studying and essentially, they have compiled a database that could be surveyed to generate and test future hypotheses.

      Thank you for your efforts to review our manuscript. We are delighted to learn that you found our approach to linking transcriptomic, proteomic and phosphoproteome data in our analysis to be innovative. Your comment that we have not focused on a question with clear relevance to human disease is “right on point!”

      During chronic pathophysiologic states, e.g., chronic heart failure (CHF) in humans, AC/cAMP/PKA/Ca2+ signaling increases progressively as the degree of heart failure progresses, leading to cardiac inflammation mediated, in part, by cyclic-AMP-induced up-regulation of renin-angiotensin system (RAS) signaling. Standard therapies for CHF include β-adrenoreceptor blockers and RAS inhibitors, which, although effective, are suboptimal in ameliorating heart failure progression. One strategy to devise novel and better therapies for heart failure would be to uncover the full spectrum of concentric cardio-protective adaptations that become activated in response to severe, chronic AC/cAMP/PKA/Ca2+-induced cardiac stress.

      We employed unbiased omics analyses in our prior study (https://elifesciences.org/articles/80949v1) of the mouse harboring cardiac-specific overexpression of adenylyl cyclase type 8 (TGAC8), and identified more than 2,000 transcripts and proteins, comprising a broad array of biological processes across multiple cellular compartments, that differed in the TGAC8 left ventricle compared to WT. These bioinformatic analyses revealed that marked overexpression of AC8 engages complex, concentric adaptation "circuitry" that has evolved in mammalian cells to confer resilience to stressors that threaten health or life. The main human disease category identified in these analyses was Organismal Injury and Abnormalities, suggesting that defenses against stress were activated, as would be expected in response to cardiac stress. Specific concentric signaling pathways that were enriched and activated within the TGAC8 protection circuitry included cell survival initiation, protection from apoptosis, proliferation, prevention of cardiac-myocyte hypertrophy, increased protein synthesis and quality control, increased inflammatory and immune responses, facilitation of tissue damage repair and regeneration, and increased aerobic energetics. These TGAC8 stress response circuits resemble many adaptive mechanisms that occur in response to the stress of disease states and may be of biological significance in allowing proper healing in disease states such as myocardial infarction or failure of the heart. The main human cardiac diseases identified in bioinformatic analyses were multiple types of cardiomyopathies, again suggesting that mechanisms that confer resilience to the stress of chronically increased AC-PKA-Ca2+ signaling are activated in the absence of heart failure in the super-performing TGAC8 heart at 3 months of age.

      In the present study, we performed a comprehensive in silico analysis of transcription, translation, and post-translational patterns, seeking to discover whether the coordinated transcriptome and proteome regulation of the adaptive protective circuitry within the AC8 heart that is common to many types of cardiac disease states identified in our previous study (https://elifesciences.org/articles/80949v1) extends to the phosphoproteome.

      Reviewer #2 (Public Review):

      In this study, the investigators describe an unbiased phosphoproteomic analysis of cardiac-specific overexpression of adenylyl cyclase type 8 (TGAC8) mice that was then integrated with transcriptomic and proteomic data. The phosphoproteomic analysis was performed using tandem mass tag-labeling mass spectrometry of left ventricular (LV) tissue in TGAC8 and wild-type mice. The initial principal component analysis showed differences between the TGAC8 and WT groups. The integrated analysis demonstrated that many stress-response, immune, and metabolic signaling pathways were activated at transcriptional, translational, and/or post-translational levels.

      The authors are to be commended for a well-conducted study with quality control steps described for the various analyses. The rationale for following up on prior transcriptomic and proteomic analyses is described. The analysis appears thorough and well-integrated with the group's prior work. Confirmational data using Western blot is provided to support their conclusions. Their findings have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.

      Thank you for your efforts to review our manuscript, and for your comments on our approach to linking transcriptomic, proteomic and phosphoproteome data in our analysis. We are delighted that you found our work to be well conducted, our analysis thorough and well integrated with our prior work in this arena, and our findings to have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.

      Reviewer #1 (Recommendations For The Authors):

      I humbly suggest that the authors reconsider the title, as it could be more clear as to what they are studying. Are the authors trying to highlight pathways related to cardiac resilience? Resilience might be a clearer word than "performance and protection circuitry".

      Thank you for this important comment. We have revised the title accordingly: Reprogramming of cardiac phosphoproteome, proteome and transcriptome confers resilience to chronic adenylyl cyclase-driven stress.

      Perhaps the text can be reviewed in detail by a copy-editor, as there are many grammatically 'awkward' elements (for example, line 56: "mammalians" instead of mammals), inappropriate colloquialisms (for example, line 73: "port-of-call"), and stylistic unevenness that make it difficult to read.

      We have reviewed the text in detail, with the assistance of a copy editor, in order to identify and correct awkward elements and to search for other colloquialisms. Finally, although the “stylistic unevenness” to which you refer may be difficult for us to identify during our re-edits, we have tried our best to find and revise it.

      The best-written sections are the first few paragraphs of the discussion section, which finally clarify why the TGAC8 mouse is important in understanding cardiac resilience to stress and how the present study leverages this model to disentangle the biological processes underlying the resilience. I wish this had been presented in this manner earlier in the paper, (in the abstract and introduction) so I could have had a clearer context in which to interpret the data. It would also be helpful to point out whether the TGAC8 mouse has any correlates with human disease.

      Thank you for this very important comment. Well put! In addition to recasting the title to include the concept of resilience, we have revised both the abstract and introduction to feature what you consider to be important to the understanding of cardiac resilience to stress, and how the present study leverages this model to disentangle the biological processes underlying the resilience.

      Reviewer #2 (Recommendations For The Authors):

      1. How were the cutoffs determined to distinguish between upregulated/downregulated phosphoproteins and phosphopeptides?

      Thank you for this important question. We used the same criteria to distinguish differences between TGAC8 and WT for unnormalized and normalized phosphoproteins, -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up), as stated in the methods section, main text and figure legend. The results were consistent across all analyses and selectively verified by experiments.
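
      The cutoffs quoted above can be written as a short rule. The sketch below is illustrative only: the function name and default arguments are hypothetical, and it simply encodes -log10(p-value) > 1.3 together with log2FoldChange <= -0.4 (down) or >= 0.4 (up).

```python
import math


def classify_change(p_value, log2_fold_change, p_cut=1.3, fc_cut=0.4):
    """Label a phosphoprotein as 'up', 'down' or 'unchanged'.

    p_cut is applied to -log10(p_value); fc_cut to the log2 fold change.
    """
    if p_value <= 0:
        raise ValueError("p_value must be positive")
    significant = -math.log10(p_value) > p_cut
    if significant and log2_fold_change >= fc_cut:
        return "up"
    if significant and log2_fold_change <= -fc_cut:
        return "down"
    return "unchanged"
```

      Note that -log10(p-value) > 1.3 corresponds to a p-value just above 0.05 (10^-1.3 ≈ 0.0501), so the rule is close to, but not identical with, p < 0.05.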

      1. Were other models assessed for correlation between transcriptome and phosphoproteome other than a linear relationship of log2 fold change?

      Thank you for this comment. In addition to a linear relationship of log2 fold change of molecule expression, we also compared protein activities, e.g., Fig 4F, and pathways enriched from different omics, e.g., Fig 3D, 5J, 6B and 6F.

      1. Figures 1A and 5G seem to show outliers. How many biological and technical replicates would be needed to minimize error?

      Thank you for the question. Figures 1A and 5G were PCA plots which, as expected, manifested some genetic variability among the same genotypes. The PCA plots, however, are useful in determining how the identified items separated, both within and among genotypes. For bioinformatics analysis such as ours, 4-5 samples are sufficient to accomplish this, as demonstrated by separation, by genotype, of samples in PCA. Thus, in addition to discovery of true heterogeneity among the samples, our results are still able to robustly discover the true differences between the genotypes.

      1. Were the up/downregulated genes more likely to be lowly expressed (which would lead to larger log2 changes identified)?

      In response to your query, we calculated the average phosphorylation level of each molecule across all samples to see whether the significantly changed molecules were of low abundance. We also generated MA plots, an application of the Bland–Altman plot, to visualize the omics data. The MA plots in Author response image 1 illustrate that the target molecules with significantly changed phosphorylation levels did not aggregate in the very-low-abundance range. To confirm this conclusion, we adopted two sets of cutoffs: (1) change: -log10(p-value) > 1.3, and log2FoldChange < 0 (down) or log2FoldChange > 0 (up); and (2) change_2: -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up).

      Author response image 1.
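
      For readers unfamiliar with MA plots, the two plotted quantities can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: it takes one mean intensity per genotype, whereas the response describes averaging across all samples, and the function name is hypothetical.

```python
import math


def ma_values(tg_mean, wt_mean):
    """Return (M, A) for one molecule from two positive mean intensities.

    M is the log2 ratio (vertical axis); A is the mean log2 intensity
    (horizontal axis), which stands in for overall abundance.
    """
    log_tg = math.log2(tg_mean)
    log_wt = math.log2(wt_mean)
    m_value = log_tg - log_wt
    a_value = 0.5 * (log_tg + log_wt)
    return m_value, a_value
```

      If significantly changed molecules clustered at small A, large fold changes could be an artifact of low abundance; the response reports that no such clustering was seen.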

      1. "We verified some results through wet lab experiments" in the abstract is vague.

      Thank you for the good suggestion. What we meant to indicate here was that genotypic differences in selected proteins, phosphoproteins and RNAs discovered in omics were verified by western blots, protein synthesis detection, proteasome activity detection, and detection of soluble and insoluble protein fractions. However, we have deleted the reference to the wet lab experiments in the revised manuscript.

      1. There are minor syntactical errors throughout the text.

      Thank you very much for the suggestion. As noted in our response, we have edited and revised those errors throughout the text.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The investigators have performed a state-of-the-art systematic review and meta-analysis of studies that may help to answer the research question of whether administering multiple antibiotics simultaneously prevents antibiotic resistance development in individuals. The number of studies eligible for analysis is very low, and within that low number there is huge variability in the bug-drug combinations studied, and most studies had a high risk of bias, further limiting the capability of meta-analysis to answer the research question. In addition, based on I2 values there is also huge statistical heterogeneity between outcomes of the studies compared, further limiting the predictive value of meta-analysis. In fact, the only 2 studies meeting all eligibility criteria addressed the treatment of Mycobacterium tuberculosis, for which the research question is hardly applicable. The authors, therefore, conclude that "our analysis could not identify any benefit or harm of using a higher or a lower number of antibiotics regarding within-patient resistance development." Apart from articulating this knowledge gap, the findings will not have consequences for patient care, but may stimulate the scientific community to better address this research question in future studies.

      Strengths:

      The systematic and rigorous approach for the review and meta-analysis.

      Weaknesses:

      None identified.

      We thank the reviewer for this thoughtful and positive appraisal of our work.

      Reviewer #2 (Public Review):

      Summary:

      The authors performed a systematic review and meta-analysis to investigate whether the frequency of emergence of resistance is different if combination antibiotic therapy is used compared to fewer antibiotics. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed.

      Strengths:

      The strengths of the manuscript are that the article addresses a relevant research question that is often debated. The article is well-written and the methodology used is valid. The review shows that there is currently insufficient evidence to reach a conclusion due to the limited sample size. High-quality studies evaluating appropriate antimicrobial resistance endpoints are needed. I have several comments and suggestions for the manuscript.

      Weaknesses:

      Weaknesses of the manuscript are the large clinical and statistical heterogeneity and the lack of clear definitions of acquisition of resistance. Both these weaknesses complicate the interpretation of the study results.

      We thank the reviewer for the positive comments and pointing out where our work can be improved.

      Major comments:

      My main concern about the manuscript is the extent of both clinical and statistical heterogeneity, which complicates the interpretation of the results. I don't understand some of the antibiotic comparisons that are included in the systematic review. For instance the study by Paul et al (50), where vancomycin (as monotherapy) is compared to co-trimoxazole (as combination therapy). Emergence (or selection) of co-trimoxazole in S. aureus is in itself much more common than vancomycin resistance. It is logical and expected to have more resistance in the co-trimoxazole group compared to the vancomycin group, however, this difference is due to the drug itself and not due to co-trimoxazole being a combination therapy. It is therefore unfair to attribute the difference in resistance to combination therapy. Another example is the study by Walsh (71) where rifampin + novobiocin is compared to rifampin + co-trimoxazole. There is more emergence of resistance in the rifampin + co-trimoxazole group but this could be attributed to novobiocin being a different type of antibiotic than co-trimoxazole instead of the difference being attributed to combination therapy. To improve interpretation and reduce heterogeneity my suggestion would be to limit the primary analyses to regimens where the antibiotics compared are the same but in one group one or more antibiotic(s) are added (i.e. A versus A+B). The other analyses are problematic in their interpretation and should be clearly labeled as secondary and their interpretation discussed.

      We acknowledge the presence of statistical and clinical heterogeneity in our overall analysis. The decision to pursue this comprehensive examination was predefined in our previously published study protocol (PROSPERO CRD42020187257) and driven by our interest in whether, despite some differences, we could either identify an overarching effect of combination therapy on resistance or identify factors that explain potential differences in the effect of combination therapy across pathogens/drugs. We do indeed find that heterogeneity is high; however, identifying the factors driving this heterogeneity is difficult because evidence is limited.

      We carried out several subgroup analyses, e.g. explicitly focusing on specific pathogen groups and medical conditions or exploring heterogeneity in treatment arms (figure 3, supplementary materials section 6). However, it is important to highlight that the number of studies available for these subgroup analyses was low. Additionally, recognizing the high heterogeneity within treatment arms, we performed a subgroup analysis focusing solely on resistances of antibiotics common to both arms (supplementary material section 6.1.8; which would avoid comparisons such as the one between vancomycin and co-trimoxazole raised by the reviewer). Unfortunately, this also revealed substantial heterogeneity. While we aimed to address heterogeneity through these subgroup analyses, limitations arose due to the number of studies meeting specific criteria and the nature of data provided by these studies.

      Moreover, regarding the concern about interpreting co-trimoxazole as combination therapy, we acknowledge the confusion surrounding its classification as one or two antibiotics. Despite the common contemporary view of co-trimoxazole as a single antibiotic, we chose to consider it as two antibiotics due to historical practices, as observed in Black et al. (1982), where trimethoprim was compared to trimethoprim and sulfamethoxazole. We recognize that this decision may lead to confusion, and we are considering a further sensitivity analysis in a future version of this manuscript that treats co-trimoxazole as a single antibiotic. We agree that the slight trend of fewer antibiotics performing better, observed for MRSA, should not be overinterpreted, as it is driven by the two studies Walsh et al. 1993 and Paul et al. 2015, as pointed out by the reviewer. In lines 183-186 we discuss the issue that, for better evaluation of antibiotic combination therapy, more studies that use identical antibiotics (i.e. A versus A+B) are needed. We will try to clarify and highlight this in the future version of the manuscript.

      Another concern is about the definition of acquisition of resistance, which is unclear to me. If for example meropenem is administered and the follow-up cultures show Enterococcus species (which is intrinsically resistant to meropenem), does this constitute acquisition of resistance? If so, it would be misleading to determine this as an acquisition of resistance, as many people are colonized with Enterococci and selection of Enterococci under therapy is very common. If this is not considered as the acquisition of resistance please include how the acquisition of resistance is defined per included study.

      Thank you for pointing out this potential ambiguity. Our definition of “acquisition of resistance” is agnostic to bacterial species, and hence intrinsically resistant species can be included if the studies detected them only in the follow-up culture. We will clarify this in the definition of “acquisition of resistance” in the manuscript (see l. 259-260). However, it was not always clear from the studies which pathogens were acquired or whether intrinsically resistant species were not reported. Therefore, we rely on the studies' specifications of resistant and non-resistant without further classifying data into intrinsic and non-intrinsic resistance. The outcome “acquisition of resistance” can be seen more as a risk assessment for having any resistant bacterium during or after treatment. In contrast, the outcome “emergence of resistance” is more rigorous, demanding that the same species be measured as more resistant during or after treatment.

      Table S1 is not sufficiently clear because it often only contains how susceptibility testing was done but not which antibiotics were tested and how a strain was classified as resistant or susceptible.

      In Table S1, we omitted the listing of antibiotics for which susceptibility testing was performed, as this information is already presented in the main text (Table 1). However, we agree that linking this information better in a future version would benefit the understanding. Given the variability in methods used to assess resistance and the variability in drugs, the comparability of breakpoints is limited. Hence, we decided not to provide further details on this aspect so far.

      Line 85: "Even though within-patient antibiotic resistance development is rare, it may contribute to the emergence and spread of resistance."

      Depending on the bug-drug combination, there is great variation in the propensity to develop within-patient antibiotic resistance. For example, within-patient development of ciprofloxacin resistance in Pseudomonas is fairly common, while within-patient development of methicillin resistance in S. aureus is rare. Based on these differences, large clinical heterogeneity is expected and it is questionable whether these studies should be pooled.

      We agree that our formulation neglects differences in prevalence of within-host resistance emergence depending on bug-drug combinations. We will correct this in our upcoming version. (i.e. we will correct our statement to: “Within-patient antibiotic resistance development, even if rare, can contribute to the emergence and spread of resistance.”)

      Line 114: "The overall pooled OR for acquisition of resistance comparing a lower number of antibiotics versus a higher one was 1.23 (95% CI 0.68 - 2.25), with substantial heterogeneity between studies (I2=77.4%)"

      What consequential measures did the authors take after determining this high heterogeneity? Did they explore the source of this large heterogeneity? Considering this large heterogeneity, do the authors consider it appropriate to pool these studies?

      Thank you for highlighting this lack of clarity. In our upcoming version, we will emphasize the sub-analyses conducted to explore heterogeneity (i.e., figure 3 and supplementary materials section 6). Nevertheless, these analyses faced limitations due to the scarcity of evidence and the data provided by the studies. Given the lack of appropriate evidence, it is hard to identify the source of heterogeneity. The decision to pool all studies was pre-specified in our previously published study protocol (PROSPERO CRD42020187257) and was motivated by the question of whether there is a general effect of combination therapy on resistance development, or whether factors can be identified that explain potential differences in the effect of combination therapy across bug-drug combinations.
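
      As background to the I2 statistic discussed here, the sketch below shows how I2 follows from Cochran's Q under simple inverse-variance pooling of log odds ratios. It is a simplified fixed-effect illustration, not the model used in the meta-analysis; the function name is hypothetical and no study data from the review are reproduced.

```python
def pooled_log_or_and_i2(log_odds_ratios, standard_errors):
    """Return (pooled log OR, Cochran's Q, I2 in percent).

    Uses inverse-variance weights; I2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_odds_ratios)) / total_weight
    # Q sums the weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_odds_ratios))
    df = len(log_odds_ratios) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2
```

      An I2 of 77.4%, as reported, would mean that roughly three quarters of the observed variation in effect estimates is attributed to between-study differences rather than sampling error.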

    1. Author Response

      The following is the authors’ response to the current reviews.

      We confirm that the “count-down” parameter mentioned by reviewer 1 is indeed counted from the first lockdown day and increases continuously, even when we do not have any data, and that this is clearly written in the manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (Note, while these authors do reference Derryberry et al., I thought that there could have been much more direct comparison between the results of the two approaches).

      We added some more discussion of the differences between the papers.

      One important drawback of the approach, which potentially calls into question the authors' conclusions, is that the acoustic sampling only occurred during the pandemic: for several lockdown periods and then for a period of 10 days immediately after the end of the final lockdown period in May of 2020. Several relevant things changed from March to May of 2020, most notably the shift from spring to summer, and the accompanying shift into and through the breeding season (differing for each of the three focal species). Although the statistical methods included an attempt to address this, neither the inclusion of the "count down" variable nor the temperature variable could account for any non-linear effects of breeding phenology on vocal activity. I found the reliance on temperature particularly troubling, because despite the authors' claims that it was "a good proxy of seasonality", an examination of the temperature data revealed a considerable non-linear pattern across much of the study duration. In addition, using a period immediately after the lockdowns as a "no-lockdown" control meant that any lingering or delayed effects of human activity changes in the preceding two months could still have been relevant (not to mention the fact that despite the end of an official lockdown, the pandemic still had dramatic effects on human activity during late May 2020).

      In general, the reviewer is correct, and we reformulated some of the text to address these points more carefully. However, we would like to note two things: (1) Changes occurred rapidly, with birds quickly changing their behavior. This is one of the main conclusions of our study, i.e., that urban-dwelling animals are highly plastic in behavior, so lingering effects were unlikely. (2) Changes occurred in both directions, and thus seasonality (which is expected to have a uni-directional effect) cannot explain everything we observed. We are not sure what the reviewer means by ‘considerable non-linear patterns’ when referring to the temperature. Except for ~5 days with temperatures that exceeded the expected average by 3-4 degrees, the temperature increased approximately linearly during the period, as expected from seasonality (see Author response image 1). Following the reviewer’s comment, we tested whether excluding data from these days changes the results and found no change.

      We would like to note that in terms of breeding, all birds were within the same state during both the lockdown and the non-lockdown periods. Parakeets and crows have a long breeding season (February to the end of June) with one cycle. They stay around the nest throughout this season, especially during its peak in March-May. Prinias start slightly later, at the beginning of March, with 2-3 cycles until the end of June.

      Regarding the comment about human activity, as we now also note in the manuscript, reality in Israel was actually the opposite of the reviewer’s suggestion with people returning to normal behavior towards the end of the lockdown (even before its official removal). We believe that this added noise to our results, and that the effect of the lockdown was probably higher than we observed.

      Author response image 1.

      Another weakness of the current version of the manuscript is the use of a supposed "contradiction" in the existing literature to create the context for the present study. Although the various studies cited do have many differences in their results, those other papers lay out many nuanced hypotheses for those differences. Almost none of the studies cited in this manuscript actually reported blanket increases or decreases in urban birds, as suggested here, and each of those papers includes examples of species that showed different responses. To suggest that they are on opposite sides of a supposed dichotomy is a misrepresentation. Many of those other studies also included a larger number of different species, whereas this study focused on three. Finally, this study was completed at a much finer spatial scale than most others and was examining micro-habitat differences rather than patterns apparent across landscapes. I believe that highlighting differences in scale to explain nuanced differences among studies is a much better approach that more accurately adds to the body of literature.

      We thank the reviewer for this good feedback and revised the manuscript accordingly, placing more emphasis on the micro-scale of this study.

      Finally a note on L244-247: I would recommend against discounting the possibility that lockdowns resulted in changes to the birds' vocal acoustics, as Derryberry et al. 2020 found, especially while suggesting that their results were the effects of signal processing artifacts. Audio analysis is not my area of expertise, but isn't it possible that the birds did increase call intensity, but were simply not willing (or able) to increase it to the same degree as the additional ambient noise?

      This is an important question. When ambient noise increases (in the relevant frequency channels), the measured intensity of the vocalizations also increases, and there is no way to separate the two effects. When an effect cannot be measured, it is safer not to claim it. Unfortunately, most studies that claim an increase in vocalization intensity in noise do not account for this potential artifact (and most of them do not estimate noise at a species-specific level, as we have done). This has created a lot of “noise” in the field. We do not want to criticize the Derryberry results without analyzing the data, but from reading their methods it does not seem that they took the noise into account in their acoustic measurements. Moreover, their figure 4A shows a lot of variability in the measured minimum frequency, which could be strongly affected by ambient noise.
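
      The artifact described above can be illustrated with a minimal sketch. It assumes that the call and the ambient noise are uncorrelated, so that their powers add within the analysis band. The 60 dB call level and the noise levels are hypothetical values chosen for illustration, not measurements from the study.

```python
import math

def measured_level_db(call_db: float, noise_db: float) -> float:
    """Level (dB) of a call plus uncorrelated in-band noise (powers add)."""
    total_power = 10 ** (call_db / 10) + 10 ** (noise_db / 10)
    return 10 * math.log10(total_power)

call_db = 60.0  # hypothetical call level, held constant
for noise_db in (30.0, 50.0, 60.0):  # hypothetical ambient noise levels
    level = measured_level_db(call_db, noise_db)
    print(f"noise {noise_db:.0f} dB -> measured {level:.2f} dB")
```

      Under this assumption the measured level rises with the noise even though the call itself never changes, and it rises by less than 1 dB per dB of added noise.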

      In light of the above, we prefer to be careful and not to report changes that are probably false. We added some of this information to the manuscript. We also added the linear equations to the graph (in the caption of figure 3), where it can be seen that the slope is always ≤ 1.

      Reviewer 2:

      The explanation of methods can be improved. For example, it is not clear if data were low-pass filtered before resampling to avoid aliasing.

      We edited the methods and hope they are clearer now. Regarding the specific question: yes, a low-pass filter (LPF) was applied before resampling to prevent aliasing. This information was added to the manuscript.
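
      To show why filtering before resampling matters, here is a minimal sketch. The sampling rates, the test tone, and the Hamming-windowed sinc filter are all assumptions made for illustration. The response above states only that an LPF was applied, not which filter or which rates were used.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def convolve_valid(signal, taps):
    """FIR filtering, keeping only outputs where the filter fully overlaps."""
    k = len(taps)
    return [sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

fs = 8000       # hypothetical original sampling rate (Hz)
factor = 2      # resample to 4000 Hz, so the new Nyquist frequency is 2000 Hz
tone_hz = 3000  # above the new Nyquist: aliases to 1000 Hz if not filtered
signal = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(2000)]

naive = signal[::factor]                                            # no filter
filtered = convolve_valid(signal, lowpass_fir(1800, fs))[::factor]  # LPF first

print("RMS without LPF:", rms(naive))
print("RMS with LPF:   ", rms(filtered))
```

      Without the filter, the out-of-band tone survives at full strength as a spurious lower-frequency component. With the filter, it is strongly attenuated before the samples are dropped.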

      It is quite possible that birds move into the trees and further from the recorders with human activity. Since sound level decreases by the square of the distance of the source from the recorders, this could significantly affect the data. As indicated in the Discussion, this is a significant parameter that could not be controlled.

      The reviewer is correct, and we addressed this point. Such biases could arise with any type of survey, including manual transects (except, perhaps, when tags are placed on the animals). We note that we only analyzed high-SNR signals and that the species we selected somewhat mitigate this bias: crows and parakeets are not shy, whereas Prinias are shy in any case and prefer not to be out in the open. We would also expect to see a stronger effect of human speech if this were a central phenomenon, and we did not see this, but of course it might have affected our results.
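
      The size of the distance effect raised by the reviewer can be estimated with a short calculation. This sketch assumes free-field spherical spreading from a point source (intensity proportional to 1/d²), with no absorption, reflections, or vegetation. The distances are hypothetical.

```python
import math

def level_change_db(d_near_m: float, d_far_m: float) -> float:
    """Change in received level (dB) when a point source moves from d_near to d_far.

    Assumes intensity falls as 1/d^2, i.e. 20*log10 of the distance ratio.
    """
    return -20 * math.log10(d_far_m / d_near_m)

for d_far in (10.0, 20.0, 40.0):  # hypothetical distances from the recorder
    print(f"5 m -> {d_far:.0f} m: {level_change_db(5.0, d_far):.1f} dB")
```

      Under these assumptions each doubling of distance lowers the received level by about 6 dB, so a bird retreating into a tree could plausibly drop below a fixed SNR threshold.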

      In interpreting the data, the authors mention the effect of human activity on bird vocalizations in the context of inter-species predator-prey interactions; however, the presence of humans could also modify intraspecies interactions by acting as triggers for communication of warning and alarm, and/or food calls (as may sometimes be the case) to conspecifics. Along the same lines, it is important to have a better understanding of the behavioral significance of the syllables used to monitor animal activity in the present study.

      We agree with this point and added more discussion of both this potential bias and the type of syllables that were analyzed.

      Another potential effect that may influence the results but is difficult to study, relates to the examination of vocalizations near to the ambient noise level. This is the bandwidth of sound levels where most significant changes may occur, for example, due to the Lombard effect demonstrated in bird and bat species. However, as indicated, these are also more difficult to track and quantify. Moreover, human generated noise, other than speech, may be a more relevant factor in influencing acoustic activity of different bird species. Speech, per se, similar to the vocalizations of many other species, may simply enrich the acoustic environment so that the effects observed in the present study may be transient without significant long-term consequences.

      We note that we already included a noise parameter (in addition to human speech) in the original manuscript. Following the reviewer’s comment, we examined another factor: we replaced the previous ambient noise parameter with an estimate of ambient noise under 1 kHz, which should reflect most anthropogenic noise (not restricted to human speech). This model gave very similar results to the previous one (which is not very surprising, as noise is usually correlated). We added this information to the revised manuscript, and we have also added examples of anthropogenic noise to the supplementary materials (Fig. S8). In general, we accept the comments made by the reviewer, but would like to emphasize that we only analyzed high-SNR vocalizations (and not vocalizations that were close to the noise level). This strategy should have overcome biases that resulted from slight changes in ambient noise.

      In general, the authors achieved their aim of illustrating the complexity of the effect of human activity on animal behavior. At the same time, their study also made it clear that estimating such effects is not simple given the dynamics of animal behavior. For example, seasonality, temperature changes, animal migration and movement, as well as interspecies interactions, such as related to predator-prey behavior, and inter/intra-species competition in other respects can all play into site-specific changes in the vocal activity of a particular species.

      We completely agree and tried to further emphasize this in the revised manuscript. This is one of the main conclusions of this study – we should be careful when reaching conclusions.

      Although the methods used in the present study are statistically rigorous, a multivariate approach and visualization techniques afforded by principal components analysis and multidimensional scaling methods may be more effective in communicating the overall results.

      Following this comment, we ran a discriminant function analysis (DFA) with the parameters of the best model (site category, ambient noise, human activity, temperature and lockdown state) with the task of classifying the level of bird activity. The DFA classified activity significantly above chance, and the weights of the parameters gave some insight into their relative importance. We added this information to the revised manuscript.

      Suggestions for improvement:

      In Figure 2, the labeling of the Y-axis in the right panel should be moved to the left, similar to A and C. This will provide clear separation between the two side-to-side panels.

      Revised

      In Figure 3, it will be good to see the regression lines (as dashed lines) separately for the lockdown and no-lockdown conditions in addition to the overall effect.

      Revised

      Editor:

      Limitations

      Scale: The study's limited spatial and temporal scale was not addressed by the authors, which contrasts with the broader scope of other cited studies. To enhance the significance of the study, acknowledging and clearly highlighting this limitation, along with its potential caveats, modifications in the language used throughout the text would be beneficial. Furthermore, although the authors examined slight variations in habitat, it is important to note that all sites were primarily located within an urban landscape.

      We revised the manuscript accordingly.

      Control period: The control period is significantly shorter than the lockdown treatment period and occurs at a different time of year, potentially impacting the vocalization patterns of birds due to different annual cycle stages. It is crucial to consider that the control period falls within the pandemic timeframe despite being shortly after the lockdowns ended.

      Revised – we included a control comparison to periods of equal length within the lockdown. People gradually stopped obeying the lockdown regulations before its removal so in fact, the official removal date is probably an overestimate for the effect of the lockdown. We now explain this.

      Recommendations

      Human-generated noise, beyond speech, might have a greater influence on the acoustic activity of various bird species, but previous studies lacked detailed human activity data. Instead of solely noting the number of human talkers, the authors could quantify other aspects of human activity such as vehicles or overall anthropogenic noise volume. Exploring the relationships between these factors and bird activity at a fine scale, while disentangling them from bird detection, would be compelling. It is important to consider the potential difficulty in resolving other anthropogenic sounds within a specific bandwidth, which could be demonstrated to readers through spectrograms and potential post-pandemic changes. Such information, including daily coefficient of variation/fluctuation rather than absolute frequency spectra, could provide valuable insights.

      We note that we had already included an ambient noise factor (in addition to human speech) in the previous version. Following the reviewers’ comments, we examined another factor: we replaced the current ambient noise parameter with the ambient noise under 1 kHz, which should reflect most anthropogenic noise (not restricted to human speech). This model gave very similar results to the previous one (which is not surprising, as noise is usually correlated). We also added several spectrograms to the Supplementary material that show examples of different types of noise.

      Authors should limit their data interpretation to the impact of lockdown on behavioral responses within small-scale variations in habitat. A key critique is the assumption that activity changes solely resulted from the lockdown, disregarding other environmental factors and phenology.

      Following the editor’s comment, we realized that our conclusions/assertions were not clear. We never claimed that activity changes resulted solely from the lockdown. While revising the manuscript, we ensured that we show a significant effect of temperature, ambient noise and human activity, all of which are not dependent on lockdown. We made an effort to emphasize the complexity of the system. We show that the lockdown seemed to have an additional impact, but we never claimed it was the only factor.

      To address this, the authors could compare acoustic monitoring data within a shorter timeframe before and after the lockdown (20 days), while also controlling for temperature effects, to strengthen the validity of their claims. They would need to explain in their discussion, however, that such a comparison may still be confounded by any carry-over effects from the 10 days of treatment.

      This analysis would be difficult because, although the lockdown was officially removed on a specific date, it was gradually less respected by citizens, and thus the last period of the lockdown was somewhere between lockdown and no-lockdown. This is why we chose the approach of taking 10 days randomly from within the lockdown period and comparing them with the 10 post-lockdown days. We now clarify the reasoning better.

      An option is that authors could frame their analysis as a study of the behavior of wildlife coming out of a lockdown, to draw a distinction from other studies that compared pre-pandemic data to pandemic data.

      Good idea – revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank all three Reviewers for their comments and have revised the manuscript accordingly.

      Reviewer #1 (Public Review):

      The main objective of this paper is to report the development of a new intramuscular probe that the authors have named Myomatrix arrays. The goal of the Myomatrix probe is to significantly advance the current technological ability to record the motor output of the nervous system, namely fine-wire electromyography (EMG). Myomatrix arrays aim to provide large-scale recordings of multiple motor units in awake animals under dynamic conditions without undue movement artifacts and maintain long-term stability of chronically implanted probes. Animal motor behavior occurs through muscle contraction, and the ultimate neural output in vertebrates is at the scale of motor units, which are bundles of muscle fibers (muscle cells) that are innervated by a single motor neuron. The authors have combined multiple advanced manufacturing techniques, including lithography, to fabricate large and dense electrode arrays with mechanical features such as barbs and suture methods that would stabilize the probe's location within the muscle without creating undue wiring burden or tissue trauma. Importantly, the fabrication process they have developed allows for rapid iteration from design conception to a physical device, which allows for design optimization of the probes for specific muscle locations and organisms. The electrical output of these arrays is processed through a variety of means to try to identify single motor unit activity. At the simplest, the approach is to use thresholds to identify motor unit activity. Of intermediate data analysis complexity is the use of principal component analysis (PCA, a linear second-order regression technique) to disambiguate individual motor units from the wide field recordings of the arrays, which benefits from the density and numerous recording electrodes. At the highest complexity, they use spike sorting techniques that were developed for Neuropixels, a large-scale electrophysiology probe for cortical neural recordings. 
      Specifically, they use an estimation code called kilosort, which ultimately relies on clustering techniques to separate the multi-electrode recordings into individual spike waveforms.

      The biggest strength of this work is the design and implementation of the hardware technology. It is undoubtedly a major leap forward in our ability to record the electrical activity of motor units. The myomatrix arrays trounce fine-wire EMGs when it comes to the quality of recordings, the number of simultaneous channels that can be recorded, their long-term stability, and resistance to movement artifacts.

      The primary weakness of this work is its reliance on kilosort in circumstances where most of the channels end up picking up the signal from multiple motor units. As the authors quite convincingly show, this setting is a major weakness for fine-wire EMG. They argue that the myomatrix array succeeds in isolating individual motor unit waveforms even in that challenging setting through the application of kilosort.

      Although the authors call the estimated signals as well-isolated waveforms, there is no independent evidence of the accuracy of the spike sorting algorithm. The additional step (spike sorting algorithms like kilosort) to estimate individual motor unit spikes is the part of the work in question. Although the estimation algorithms may be standard practice, the large number of heuristic parameters associated with the estimation procedure are currently tuned for cortical recordings to estimate neural spikes. Even within the limited context of Neuropixels, for which kilosort has been extensively tested, basic questions like issues of observability, linear or nonlinear, remain open. By observability, I mean in the mathematical sense of well-posedness or conditioning of the inverse problem of estimating single motor unit spikes given multi-channel recordings of the summation of multiple motor units. This disambiguation is not always possible. kilosort's validation relies on a forward simulation of the spike field generation, which is then truth-tested against the sorting algorithm. The empirical evidence is that kilosort does better than other algorithms for the test simulations that were performed in the context of cortical recordings using the Neuropixels probe. But this work has adopted kilosort without comparable truth-tests to build some confidence in the application of kilosort with myomatrix arrays.

      Kilosort was developed to analyze spikes from neurons rather than motor units and, as Reviewer #1 correctly points out, despite a number of prior validation studies the conditions under which Kilosort accurately identifies individual neurons are still incompletely understood. Our application of Kilosort to motor unit data therefore demands that we explain which of Kilosort’s assumptions do and do not hold for motor unit data, and how our modifications of the Kilosort pipeline account for important differences between neural and muscle recording. We summarize these points below and have included them in the revised manuscript.

      Additionally, both here and in the revised paper we emphasize that while the presented spike sorting methods (thresholding, PCA-based clustering, and Kilosort) robustly extract motor unit waveforms, spike sorting of motor units is still an ongoing project. Our future work will further elaborate how differences between cortical and motor unit data should inform approaches to spike sorting as well as develop simulated motor unit datasets that can be used to benchmark spike sorting methods.

      For our current revision, we have added detailed discussion (see “Data analysis: spike sorting”) of the risks and benefits of our use of Kilosort to analyze motor unit data, in each case clarifying how we have modified the Kilosort code with these issues in mind:

      “Modification of spatial masking: Individual motor units contain multiple muscle fibers (each of which is typically larger than a neuron’s soma), and motor unit waveforms can often be recorded across spatially distant electrode contacts as the waveforms propagate along muscle fibers. In contrast, Kilosort, optimized for the much more local signals recorded from neurons, uses spatial masking to penalize templates that are spread widely across the electrode array. Our modifications to Kilosort therefore include ensuring that Kilosort searches for motor unit templates across all (and only) the electrode channels inserted into a given muscle. In the GitHub repository linked above, this is accomplished by setting parameter nops.sigmaMask to infinity, which effectively eliminates spatial masking in the analysis of the 32 unipolar channels recorded from the injectable Myomatrix array schematized in Supplemental Figure 1g. In cases including chronic recording from mice where only a single 8-contact thread is inserted into each muscle, a similar modification can be achieved with a finite value of nops.sigmaMask by setting parameter NchanNear, which represents the number of nearby EMG channels to be included in each cluster, to equal the number of unipolar or bipolar data channels recorded from each thread. Finally, note that in all cases Kilosort parameter NchanNearUp (which defines the maximum number of channels across which spike templates can appear) must be reset to be equal to or less than the total number of Myomatrix data channels.”

      “Allowing more complex spike waveforms: We also modified Kilosort to account for the greater duration and complexity (relative to neural spikes) of many motor unit waveforms. In the code repository linked above, Kilosort 2.5 was modified to allow longer spike templates (151 samples instead of 61), more spatiotemporal PCs for spikes (12 instead of 6), and more left/right eigenvector pairs for spike template construction (6 pairs instead of 3). These modifications were crucial for improving sorting performance in the nonhuman primate dataset shown in Figure 3, and in a subset of the rodent datasets (although they were not used in the analysis of mouse data shown in Fig. 1 and Supplemental Fig. 2a-f).”
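
      The modifications quoted above can be summarized as a small settings sketch. Kilosort 2.5 is MATLAB code, so this Python snippet is only an illustrative summary and not the authors’ implementation. The numeric values and the names sigmaMask, NchanNear and NchanNearUp come from the quoted text. The remaining dictionary keys and the helper function are hypothetical placeholders, not Kilosort identifiers.

```python
import math

# Values quoted in the text. The keys are descriptive placeholders only.
STOCK = {"template_samples": 61, "spatiotemporal_pcs": 6, "eigenvector_pairs": 3}
MODIFIED = {"template_samples": 151, "spatiotemporal_pcs": 12, "eigenvector_pairs": 6}

def channel_settings(n_thread_channels: int, n_total_channels: int,
                     disable_masking: bool) -> dict:
    """Hypothetical helper collecting the channel-related settings from the text."""
    # NchanNearUp must not exceed the number of Myomatrix data channels;
    # setting it equal to that number is one choice that satisfies the rule.
    settings = {"NchanNearUp": n_total_channels}
    if disable_masking:
        # Written as nops.sigmaMask in the text: infinity removes spatial masking.
        settings["sigmaMask"] = math.inf
    else:
        # Finite sigmaMask: match NchanNear to the data channels on one thread.
        settings["NchanNear"] = n_thread_channels
    return settings

print(channel_settings(32, 32, disable_masking=True))   # 32-channel injectable array
print(channel_settings(8, 8, disable_masking=False))    # single 8-contact thread
```

      The two calls correspond to the two cases described in the text: the 32-channel injectable array with masking disabled, and a single 8-contact thread with a finite mask.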

      Furthermore, as the paper on the latest version of kilosort, namely v4, discusses, differences in the clustering algorithm are the likely reason for kilosort4 performing more robustly than kilosort2.5 (used in the myomatrix paper). Given such dependence on details of the implementation and the use of an older kilosort version in this paper, the evidence that the myomatrix arrays truly record individual motor units under all the types of data obtained is in question.

      We chose to modify Kilosort 2.5, which has been used by many research groups to sort spike features, rather than the just-released Kilosort 4.0. Although future studies might directly compare the performance of these two versions on sorting motor unit data, we feel that such an analysis is beyond the scope of this paper, which aims primarily to introduce our electrode technology and demonstrate that a wide range of sorting methods (thresholding, PCA-based waveform clustering, and Kilosort) can all be used to extract single motor units. Additionally, note that because we have made several significant modifications to Kilosort 2.5 as described above, it is not clear what a “direct” comparison between different Kilosort versions would mean, since the procedures we provide here are no longer identical to version 2.5.

      There is an older paper with a similar goal to use multi-channel recording to perform source localization that the authors have failed to discuss. Given the striking similarity of goals and the divergence of approaches (the older paper uses a surface electrode array), it is important to know the relationship of the myomatrix array to the previous work. Like myomatrix arrays, the previous work also derives inspiration from cortical recordings; in that case it uses the approach of source localization in large-scale EEG recordings using skull caps, but applies it to surface EMG arrays. Ref: van den Doel, K., Ascher, U. M., & Pai, D. K. (2008). Computed myography: three-dimensional reconstruction of motor functions from surface EMG data. Inverse Problems, 24(6), 065010.

      We thank the Reviewer for pointing out this important prior work, which we now cite and discuss in the revised manuscript under “Data analysis: spike sorting” [lines 318-333]:

      “Our approach to spike sorting shares the same ultimate goal as prior work using skin-surface electrode arrays to isolate signals from individual motor units but pursues this goal using different hardware and analysis approaches. A number of groups have developed algorithms for reconstructing the spatial location and spike times of active motor units (Negro et al. 2016; van den Doel, Ascher, and Pai 2008) based on skin-surface recordings, in many cases drawing inspiration from earlier efforts to localize cortical activity using EEG recordings from the scalp (Michel et al. 2004). Our approach differs substantially. In Myomatrix arrays, the close electrode spacing and very close proximity of the contacts to muscle fibers ensure that each Myomatrix channel records from a much smaller volume of tissue than skin-surface arrays. This difference in recording volume in turn creates different challenges for motor unit isolation: compared to skin-surface recordings, Myomatrix recordings include a smaller number of motor units represented on each recording channel, with individual motor units appearing on a smaller fraction of the sensors than typical in a skin-surface recording. Because of this sensor-dependent difference in motor unit source mixing, different analysis approaches are required for each type of dataset. Specifically, skin-surface EMG analysis methods typically use source-separation approaches that assume that each sensor receives input from most or all of the individual sources within the muscle as is presumably the case in the data. In contrast, the much sparser recordings from Myomatrix are better decomposed using methods like Kilosort, which are designed to extract waveforms that appear only on a small, spatially-restricted subset of recording channels.”

      The incompleteness of the evidence that the myomatrix array truly measures individual motor units is limited to the setting where multiple motor units have similar magnitude of signal in most of the channels. In the simpler data setting where one motor dominates in some channel (this seems to occur with some regularity), the myomatrix array is a major advance in our ability to understand the motor output of the nervous system. The paper is a trove of innovations in manufacturing technique, array design, suture and other fixation devices for long-term signal stability, and customization for different muscle sizes, locations, and organisms. The technology presented here is likely to achieve rapid adoption in multiple groups that study motor behavior, and would probably lead to new insights into the spatiotemporal distribution of the motor output under more naturally behaving animals than is the current state of the field.

      We thank the Reviewer for this positive evaluation and for the critical comments above.

      Reviewer #2 (Public Review):

      Motoneurons constitute the final common pathway linking central impulse traffic to behavior, and neurophysiology faces an urgent need for methods to record their activity at high resolution and scale in intact animals during natural movement. In this consortium manuscript, Chung et al. introduce high-density electrode arrays on a flexible substrate that can be implanted into muscle, enabling the isolation of multiple motor units during movement. They then demonstrate these arrays can produce high-quality recordings in a wide range of species, muscles, and tasks. The methods are explained clearly, and the claims are justified by the data. While technical details on the arrays have been published previously, the main significance of this manuscript is the application of this new technology to different muscles and animal species during naturalistic behaviors. Overall, we feel the manuscript will be of significant interest to researchers in motor systems and muscle physiology, and we have no major concerns. A few minor suggestions for improving the manuscript follow.

      We thank the Reviewer for this positive overall assessment.

      The authors perhaps understate what has been achieved with classical methods. To further clarify the novelty of this study, they should survey previous approaches for recording from motor units during active movement. For example, Pflüger & Burrows (J. Exp. Biol. 1978) recorded from motor units in the tibial muscles of locusts during jumping, kicking, and swimming. In humans, Grimby (J. Physiol. 1984) recorded from motor units in toe extensors during walking, though these experiments were most successful in reinnervated units following a lesion. In addition, the authors might briefly mention previous approaches for recording directly from motoneurons in awake animals (e.g., Robinson, J. Neurophys. 1970; Hoffer et al., Science 1981).

      We agree and have revised the manuscript to discuss these and other prior uses of traditional EMG, including here [lines 164-167]:

      “The diversity of applications presented here demonstrates that Myomatrix arrays can obtain high-resolution EMG recordings across muscle groups, species, and experimental conditions including spontaneous behavior, reflexive movements, and stimulation-evoked muscle contractions. Although this resolution has previously been achieved in moving subjects by directly recording from motor neuron cell bodies in vertebrates (Hoffer et al. 1981; Robinson 1970; Hyngstrom et al. 2007) and by using fine-wire electrodes in moving insects (Pfluger 1978; Putney et al. 2023), both methods are extremely challenging and can only target a small subset of species and motor unit populations. Exploring additional muscle groups and model systems with Myomatrix arrays will allow new lines of investigation into how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles…”

      For chronic preparations, additional data and discussion of the signal quality over time would be useful. Can units typically be discriminated for a day or two, a week or two, or longer?

      A related issue is whether the same units can be tracked over multiple sessions and days; this will be of particular significance for studies of adaptation and learning.

      Although the yields of single units are greatest in the 1-2 weeks immediately following implantation, in chronic preparations we have obtained well-isolated single units up to 65 days post-implant. Anecdotally, in our chronic mouse implants we occasionally see motor units on the same channel across multiple days with similar waveform shapes and patterns of behavior-locked activity. However, because data collection for this manuscript was not optimized to answer this question, we are unable to verify whether these observations actually reflect cross-session tracking of individual motor units. For example, in all cases animals were disconnected from data collection hardware in between recording sessions (which were often separated by multiple intervening days) preventing us from continuously tracking motor units across long timescales. We agree with the reviewer that long-term motor unit tracking would be extremely useful as a tool for examining learning and plan to address this question in future studies.

      We have added a discussion of these issues to the revised manuscript [lines 52-59]:

      “…These methods allow the user to record simultaneously from ensembles of single motor units (Fig. 1c,d) in freely behaving animals, even from small muscles including the lateral head of the triceps muscle in mice (approximately 9 mm in length with a mass of 0.02 g 23). Myomatrix recordings isolated single motor units for extended periods (greater than two months, Supp. Fig. 3e), although highest unit yield was typically observed in the first 1-2 weeks after chronic implantation. Because recording sessions from individual animals were often separated by several days during which animals were disconnected from data collection equipment, we are unable to assess based on the present data whether the same motor units can be recorded over multiple days.”

      Moreover, we have revised Supplemental Figure 3 to show an example of single motor units recorded >2 months after implantation:

      Author response image 1.

      Longevity of Myomatrix recordings. In addition to isolating individual motor units, Myomatrix arrays also provide stable multi-unit recordings of comparable or superior quality to conventional fine wire EMG…. (e) Although individual motor units were most frequently recorded in the first two weeks of chronic recordings (see main text), Myomatrix arrays also isolate individual motor units after much longer periods of chronic implantation, as shown here where spikes from two individual motor units (colored boxes in bottom trace) were isolated during locomotion 65 days after implantation. This bipolar recording was collected from the subject plotted with unfilled black symbols in panel (d).

      It appears both single-ended and differential amplification were used. The authors should clarify in the Methods which mode was used in each figure panel, and should discuss the advantages and disadvantages of each in terms of SNR, stability, and yield, along with any other practical considerations.

      We thank the reviewer for the suggestion and have added text to all figure legends clarifying whether each recording was unipolar or bipolar.

      Is there likely to be a motor unit size bias based on muscle depth, pennation angle, etc.?

      Although such biases are certainly possible, the data presented here are not well-suited to answering these questions. For chronic implants in small animals, the target muscles (e.g. triceps in mice) are so small that the surgeon often has little choice about the site and angle of array insertion, preventing a systematic analysis of this question. For acute array injections in larger animals such as rhesus macaques, we did not quantify the precise orientation of the arrays (e.g. with ultrasound imaging) or the muscle fibers themselves, again preventing us from drawing strong conclusions on this topic. This question is likely best addressed in acute experiments performed on larger muscles, in which the relative orientations of array threads and muscle fibers can be precisely imaged and systematically varied to address this important issue.

      Can muscle fiber conduction velocity be estimated with the arrays?

      We sometimes observe fiber conduction delays up to 0.5 msec as the spike from a single motor unit moves from electrode contact to electrode contact, so spike velocity could be easily estimated given the known spatial separation between electrode contacts. However (closely related to the above question) this will only provide an accurate estimate of muscle fiber conduction velocity if the electrode contacts are arranged parallel to fiber direction, which is difficult to assess in our current dataset. If the arrays are not parallel, this computation will produce an overestimate of conduction velocity, as in the extreme case where a line of electrode contacts arranged perpendicular to the fiber direction might have identical spike arrival times, and therefore appear to have an infinite conduction velocity. Therefore, although Myomatrix arrays can certainly be used to estimate conduction velocity, such estimates should be performed in future studies only in settings where the relative orientation of array threads and muscle fibers can be accurately measured.
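      The geometric argument above can be made concrete with a short sketch. It assumes a spike travelling as a plane wave at constant speed along straight fibers. The function names and the numerical values (2 mm contact spacing, 4 m/s true velocity) are illustrative assumptions, not measurements from the Myomatrix dataset.

```python
import math

def apparent_velocity(spacing_m, delay_s):
    """Velocity inferred from contact spacing and the measured spike delay."""
    if delay_s == 0:
        return math.inf  # identical arrival times look like infinite speed
    return spacing_m / delay_s

def expected_delay(spacing_m, true_velocity_m_s, angle_deg):
    """Delay between two contacts when the array is tilted by angle_deg
    relative to the fiber direction (0 = parallel to the fibers)."""
    along_fiber = spacing_m * math.cos(math.radians(angle_deg))
    return along_fiber / true_velocity_m_s

SPACING_M = 2e-3       # hypothetical contact separation (assumption)
TRUE_VELOCITY = 4.0    # hypothetical conduction velocity in m/s (assumption)

# Parallel array: the estimate recovers the true velocity.
parallel = apparent_velocity(SPACING_M, expected_delay(SPACING_M, TRUE_VELOCITY, 0))
# Array tilted by 60 degrees: the delay shrinks, so the estimate is inflated.
tilted = apparent_velocity(SPACING_M, expected_delay(SPACING_M, TRUE_VELOCITY, 60))
# Perpendicular array: identical arrival times give an unbounded estimate.
perpendicular = apparent_velocity(SPACING_M, 0.0)
```

      Under these assumptions the apparent velocity equals the true velocity divided by the cosine of the misalignment angle, so any deviation from parallel inflates the estimate.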

      The authors suggest their device may have applications in the diagnosis of motor pathologies. Currently, concentric needle EMG to record from multiple motor units is the standard clinical method, and they may wish to elaborate on how surgical implantation of the new array might provide additional information for diagnosis while minimizing risk to patients.

      We thank the reviewer for the suggestion and have modified the manuscript’s final paragraph accordingly [lines 182-188]:

      “Applying Myomatrix technology to human motor unit recordings, particularly by using the minimally invasive injectable designs shown in Figure 3 and Supplemental Figure 1g,i, will create novel opportunities to diagnose motor pathologies and quantify the effects of therapeutic interventions in restoring motor function. Moreover, because Myomatrix arrays are far more flexible than the rigid needles commonly used to record clinical EMG, our technology might significantly reduce the risk and discomfort of such procedures while also greatly increasing the accuracy with which human motor function can be quantified. This expansion of access to high-resolution EMG signals – across muscles, species, and behaviors – is the chief impact of the Myomatrix project.”

      Reviewer #3 (Public Review):

      This work provides a novel design of implantable and high-density EMG electrodes to study muscle physiology and neuromotor control at the level of individual motor units. Current methods of recording EMG using intramuscular fine-wire electrodes do not allow for isolation of motor units and are limited by the muscle size and the type of behavior used in the study. The authors of Myomatrix arrays had set out to overcome these challenges in EMG recording and provided compelling evidence to support the usefulness of the new technology.

      Strengths:

      They presented convincing examples of EMG recordings with high signal quality using this new technology from a wide array of animal species, muscles, and behavior.

      • The design included suture holes and pull-on tabs that facilitate implantation and ensure stable recordings over months.

      • Clear presentation of specifics of the fabrication and implantation, recording methods used, and data analysis.

      We thank the Reviewer for these comments.

      Weaknesses:

      The justification for the need to study the activity of isolated motor units is underdeveloped. The study could be strengthened by providing example recordings from studies that try to answer questions where isolation of motor unit activity is most critical. For example, there is immense value for understanding muscles with smaller innervation ratio which tend to have many motor neurons for fine control of eyes and hand muscles.

      We thank the Reviewer for the suggestion and have modified the manuscript accordingly [lines 170-174]:

      “…how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles. These approaches will be particularly valuable in muscles in which each motor neuron controls a very small number of muscle fibers, allowing fine control of oculomotor muscles in mammals as well as vocal muscles in songbirds (Fig. 2g), in which most individual motor neurons innervate only 1-3 muscle fibers (Adam et al. 2021).”

      Reviewer #1 (Recommendations for The Authors):

      I would urge the authors to consider a thorough validation of the spike sorting piece of the workflow. Barring that weakness, this paper has the potential to transform motor neuroscience. The validation efforts of kilosort in the context of Neuropixels might offer a template for how to convince the community of the accuracy of myomatrix arrays in disambiguating individual motor unit waveforms.

      I have a few minor detailed comments, that the authors may find of some use. My overall comment is to commend the authors for the precision of the work as well as the writing. However, exercising caution associated with kilosort could truly elevate the paper by showing where there is room for improvement.

      We thank the Reviewer for these comments - please see our summary of our revisions related to Kilosort in our reply to the public reviews above.

      L6-7: The relationship between motor unit action potential and the force produced is quite complicated in muscle. For example, recent work has shown how decoupled the force and EMG can be during nonsteady locomotion. Therefore, it is not a fully justified claim that recording motor unit potentials will tell us what forces are produced. This point relates to another claim made by the authors (correctly) that EMG provides better quality information about muscle motor output in isometric settings than in more dynamic behaviors. That same problem could also apply to motor unit recordings and their relationship to muscle force. The relationship is undoubtedly strong in an isometric setting. But as has been repeatedly established, the electrical activity of muscle is only loosely related to its force output and lacks in predictive power.

      This is an excellent point, and our revised manuscript now addresses this issue [lines 174-176]:

      “…Of further interest will be combining high-resolution EMG with precise measurement of muscle length and force output to untangle the complex relationship between neural control, body kinematics, and muscle force that characterizes dynamic motor behavior. Similarly, combining Myomatrix recordings with high-density brain recordings….”

      L12: There is older work that uses an array of skin mounted EMG electrodes to solve a source location problem, and thus come quite close to the authors' stated goals. However, the authors have failed to cite or provide an in-depth analysis and discussion of this older work.

      As described above in the response to Reviewer 1’s public review comments, we now cite and discuss these papers.

      L18-19: "These limitations have impeded our understanding of fundamental questions in motor control, ..." There are two independently true statements here. First is that there are limitations to EMG based inference of motor unit activity. Second is that there are gaps in the current understanding of motor unit recruitment patterns and modification of these patterns during motor learning. But the way the first few paragraphs have been worded makes it seem like motor unit recordings is a panacea for these gaps in our knowledge. That is not the case for many reasons, including key gaps in our understanding of how muscle's electrical activity relates to its force, how force relates to movement, and how control goals map to specific movement patterns. This manuscript would in fact be strengthened by acknowledging and discussing the broader scope of gaps in our understanding, and thus more precisely pinpointing the specific scientific knowledge that would be gained from the application of myomatrix arrays.

      We agree and have revised the manuscript to note this complexity (see our reply to this Reviewer’s other comment about muscle force, above).

      L140-143: The estimation algorithms yield potential spikes, but without validation of the sorting algorithms it is not justifiable to conclude that the myomatrix arrays have already provided information about individual motor units.

      Please see our replies to Reviewer #1’s public comments (above) regarding motor unit spike sorting.

      L181-182: "These methods allow very fine pitch escape routing (<10 µm spacing), alignment between layers, and uniform via formation." I find this sentence hard to understand. Perhaps there is some grammatical ambiguity?

      We have revised this passage as follows [lines 194-197]:

      "These methods allow very fine pitch escape routing (<10 µm spacing between the thin “escape” traces connecting electrode contacts to the connector), spatial alignment between the multiple layers of polyimide and gold that constitute each device, and precise definition of “via” pathways that connect different layers of the device.”

      L240: What is the rationale for choosing this frequency band for the filter?

      Individual motor unit waveforms have peak energy at roughly 0.5-2.0 kHz, although units recorded at very high SNR often have voltage waveform features at higher frequencies. The high- and lowpass cutoff frequencies should reflect this, although there is nothing unique about the 350 Hz and 7,000 Hz cutoffs we describe, and in all recordings similar results can be obtained with other choices of low/high frequency cutoffs.

      L527-528: There are some key differences between the electrode array design presented here and traditional fine-wire EMG in terms of features used to help with electrode stability within the muscle. A barb-like structure is formed in traditional fine-wire EMG by bending the wire outside the canula of the needle used to place it within the muscle. But when the wire is pulled out, it is common for the barb to break off and be left behind. This is because of the extreme (thin) aspect ratio of the barb in fine wire EMG and low-cycle fatigue fracture of the wire. From the schematic shown here, the barb design seems to be stubbier and thus less prone to breaking off. This raises the question of how much damage is inflicted during the pull-out and the associated level of discomfort to the animal as a result. The authors should present a more careful statement and documentation with regard to this issue.

      We have updated the manuscript to highlight the ease of inserting and removing Myomatrix probes, and to clarify that in over 100 injectable insertions/removal there have been zero cases of barbs (or any other part) of the devices breaking off within the muscle [lines 241-249]:

      “…Once the cannula was fully inserted, the tail was released, and the cannula slowly removed. After recording, the electrode and tail were slowly pulled out of the muscle together. Insertion and removal of injectable Myomatrix devices appeared to be comparable or superior to traditional fine-wire EMG electrodes (in which a “hook” is formed by bending back the uninsulated tip of the recording wire) in terms of both ease of injection, ease of removal of both the cannula and the array itself, and animal comfort. Moreover, in over 100 Myomatrix injections performed in rhesus macaques, there were zero cases in which Myomatrix arrays broke such that electrode material was left behind in the recorded muscle, representing a substantial improvement over traditional fine-wire approaches, in which breakage of the bent wire tip regularly occurs (Loeb and Gans 1986).”

      Reviewer #2 (Recommendations For The Authors):

      The Abstract states the device records "muscle activity at cellular resolution," which could potentially be read as a claim that single-fiber recording has been achieved. The authors might consider rewording.

      The Reviewer is correct, and we have removed the word “cellular”.

      The supplemental figures could perhaps be moved to the main text to aid readers who prefer to print the combined PDF file.

      After finalizing the paper we will upload all main-text and supplemental figures into a single pdf on bioRxiv for readers who prefer a single pdf. However, given that the supplemental figures provide more technical and detailed information than the main-text figures, for the paper on the eLife site we prefer the current eLife format in which supplemental figures are associated with individual main-text figures online.

      Reviewer #3 (Recommendations For The Authors):

      • The work could be strengthened by showing examples of simultaneous recordings from different muscles.

      Although Myomatrix arrays can indeed be used to record simultaneously from multiple muscles, in this manuscript we have decided to focus on high-resolution recordings that maximize the number of recording channels and motor units obtained from a single muscle. Future work from our group will introduce larger Myomatrix arrays optimized for recording from many muscles simultaneously.

      • The implantation did not include mention of testing the myomatrix array during surgery by using muscle stimulation to verify correct placement and connection.

      As the Reviewer points out, electrical stimulation is a valuable tool for confirming successful EMG placement. However, we did not use this approach in the current study, relying instead on anatomical confirmation of muscle targeting (e.g. intrasurgical and postmortem inspection in rodents) and on implanting large, easy-to-target arm muscles (in primates), where the risk of mis-targeting is extremely low. Future studies will examine both electrical stimulation and ultrasound methods for confirming the placement of Myomatrix arrays.

      References cited above

      Adam, I., A. Maxwell, H. Rossler, E. B. Hansen, M. Vellema, J. Brewer, and C. P. H. Elemans. 2021. 'One-to-one innervation of vocal muscles allows precise control of birdsong', Curr Biol, 31: 3115-24 e5.

      Hoffer, J. A., M. J. O'Donovan, C. A. Pratt, and G. E. Loeb. 1981. 'Discharge patterns of hindlimb motoneurons during normal cat locomotion', Science, 213: 466-7.

      Hyngstrom, A. S., M. D. Johnson, J. F. Miller, and C. J. Heckman. 2007. 'Intrinsic electrical properties of spinal motoneurons vary with joint angle', Nat Neurosci, 10: 363-9.

      Loeb, G. E., and C. Gans. 1986. Electromyography for Experimentalists, First edi (The University of Chicago Press: Chicago, IL).

      Michel, C. M., M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta. 2004. 'EEG source imaging', Clin Neurophysiol, 115: 2195-222.

      Negro, F., S. Muceli, A. M. Castronovo, A. Holobar, and D. Farina. 2016. 'Multi-channel intramuscular and surface EMG decomposition by convolutive blind source separation', J Neural Eng, 13: 026027.

      Pfluger, H. J., and M. Burrows. 1978. 'Locusts use the same basic motor pattern in swimming as in jumping and kicking', Journal of experimental biology, 75: 81-93.

      Putney, Joy, Tobias Niebur, Leo Wood, Rachel Conn, and Simon Sponberg. 2023. 'An information theoretic method to resolve millisecond-scale spike timing precision in a comprehensive motor program', PLOS Computational Biology, 19: e1011170.

      Robinson, D. A. 1970. 'Oculomotor unit behavior in the monkey', J Neurophysiol, 33: 393-403.

      van den Doel, Kees, Uri M Ascher, and Dinesh K Pai. 2008. 'Computed myography: three-dimensional reconstruction of motor functions from surface EMG data', Inverse Problems, 24: 065010.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Firstly, we must take a moment to express our sincere gratitude to the editorial board for allowing this work to be reviewed, and to the peer reviewers for taking the time and effort to review our manuscript. The reviews are thoughtful and reflect the careful work of scientists who undoubtedly have many things on their schedule. We cannot express our gratitude enough. This is not a minor sentiment. We appreciate the engagement.

      Allow us to briefly highlight some of the changes made to the revised manuscript, most on behalf of suggestions made by the reviewers:

      1) A supplementary figure that includes the calculation of drug applicability and variant vulnerability for a different data set–16 alleles of dihydrofolate reductase, and two antifolate compounds used to treat malaria–pyrimethamine and cycloguanil.

      2) New supplementary figures that add depth to the result in Figure 1 (the fitness graphs): we demonstrate how the rank order of alleles changes across drug environments and offer a statistical comparison of the equivalence of these fitness landscapes.

      3) A new subsection that explains our specific method used to measure epistasis.

      4) Improved main text with clarifications, fixed errors, and other addendums.

      5) Improved referencing and citations, in the spirit of better scholarship (now with over 70 references).

      Next, we’ll offer some general comments that we believe apply to several of the reviews, and to the eLife assessment. We have provided the bulk of the responses in some general comments, and in response to the public reviews. We have also included the suggestions and made brief comments to some of the individual recommendations.

      On the completeness of our analysis

      In our response, we’ll address the completeness issue first, as iterations of it appear in several of the reviews, and it seems to be one of the most substantive philosophical critiques of the work (there are virtually no technical corrections, outside of formatting and grammar fixes, which we are grateful to the reviewers for identifying).

      To begin our response, we will relay that we have now included an analysis of a data set corresponding to mutants of a protein, dihydrofolate reductase (DHFR), from Plasmodium falciparum (a main cause of malaria), across two antifolate drugs (pyrimethamine and cycloguanil). We have also decided to include this new analysis in the supplementary material (see Figure S4).

      Author response image 1.

      Drug applicability and variant vulnerability for 16 alleles of dihydrofolate reductase.

      Here we compute the variant vulnerability and drug applicability metrics for two drugs, pyrimethamine (PYR) and cycloguanil (CYC), both antifolate drugs used to treat malaria. This is a completely different system than the one that is the focus of the submitted paper, for a different biomedical problem (antimalarial resistance), using different drugs, and targets. Further, the new data provide information on both drugs of different kinds, and drug concentrations (as suggested by Reviewer #1; we’ve also added a note about this in the new supplementary material). Note that these data have already been the subject of detailed analyses of epistatic effects, and so we did not include those here, but we do offer that reference:

      ● Ogbunugafor CB. The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions. Evolution. 2022 Feb 1;76(s1):37-48.

      ● Diaz-Colunga J, Sanchez A, Ogbunugafor CB. Environmental modulation of global epistasis is governed by effective genetic interactions. bioRxiv. 2022:202211.

      Computing our proposed metrics across different drugs is relatively simple, and we could have populated our paper with suites of similar analyses across data sets of various kinds. Such a paper would, in our view, be spread too thin–the evolution of antifolate resistance and/or antimalarial resistance are enormous problems, with large literatures that warrant focused studies. More generally, as the reviewers doubtlessly understand, simply analyzing more data sets does not make a study stronger, especially one like ours, that is using empirical data to both make a theoretical point about alleles and drugs and offer a metric that others can apply to their own data sets.

      Our approach focused on a data set that allowed us to discuss the biology of a system: a far stronger paper, a far stronger proof-of-concept for a new metric. We will revisit this discussion about the structure of our study. But before doing so, we will elaborate on why the “more is better” tone of the reviews is misguided.

      We also note that the study where the data originate (Mira et al. 2015) is focused on a single data set of a single drug-target system. We should also point out that Mira et al. 2015 made a general point about drug concentrations influencing the topography of fitness landscapes, not unlike our general point about metrics used to understand features of alleles and different drugs in antimicrobial systems.

      This isn’t meant to serve as a feeble appeal to authority – just because something happened in one setting doesn’t make it right for another. But other than a nebulous appeal to the fact that things have changed in the 8 years since that study was published, it is difficult to argue why one study system was permissible for other work but is somehow “incomplete” in ours. Double standards can be appropriate when they are justified, but in this case, it hasn’t been made clear, and there is no technical basis for it.

      Our study does what countless other successful ones do: utilizes a biological system to make a general point about some phenomena in the natural world. In our case, we were focused on the need for more evolution-inspired iterations of widely used concepts like druggability. For example, a recent study of epistasis focused on a single set of alleles, across several drugs, not unlike our study:

      ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.

      Next, we assert that there is a difference between an eagerness to see a new metric applied to many different data sets (a desire we share, and plan on pursuing in the future), and the notion that an analysis is “incomplete” without it. The latter is a more serious charge and suggests that the researcher-authors neglected to properly construct an argument because of gaps in the data. This charge does not apply to our manuscript, at all. And none of the reviewers effectively argued otherwise.

      Our study contains 7 different combinatorially-complete datasets, each composed of 16 alleles (not including the new analysis of antifolates that now appears in the revision). One can call these datasets “small” or “low-dimensional,” if one chooses (we chose to put this front-and-center, in the title). They are, however, both complete and as large as or larger than many datasets in similar studies of fitness landscapes:

      ● Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.

      ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.

      ● Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, Shakhnovich EI. Biophysical principles predict fitness landscapes of drug resistance. Proceedings of the National Academy of Sciences. 2016 Mar 15;113(11):E1470-8.

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      ● Lindsey HA, Gallie J, Taylor S, Kerr B. Evolutionary rescue from extinction is contingent on a lower rate of environmental change. Nature. 2013 Feb 28;494(7438):463-7.

      These are only five of very many such studies, some of them very well-regarded.

      Having now gone on about the point about the data being “incomplete,” we’ll next move to the more tangible comment-criticism about the low-dimensionality of the data set, or the fact that we examined a single drug-drug target system (β lactamases, and β-lactam drugs).

      The criticism, as we understand it, is that the authors could have analyzed more data.

      This is a common complaint, that “more is better” in biology. While we appreciate the feedback from the reviewers, we notice that no one specified what constitutes the right amount of data. Some pointed to other single data sets, but would analyzing two different sets qualify as enough? Perhaps to person A, but not to persons B - Z. This is a matter of opinion and is not a rigorous comment on the quality of the science (or completeness of the analysis).

      ● Should we analyze five more drugs of the same target (beta lactamases)? And what bacterial orthologs?

      ● Should we analyze 5 antifolates for 3 different orthologs of dihydrofolate reductase?

      ● And in which species or organism type? Bacteria? Parasitic infections?

      ● And why only infectious disease? Aren’t these concepts also relevant to cancer? (Yes, they are.)

      ● And what about the number of variants in the aforementioned target? Should one aim for small combinatorially complete sets? Or vaster swaths of sequence space, such as the ones generated by deep mutational scanning and other methods?

      We offer these options in part because, for the most part, we were not given an objective suggestion for the appropriate level of detail. This is because there is no answer to the question of what size of dataset would be most appropriate. Unfortunately, without a technical reason why a data set of unspecified size [X] or [Y] is best, we are left with a standard “do more work” peer review response, one that the authors are not inclined to engage seriously, because there is no scientific rationale for it.

      The most charitable explanation for why more datasets would be better is tied to the abstract notion that seeing a metric measured in different data sets somehow makes it more believable. This, as the reviewers undoubtedly understand, isn’t necessarily true (in fact, many poor studies mask a lack of clarity with lots of data).

      To double down on this take, we’ll even argue the opposite: that our focus on a single drug system is a strength of the study.

      The focus on a single-drug class allows us to practice the lost art of discussing the peculiar biology of the system that we are examining. Even more, the low dimensionality allows us to discuss–in relative detail–individual mutations and suites of mutations. We do so several times in the manuscript, and even connect our findings to literature that has examined the biophysical consequences of mutations in these very enzymes.

      (For example: Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.)

      Such detail is only legible in a full-length manuscript because we were able to interrogate a system in good detail. That is, the low-dimensionality (of a complete data set) is a strength, rather than a weakness. This was actually part of the design choice for the study: to offer a new metric with broad application but developed using a system where the particulars could be interrogated and discussed.

      Surely the findings that we recover are engineered for broader application. But to suggest that we need to apply them broadly in order to demonstrate their broad impact is somewhat antithetical to both model systems research and to systems biology, both of which have been successful in extracting general principles from singular (often simple) systems and models.

      An alternative approach, where the metric was wielded across an unspecified number of datasets, would lead to a manuscript that is unfocused, reading like many modern machine learning papers, where the analysis or discussion has little to do with actual biology. We very specifically avoided this sort of study.

      To close our comments regarding data: Firstly, we have considered the comments and analyzed a different data set, corresponding to a different drug-target system (antifolate drugs, and DHFR). Moreover, we don’t think more data has anything to do with a better answer or support for our conclusions or any central arguments. Our arguments were developed from the data set that we used but achieve what responsible systems biology does: introduces a framework that one can apply more broadly. And we develop it using a complete, and well-vetted dataset. If the reviewers have a philosophical difference of opinion about this, we respect it, but it has nothing to do with our study being “complete” or not. And it doesn’t speak to the validity of our results.

      Related: On the dependence of our metrics on drug-target system

      Several comments were made that suggest the relevance of the metric may depend on the drug being used. We disagree with this, and in fact, have argued the opposite: the metrics are specifically useful because they are not encumbered with unnecessary variables. They are the product of rather simple arithmetic that is completely agnostic to biological particulars.

      We explain, in the section entitled “Metric Calculations”:

      “To estimate the two metrics we are interested in, we must first quantify the susceptibility of an allelic variant to a drug. We define susceptibility as $1 - w$, where w is the mean growth of the allelic variant under drug conditions relative to the mean growth of the wild-type/TEM-1 control. If a variant is not significantly affected by a drug (i.e., growth under drug is not statistically lower than growth of wild-type/TEM-1 control, by t-test P-value < 0.01), its susceptibility is zero. Values in these metrics are summaries of susceptibility: the variant vulnerability of an allelic variant is its average susceptibility across drugs in a panel, and the drug applicability of an antibiotic is the average susceptibility of all variants to it.”

      That is, these can be animated to compute the variant vulnerability and drug applicability for data sets of various kinds. To demonstrate this (and we thank the reviewers for suggesting it), we have analyzed the antifolate-DHFR data set as outlined above.
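      As a minimal sketch of the arithmetic in the quoted passage, the two metrics can be written in a few lines of Python. The function names, the toy growth values, and the variant and drug labels are hypothetical. The significance call (the t-test against the wild-type/TEM-1 control) is assumed to have been made elsewhere and is passed in as a boolean.

```python
from statistics import mean

def susceptibility(relative_growth, significantly_lower):
    """Return 1 - w, or 0 when growth under drug is not significantly
    below the control (the significance call is supplied by the caller)."""
    return 1.0 - relative_growth if significantly_lower else 0.0

def variant_vulnerability(table):
    """Average susceptibility of each variant across all drugs in the panel."""
    return {variant: mean(list(row.values())) for variant, row in table.items()}

def drug_applicability(table):
    """Average susceptibility of all variants to each drug."""
    drugs = list(next(iter(table.values())).keys())
    return {drug: mean([row[drug] for row in table.values()]) for drug in drugs}

# Hypothetical toy data: (w, significantly lower than control?)
measurements = {
    "variant_A": {"drug_1": (0.5, True), "drug_2": (0.9, False)},
    "variant_B": {"drug_1": (0.2, True), "drug_2": (0.6, True)},
}

# Convert raw measurements into a variant-by-drug susceptibility table.
table = {
    variant: {drug: susceptibility(w, sig) for drug, (w, sig) in row.items()}
    for variant, row in measurements.items()
}

vulnerability = variant_vulnerability(table)
applicability = drug_applicability(table)
```

      Because both metrics are plain averages over a variant-by-drug table, the same functions would apply unchanged to any data set that can be arranged in that form.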

      Finally, we will make the following light, but somewhat cynical point (that relates to the “more data” point generally): the wrong metric applied to 100 data sets is little more than 100 wrong analyses. Simply applying the metric to a wide number of datasets has nothing to do with the veracity of the study. Our study, alternatively, chose the opposite approach: used a data set for a focused study where metrics were extracted. We believe this to be a much more rigorous way to introduce new metrics.

      On the Relevance of simulations

      Somewhat relatedly, the eLife summary and one of the reviewers mentioned the potential benefit of simulations. Reviewer 1 correctly highlights that the authors have a lot of experience in this realm, and so generating simulations would be trivial. For example, the authors have been involved in studies such as these:

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.

      ● Ogbunugafor CB, Hartl D. A pivot mutation impedes reverse evolution across an adaptive landscape for drug resistance in Plasmodium vivax. Malaria Journal. 2016 Dec;15:1-0.

      From the above and dozens of other related studies, we’ve learned that simulations are critical for questions about the end results of dynamics across fitness landscapes of varying topography. To simulate across the data sets in the submitted study would be a small ask. We do not provide this, however, because our study is not about the dynamics of de novo evolution of resistance. In fact, our study focuses on a different problem, no less important for understanding how resistance evolves: determining static properties of alleles and drugs that provide a picture of an allele’s ability to withstand a breadth of drugs in a panel (variant vulnerability), or the ability of a drug in a panel to affect a breadth of drug targets (drug applicability).

      The authors speak on this in the Introduction:

      “While stepwise, de novo evolution (via mutations and subsequent selection) is a key force in the evolution of antimicrobial resistance, evolution in natural settings often involves other processes, including horizontal gene transfer and selection on standing genetic variation. Consequently, perspectives that consider variation in pathogens (and their drug targets) are important for understanding treatment at the bedside. Recent studies have made important strides in this arena. Some have utilized large data sets and population genetics theory to measure cross-resistance and collateral sensitivity. Fewer studies have made use of evolutionary concepts to establish metrics that apply to the general problem of antimicrobial treatment on standing genetic variation in pathogen populations, or for evaluating the utility of certain drugs’ ability to treat the underlying genetic diversity of pathogens”

      That is, the proposed metrics aren’t about the dynamics of stepwise evolution across fitness landscapes, and so simulating those dynamics doesn’t offer much for our question. What we have done instead is much more direct and allows the reader to follow a logic: clearly demonstrate the topography differences in Figure 1 (and Supplemental Figures S2 and S3 with rank order changes).

      Author response image 2.

      These results tell the reader what they need to know: that the topography of fitness landscapes changes across drug types. Further, we should note that Mira et al. 2015 already told the basic story that one finds different adaptive solutions across different drug environments. (Notably, without computational simulations).

      In summary, we attempted to provide a rigorous, clean, and readable study that introduced two new metrics. Appeals to adding extra analysis would be considered if they augmented the study’s goals. We do not believe this to be the case.

      Nonetheless, we must reiterate our appreciation for the engagement and suggestions. All were made with great intentions. This is more than one could hope for in a peer review exchange. The authors are truly grateful.

      eLife assessment

      The work introduces two valuable concepts in antimicrobial resistance: "variant vulnerability" and "drug applicability", which can broaden our ways of thinking about microbial infections through evolution-based metrics. The authors present a compelling analysis of a published dataset to illustrate how informative these metrics can be, but the study is still incomplete, as only a subset of a single dataset on a single class of antibiotics was analyzed. Analyzing more datasets, with other antibiotic classes and resistance mutations, and performing additional theoretical simulations could demonstrate the general applicability of the new concepts.

      The authors disagree strongly with the idea that the study is “incomplete,” and encourage the editors and reviewers to reconsider this language. Not only are the data combinatorially complete, but they are also larger in size than many similar studies of fitness landscapes. Insofar as no technical justification was offered for this “incomplete” summary, we think it should be removed. Furthermore, we question the utility of “theoretical simulations.” They are rather easy to execute but distract from the central aims of the study: to introduce new metrics, in the vein of other metrics–like druggability, IC50, MIC–that describe properties of drugs or drug targets.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Guerrero and colleagues introduces two new metrics that extend the concept of "druggability" - loosely speaking, the potential suitability of a particular drug, target, or drug-target interaction for pharmacological intervention - to collections of drugs and genetic variants. The study draws on previously measured growth rates across a combinatorially complete mutational landscape involving 4 variants of the TEM-50 (beta lactamase) enzyme, which confers resistance to commonly used beta-lactam antibiotics. To quantify how growth rate - in this case, a proxy for evolutionary fitness - is distributed across allelic variants and drugs, they introduce two concepts: "variant vulnerability" and "drug applicability".

      Variant vulnerability is the mean vulnerability (1-normalized growth rate) of a particular variant to a library of drugs, while drug applicability measures the mean across the collection of genetic variants for a given drug. The authors rank the drugs and variants according to these metrics. They show that the variant vulnerability of a particular mutant is uncorrelated with the vulnerability of its one-step neighbors and analyze how higher-order combinations of single variants (SNPs) contribute to changes in growth rate in different drug environments.

      The work addresses an interesting topic and underscores the need for evolution-based metrics to identify candidate pharmacological interventions for treating infections. The authors are clear about the limitations of their approach - they are not looking for immediate clinical applicability - and provide simple new measures of druggability that incorporate an evolutionary perspective, an important complement to the orthodoxy of aggressive, kill-now design principles. I think the ideas here will interest a wide range of readers, but I think the work could be improved with additional analysis - perhaps from evolutionary simulations on the measured landscapes - that tie the metrics to evolutionary outcomes.

      The authors greatly appreciate these comments, and the proposed suggestions by reviewer 1. We have addressed most of the criticisms and suggestions in our comments above.

      Reviewer #2 (Public Review):

      The authors introduce the notions of "variant vulnerability" and "drug applicability" as metrics quantifying the sensitivity of a given target variant across a panel of drugs and the effectiveness of a drug across variants, respectively. Given a data set comprising a measure of drug effect (such as growth rate suppression) for pairs of variants and drugs, the vulnerability of a variant is obtained by averaging this measure across drugs, whereas the applicability of a drug is obtained by averaging the measure across variants.

      The authors apply the methodology to a data set that was published by Mira et al. in 2015. The data consist of growth rate measurements for a combinatorially complete set of 16 genetic variants of the antibiotic resistance enzyme beta-lactamase across 10 drugs and drug combinations at 3 different drug concentrations, comprising a total of 30 different environmental conditions. For reasons that did not become clear to me, the present authors select only 7 out of 30 environments for their analysis. In particular, for each chosen drug or drug combination, they choose the data set corresponding to the highest drug concentration. As a consequence, they cannot assess to what extent their metrics depend on drug concentration. This is a major concern since Mira et al. concluded in their study that the differences between growth rate landscapes measured at different concentrations were comparable to the differences between drugs. If the new metrics display a significant dependence on drug concentration, this would considerably limit their usefulness.

      The authors appreciate the point about drug concentration, and it is one that the authors have made in several studies.

      The quick answer is that whether the metrics are useful for drug type-concentration A or B will depend on drug type-concentration A or B. If there are notable differences in the topography of the fitness landscape across concentrations, then we should expect the metrics to differ. What Reviewer #2 points out as a “major concern” is in fact a strength of the metrics: they are agnostic with respect to type of drug, type of target, size of data set, or topography of the fitness landscape. And so, the authors disagree: the fact that drug concentration may strongly influence the value of the metrics does not limit their utility. It is simply another variable that one can consider when computing the metrics.

      As discussed above, we have analyzed data from a different data set, in a different drug-target problem (DHFR and antifolate drugs; see supplemental information). This analysis demonstrates how the metrics can be computed across different drug concentrations.

      As a consequence of the small number of variant-drug combinations that are used, the conclusions that the authors draw from their analysis are mostly tentative with weak statistical support. For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.

      We reiterate our appreciation for the engagement. Reviewer #2 generously offers some technical insight on measurements of epistasis, and their opinion on the level of statistical support for our claims. The authors are very happy to engage in a dialogue about these points. We disagree rather strongly, and in addition to the general points raised above (that speak to some of this), will raise several specific rebuttals to the comments from Reviewer #2.

      For one, Reviewer #2 is free to point to which arguments have “weak statistical support.” Having read the review, we aren’t sure what this refers to. “Weak statistical support” generally applies to findings built from underpowered studies, or designs constructed in a manner that yields effect sizes or p-values that give low confidence that a finding is believable (or replicable). This sort of problem doesn’t apply to our study for various reasons, not the least of which is that our findings are strongly supported, based on a vetted data set, in a system that has long been the object of examination in studies of antimicrobial resistance.

      For example, we did not argue that magnetic fields alter the topography of fitness landscapes, a claim which must stand up to a certain sort of statistical scrutiny. Instead, we examined landscapes where the drug environment differed statistically from the non-drug environment and used them to compute new properties of alleles and drugs.

      We can imagine that the reviewer is referring to the low-dimensionality of the fitness landscapes in the study. Again: the features of the dataset are a detail that the authors put into the title of the manuscript. Further, we emphasize that it is not a weakness, but rather, allows the authors to focus, and discuss the specific biology of the system. And we responsibly explain the constraints around our study several times, though none of them have anything to do with “weak statistical support.”

      Even though we aren’t clear what “weak statistical support” means as offered by Reviewer 2, the authors have nonetheless decided to provide additional analyses, now appearing in the new supplemental material.

      We have included a new Figure S2, where we offer an analysis of the topography of the 7 landscapes, based on the Kendall rank order test. This tests the hypothesis that there is no correlation (concordance or discordance) between the topographies of the fitness landscapes.

      Author response image 3.

      Kendall rank test for correlation between the 7 fitness landscapes.
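
      For readers unfamiliar with the statistic, the following is a minimal sketch of a Kendall rank correlation between two landscapes, each represented as a list of growth rates in the same genotype order. It computes the simple tau-a variant (ties count as neither concordant nor discordant) and no p-value; the function name and toy values are assumptions for illustration, and this is not necessarily the implementation used for Figure S2.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant when both landscapes order genotypes i and j
        # the same way, and discordant when they order them oppositely.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical growth rates for four genotypes under three drugs.
growth_drug_a = [0.9, 0.7, 0.5, 0.3]
growth_drug_b = [0.8, 0.6, 0.4, 0.2]  # same rank order as drug A
growth_drug_c = [0.1, 0.2, 0.3, 0.4]  # reversed rank order

tau_ab = kendall_tau(growth_drug_a, growth_drug_b)
tau_ac = kendall_tau(growth_drug_a, growth_drug_c)
```

      A value near 1 indicates that two drug environments rank the genotypes alike, a value near -1 that they rank them in opposite order, and a value near 0 that the rank orders are unrelated.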

      In Figure S3, we test the hypothesis that the variant vulnerability values differ. To do this, we perform paired t-tests. These are paired by haplotype/allelic variant, so each comparison is the change in growth between drugs for a given haplotype.

      Author response image 4.

      Paired t-tests for variant vulnerability.
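
      The pairing described above can be made concrete with a short sketch. It computes only the paired t statistic and its degrees of freedom for two drugs measured on the same haplotypes; converting these to a p-value requires the t distribution and is omitted. The function name and the toy growth values are assumptions for illustration, not values from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(growth_a, growth_b):
    """Return (t, degrees of freedom) for paired samples.

    growth_a[i] and growth_b[i] are measurements for the same haplotype i
    under two different drugs.
    """
    if len(growth_a) != len(growth_b) or len(growth_a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(growth_a, growth_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical growth values for four haplotypes under two drugs.
toy_drug_a = [0.5, 0.6, 0.7, 0.9]
toy_drug_b = [0.4, 0.4, 0.6, 0.6]
t_stat, dof = paired_t_statistic(toy_drug_a, toy_drug_b)
```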

      To this point raised by Reviewer #2:

      “For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.”

      Our study does not argue that drug combinations are necessarily correlated with a higher drug applicability. Rather, we specifically highlight that one of the combinations does not have a high drug applicability:

      “Though all seven drugs/combinations are β-lactams, they have widely varying effects across the 16 alleles. Some of the results are intuitive: for example, the drug regime with the highest drug applicability of the set—amoxicillin/clavulanic acid—is a mixture of a widely used β-lactam (amoxicillin) and a β-lactamase inhibitor (clavulanic acid) (see Table 3). We might expect such a mixture to have a broader effect across a diversity of variants. This high applicability is hardly a rule, however, as another mixture in the set, piperacillin/tazobactam, has a much lower drug applicability (ranking 5th out of the seven drugs in the set) (Table 3).”

      In general, we believe that the submitted paper is responsible with regard to how it extrapolates generalities from the results. Further, the manuscript contains a specific section that explains limitations, clearly and transparently (not especially common in science). For that reason, we’d encourage Reviewer #2 to reconsider their perspective. We do not believe that our arguments are built on “weak” support at all. And we did not argue anything particular about drug combinations writ large. We did the opposite: we discussed the particulars of our results in light of the biology of the system.

      Thirdly, to this point:

      “To assess the environment-dependent epistasis among the genetic mutations comprising the variants under study, the authors decompose the data of Mira et al. into epistatic interactions of different orders. This part of the analysis is incomplete in two ways. First, in their study, Mira et al. pointed out that a fairly large fraction of the fitness differences between variants that they measured were not statistically significant, which means that the resulting fitness landscapes have large statistical uncertainties. These uncertainties should be reflected in the results of the interaction analysis in Figure 4 of the present manuscript.”

      The authors are uncertain with regards to the “uncertainties” being referred to, but we’ll do our best to understand: our study utilized the 7 drug environments from Mira et al. 2015 with statistically significant differences between growth rates with and without drug. And so, this point about how the original set contained statistically insignificant treatments is not relevant here. We explain this in the methods section:

      “The data that we examine comes from a past study of a combinatorial set of four mutations associated with TEM-50 resistance to β-lactam drugs [39]. This past study measured the growth rates of these four mutations in combination, across 15 different drugs (see Supplemental Information).”

      We go on to say the following:

      “We examined these data, identifying a subset of structurally similar β-lactams that also included β-lactams combined with β-lactamase inhibitors, cephalosporins and penicillins. From the original data set, we focus our analyses on drug treatments that had a significant negative effect on the growth of wild-type/TEM-1 strains (one-tailed t-test of wild-type treatment vs. control, P < 0.01). After identifying the data from the set that fit our criteria, we were left with seven drugs or combinations (concentration in μg/ml): amoxicillin 1024 μg/ml (β-lactam), amoxicillin/clavulanic acid 1024 μg/ml (β-lactam and β-lactamase inhibitor), cefotaxime 0.123 μg/ml (third-generation cephalosporin), cefotetan 0.125 μg/ml (second-generation cephalosporin), cefprozil 128 μg/ml (second-generation cephalosporin), ceftazidime 0.125 μg/ml (third-generation cephalosporin), piperacillin and tazobactam 512/8 μg/ml (penicillin and β-lactamase inhibitor). With these drugs/mixtures, we were able to embody chemical diversity in the panel.”

      Again: The goal of our study was to develop metrics that can be used to analyze features of drugs and targets and disentangle these metrics into effects.

      Second, the interpretation of the coefficients obtained from the epistatic decomposition depends strongly on the formalism that is being used (in the jargon of the field, either a Fourier or a Taylor analysis can be applied to fitness landscape data). The authors need to specify which formalism they have employed and phrase their interpretations accordingly.

      The authors appreciate this nuance. Certainly, how to measure epistasis is a large topic of its own. But we recognize that we could have addressed this more directly and have added text to this effect.

      In response to these comments from Reviewer #2, we have added a new section focused on these points (reference syntax removed here for clarity; please see main text for specifics):

      “The study of epistasis, and discussions regarding the means to detect and measure it, now occupy a large corner of the evolutionary genetics literature. The topic has grown in recent years as methods have been applied to larger genomic data sets, biophysical traits, and the "global" nature of epistatic effects. We urge those interested in more in-depth treatments of the topic to engage larger summaries of the topic.”

      “Here we will briefly summarize some methods used to study epistasis on fitness landscapes. Several studies of combinatorially-complete fitness landscapes use some variation of Fourier Transform or Taylor formulation. One in particular, the Walsh-Hadamard Transform, has been used to measure epistasis across a wide number of study systems. Furthermore, studies have reconciled these methods with others, or expanded upon the Walsh-Hadamard Transform in a way that can accommodate incomplete data sets. These methods are effective for certain sorts of analyses, and we strongly urge those interested to examine these studies.”

      “The method that we've utilized, the LASSO regression, determines effect sizes for all interactions (alleles and drug environments). It has been utilized for data sets of similar size and structure, on alleles resistant to trimethoprim. Among many benefits, the method can accommodate gaps in data and responsibly incorporates experimental noise into the calculation.”

      As Reviewer #2 understands, there are many ways to examine epistasis on both high and low-dimensional landscapes. Reviewer #2 correctly offers two sorts of formalisms that allow one to do so. The two offered by Reviewer #2 are not the only means of measuring epistasis in data sets like the one we have offered. But we acknowledge that we could have done a better job outlining this. We thank Reviewer #2 for highlighting this, and believe our revision clarifies this.
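
      To make the Fourier-type formalism mentioned above concrete, here is a minimal sketch of a Walsh-Hadamard decomposition of a combinatorially complete landscape. It illustrates the alternative formalism only and is not the LASSO regression used in our study. The genotype ordering (binary index, natural Hadamard order), the 1/2^n normalization, and the sign convention are assumptions; published treatments differ on all three.

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a length-2^n list, scaled by 1/2^n.

    values[k] is the fitness of the genotype whose binary index is k.
    Coefficient 0 is the landscape mean; coefficients at indices with one set
    bit are single-locus terms; indices with several set bits are interactions.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine entries that differ only at the current bit.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / n for v in out]


# Hypothetical two-locus landscapes, ordered 00, 01, 10, 11.
additive = walsh_hadamard([0, 1, 2, 3])   # no interaction between loci
epistatic = walsh_hadamard([0, 1, 1, 3])  # double mutant exceeds additive sum
```

      In the additive toy landscape the pairwise coefficient (index 3) is exactly zero, while in the second landscape it is nonzero, which is the sense in which the transform separates interaction terms from single-locus terms.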

      Reviewer #3 (Public Review):

      The authors introduce two new concepts for antimicrobial resistance borrowed from pharmacology, "variant vulnerability" (how susceptible a particular resistance gene variant is across a class of drugs) and "drug applicability" (how useful a particular drug is against multiple allelic variants). They group both terms under an umbrella term "drugability". They demonstrate these features for an important class of antibiotics, the beta-lactams, and allelic variants of TEM-1 beta-lactamase.

      The strength of the result is in its conceptual advance and that the concepts seem to work for beta-lactam resistance. However, I do not necessarily see the advance of lumping both terms under "drugability", as this adds an extra layer of complication in my opinion.

      Firstly, the authors greatly appreciate the comments from Reviewer #3. They are insightful and prescriptive. We especially thank Reviewer #3 for supplying a commented PDF with some grammatical and phrasing suggestions/edits. This is much appreciated. We have examined all these suggestions and made changes.

      In general, we agree with the spirit of many of the comments. In addition to our prior comments on the scope of our data, we’ll communicate a few direct responses to specific points raised.

      I also think that the utility of the terms could be more comprehensively demonstrated by using examples across different antibiotic classes and/or resistance genes. For instance, another good model with published data might have been trimethoprim resistance, which arises through point mutations in the folA gene (although, clinical resistance tends to be instead conferred by a suite of horizontally acquired dihydrofolate reductase genes, which are not so closely related as the TEM variants explored here).

      1. In our new supplemental material, we now feature an analysis of antifolate drugs, pyrimethamine and cycloguanil. We have discussed this in detail above and thank the reviewer for the suggestion.

      2. Secondly, we agree that the study will have a larger impact when the metrics are applied more broadly. This is an active area of investigation, and our hope is that others apply our metrics more broadly. But as we discussed, such a desire is not a technical criticism of our own study. We stand behind the rigor and insight offered by our study.

      The impact of the work on the field depends on a more comprehensive demonstration of the applicability of these new concepts to other drugs.

      The authors don’t disagree with this point, which applies to virtually every potentially influential study. The importance of a single study can generally only be measured by its downstream application. But this hardly qualifies as a technical critique of our study and does not apply to our study alone. Nor does it speak to the validity of our results. The authors share this interest in applying the metric more broadly.

      Reviewer #1 (Recommendations For The Authors):

      • The main weakness of the work, in my view, is that it does not directly tie these new metrics to a quantitative measure of "performance". The metrics have intuitive appeal, and I think it is likely that they could help guide treatment options - for example, drugs with high applicability could prove more useful under particular conditions. But as the authors note, the landscape is rugged and intuitive notions of evolutionary behavior can sometimes fail. I think the paper would be much improved if the authors could evaluate their new metrics using some type of quantitative evolutionary model. For example, perhaps the authors could simulate evolutionary dynamics on these landscapes in the presence of different drugs. Is the mean fitness achieved in the simulations correlated with, for example, the drug applicability when looking across an ensemble of simulations with the same drug but varied initial conditions that start from each individual variant? Similarly, if you consider an ensemble of simulations where each member starts from the same variant but uses a different drug, is the average fitness gain captured in some way by the variant vulnerability? All simulations will have limitations, of course, but given that the landscape is fully known I think these questions could be answered under some conditions (e.g. strong selection weak mutation limit, where the model could be formulated as a Markov Chain; see 10.1371/journal.pcbi.1004493 or doi: 10.1111/evo.14121 for examples). And given the authors' expertise in evolutionary dynamics, I think it could be achieved in a reasonable time. With that said, I want to acknowledge that with any new "metrics", it can be tempting to think that "we need to understand it all" before it is useful, and I don't want to fall into that trap here.

      The authors respect and appreciate these thoughtful comments.

      As Reviewer #1 highlighted, the authors are experienced with building simulations of evolution. For reasons we have outlined above, we don’t believe they would add to the arc of the current story, and they may encumber it with unnecessary distractions. Simulations of evolution can be enormously useful for studies focused on particulars of the dynamics of evolution. This submitted study is not one of those. It is charged with identifying features of alleles and drugs that capture an allele’s vulnerability to treatment (variant vulnerability) and a drug’s effectiveness across alleles (drug applicability). Both features integrate aspects of variation (genetic and environmental), and as such, are improvements over existing metrics used to describe drug targets and drugs.

      • The new metrics rely on means, which is a natural choice. Have the authors considered how variance (or other higher moments) might also impact evolutionary dynamics? I would imagine, for example, that the ultimate outcome of a treatment might depend heavily on the shape of the distribution, not merely its mean. This is also something one might be able to get a handle on with simulations.

      These are relevant points, and the authors appreciate them. Certainly, moments other than the mean might have utility. This is the reason that we computed the one-step neighborhood variant vulnerability: to see if the variant vulnerability of an allele was related to properties of its mutational neighborhood. We found no such correlation. There are many other sorts of properties that one might examine (e.g., shape of the distribution, properties of the mutational network, variance, Fano factor, etc.). As we don’t have an informed reason to pursue any one of these over the others, we would be pleased to investigate them in the future.
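
      The one-step neighborhood calculation can be sketched as follows, assuming genotypes are written as bit strings (one character per mutation site) and that a variant vulnerability value is available for every genotype. The function names and toy values are assumptions for illustration only.

```python
from statistics import mean


def one_step_neighbors(genotype):
    """Return all genotypes exactly one mutation (one bit flip) away."""
    flip = {"0": "1", "1": "0"}
    return [
        genotype[:i] + flip[genotype[i]] + genotype[i + 1:]
        for i in range(len(genotype))
    ]


def neighborhood_vulnerability(variant_vulnerability):
    """Mean variant vulnerability over each genotype's one-step neighbors."""
    return {
        g: mean(variant_vulnerability[n] for n in one_step_neighbors(g))
        for g in variant_vulnerability
    }


# Hypothetical variant vulnerabilities on a complete two-site landscape.
toy_vv = {"00": 0.2, "01": 0.4, "10": 0.6, "11": 0.8}
toy_neighborhood = neighborhood_vulnerability(toy_vv)
```

      Comparing each genotype's own value with its neighborhood mean (for example, with a rank correlation) is then a direct way to ask whether vulnerability is related to mutational context.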

      Also, while we’ve addressed general points about simulations above, we want to note that our analysis of environmental epistasis does consider the variance. We urge Reviewer #1 to see our new section on “Notes on Methods Used to Measure Epistasis” where we explain some of this and supply references to that effect.

      • As I understand it, the fitness measurements here are measures of per capita growth rate, which is reasonable. However, the authors may wish to briefly comment on the limitations of this choice - i.e., the fact that these are not direct measures of relative fitness values from head-to-head competition between strains.

      Reviewer #1 is correct: the metrics are computed from means. As Reviewer 1 definitely understands, debates over what measurements are proper proxies for fitness go back a long time. We added a slight acknowledgement about the existence of multiple fitness proxies in our revision.

      • The authors consider one-step variant vulnerability. Have the authors considered looking at 2-step, 3-step, etc analogs of the 1-step vulnerability? I wonder if these might suggest potential vulnerability bottlenecks associated with the use of a particular drug/drug combo or trajectories starting from particular variants.

      This is an interesting point. We provided one-step values as a means of interrogating the mutational neighborhood of alleles in the fitness landscape. While there could certainly be other pattern-relationships between the variant vulnerability and features of a fitness landscape (as the reviewer recognizes), we don’t have a rigorous reason to test them, other than an appeal to “I would be curious if [Blank].” As in, attempting to saturate the paper with these sorts of examinations might be fun, could turn up an interesting result, but this is true for most studies.

      To highlight just how serious we are about future questions along these lines, we’ll offer one specific question about the relationship between metrics and other features of alleles or landscapes. Recent studies have examined the existence of “evolvability-enhancing mutations,” that propel a population to high-fitness sections of a fitness landscape:

      ● Wagner, A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 14, 3624 (2023). https://doi.org/10.1038/s41467-023-39321-8

      One present and future area of inquiry involves whether there is any relationship between metrics like variant vulnerability and these sorts of mutations.

      We thank Reviewer 1 for engagement on this issue.

      • Fitness values are measured in the presence of a drug, but it is not immediately clear how the drug concentrations are chosen and, more importantly, how the choice of concentration might impact the landscape. The authors may wish to briefly comment on these effects, particularly in cases where the environment involves combinations of drugs. There will be a "new" fitness landscape for each concentration, but to what extent do the qualitative features - or whatever features drive evolutionary dynamics - change?

      This is another interesting suggestion. We have analyzed a new data set for dihydrofolate reductase mutants that contains a range of drug concentrations of two different antifolate drugs. The general question of how drug concentrations change evolutionary dynamics has been addressed in prior work of ours:

      ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.

      ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.

      There are a very large number of environment types that might alter the drug applicability or variant vulnerability metrics. In our study, we used an established data set composed of different alleles of a β-lactamase, with growth rates measured across a number of drug environments. These drug environments consisted of individual drugs at certain concentrations, as outlined in Mira et al. 2015. For our study, we examined those drugs that had a significant impact on growth rate.

      For a new analysis of antifolate drugs in 16 alleles of dihydrofolate reductase (Plasmodium falciparum), we have examined a breadth of drug concentrations (Supplementary Figure S4). This represents a different sort of environment that one can use to measure the two metrics (variant vulnerability or drug applicability). As we suggest in the manuscript, part of the strength of the metric is precisely that it can incorporate drug dimensions of various kinds.

      • The metrics introduced depend on the ensemble of drugs chosen. To what extent are the chosen drugs representative? Are there cases where nonrepresentative ensembles might be advantageous?

      The authors thank the reviewer for this. The general point has been addressed in our comments above. Further, the general question of how a study of one set of drugs applies to other drugs applies to every study of every drug, as no single study interrogates every sort of drug ensemble. That said, we’ve explained the anatomy of our metrics, and have outlined how it can be directly applied to others. There is nothing about the metric itself that has anything to do with a particular drug type – the arithmetic is rather vanilla.

      Reviewer #2 (Recommendations For The Authors):

      1. Regarding my comment about the different formalisms for epistatic decomposition analysis, a key reference is

      Poelwijk FJ, Krishna V, Ranganathan R (2016). The Context-Dependence of Mutations: A Linkage of Formalisms. PLoS Comput Biol 12(6): e1004771.

      The authors appreciate this, are fans of this work, and have cited it in the revision.

      An example where both Fourier and Taylor analyses were carried out and the different interpretations of these formalisms were discussed is

      Unraveling the causes of adaptive benefits of synonymous mutations in TEM-1 β-lactamase. Mark P. Zwart, Martijn F. Schenk, Sungmin Hwang, Bertha Koopmanschap, Niek de Lange, Lion van de Pol, Tran Thi Thuy Nga, Ivan G. Szendro, Joachim Krug & J. Arjan G. M. de Visser Heredity 121:406-421 (2018)

      The authors are grateful for these references. While we don’t think they are necessary for our new section entitled “Notes on methods used to detect epistasis,” we did engage them, and will keep them in mind for other work that more centrally focuses on methods used to detect epistasis. As the author acknowledges, a full treatment of this topic is too large for a single manuscript, let alone a subsection of one study. We have provided a discussion of it, and pointed the readers to longer review articles that explore some of these topics in good detail:

      ● C. Bank, Epistasis and adaptation on fitness landscapes, Annual Review of Ecology, Evolution, and Systematics 53 (1) (2022) 457–479.

      ● T. B. Sackton, D. L. Hartl, Genotypic context and epistasis in individuals and populations, Cell 166 (2) (2016) 279–287.

      ● J. Diaz-Colunga, A. Skwara, J. C. C. Vila, D. Bajic, Á. Sánchez, Global epistasis and the emergence of ecological function, bioRxiv

      2. Although the authors label Figure 4 with the term "environmental epistasis", as far as I can see it is only a standard epistasis analysis that is carried out separately for each environment. The analysis of environmental epistasis should instead focus on which aspects of these interactions are different or similar in different environments, for example, by looking at the reranking of fitness values under environmental changes [see Ref.[26] as well as more recent related work, e.g. Gorter et al., Genetics 208:307-322 (2018); Das et al., eLife 9:e55155 (2020)]. To some extent, such an analysis was already performed by Mira et al., but not on the level of epistatic interaction coefficients.

      The authors have provided a new analysis of how fitness value rankings have changed across drug environments, often a signature of epistatic effects across environments (Supplementary Figure S1).

      We disagree with the idea that our analysis is not a sort of environmental epistasis; we resolve coefficients between loci across different environments. As with every interrogation of G x E effects (G x G x E in our case), what constitutes an “environment” is a messy conversation. We have chosen the route of explaining very clearly what we mean:

      “We further explored the interactions across this fitness landscape and panels of drugs in two additional ways. First, we calculated the variant vulnerability for 1-step neighbors, which is the mean variant vulnerability of all alleles one mutational step away from a focal variant. This metric gives information on how the variant vulnerability values are distributed across a fitness landscape. Second, we estimated statistical interaction effects on bacterial growth through LASSO regression. For each drug, we fit a model of relative growth as a function of M69L x E104K x G238S x N276D (i.e., including all interaction terms between the four amino acid substitutions). The effect sizes of the interaction terms from this regularized regression analysis allow us to infer higher-order dynamics for susceptibility. We label this calculation as an analysis of ‘environmental epistasis.’”

      As the grammar for these sorts of analyses continues to evolve, the best one can do is be clear about what they mean. We believe that we communicated this directly and transparently.
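
      The 1-step neighbor calculation quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes alleles are encoded as four-character binary strings over the sites M69L, E104K, G238S, and N276D (an encoding chosen here for convenience), the function names are hypothetical, and the vulnerability values are placeholders rather than measured data.

```python
from itertools import product

# Site order assumed for the binary encoding (illustrative choice).
SITES = ("M69L", "E104K", "G238S", "N276D")

def one_step_neighbors(allele):
    """Return all alleles that differ from `allele` at exactly one site."""
    neighbors = []
    for i, state in enumerate(allele):
        flipped = "1" if state == "0" else "0"
        neighbors.append(allele[:i] + flipped + allele[i + 1:])
    return neighbors

def neighbor_vulnerability(allele, vulnerability):
    """Mean variant vulnerability of the 1-step neighbors of `allele`."""
    values = [vulnerability[n] for n in one_step_neighbors(allele)]
    return sum(values) / len(values)

# Placeholder values only (not data from the study): each allele's
# vulnerability is set to its number of mutations, for illustration.
alleles = ["".join(bits) for bits in product("01", repeat=len(SITES))]
vulnerability = {a: float(a.count("1")) for a in alleles}

# Every neighbor of the wild-type encoding carries exactly one mutation.
print(neighbor_vulnerability("0000", vulnerability))
```

      With four biallelic sites, every allele has exactly four 1-step neighbors, so the metric is an unweighted mean over four values for each focal variant.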

      3. As a general comment, to strengthen the conclusions of the study, it would be good if the authors could include additional data sets in their analysis.

      The authors appreciate this comment and have given this point ample treatment. Further, other main conclusions and discussion points are focused on the biology of the system that we examined. Analyzing other data sets may demonstrate the broader reach of the metrics, but it would not alter the strength of our own conclusions (or if it would, Reviewer #2 has not told us how).

      4. There are some typos in the units of drug concentrations in Section 2.4 that should be corrected.

      The authors truly appreciate this. It is a great catch. We have fixed this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I would suggest demonstrating the concepts for a second drug class, and suggest folA variants and trimethoprim resistance, for which there is existing published data similar to what the authors have used here (e.g. Palmer et al. 2015, https://doi.org/10.1038/ncomms8385)

      The authors appreciate this insight. As previously described, we have analyzed a data set of folA mutants for the Plasmodium falciparum ortholog of dihydrofolate reductase, and included these results in new supplemental material. Please see the supplementary material.

      There are some errors in formatting and presentation that I have annotated in a separate PDF file (https://elife-rp.msubmit.net/eliferp_files/2023/04/11/00117789/00/117789_0_attach_8_30399_convrt.pdf), as the absence of line numbers makes indicating specific things exceedingly difficult.

      The authors apologize for the lack of line numbers (an honest oversight), but moreover, are tremendously grateful for this feedback. We have looked at the suggested changes carefully and have addressed many of them. Thank you.

      One thing to note: we have included a version of Figure 4 that has effects on the same axes. It appears in the supplementary material (Figure S4).

      In closing, the authors would like to thank the editors and three anonymous reviewers for engagement and for helpful comments. We are confident that the revised manuscript qualifies as a substantive revision, and we are grateful to have had the opportunity to participate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The regulation of motor autoinhibition and activation is essential for efficient intracellular transport. This manuscript used biochemical approaches to explore two members in the kinesin-3 family. They found that releasing UNC-104 autoinhibition triggered its dimerization whereas unlocking KLP-6 autoinhibition is insufficient to activate its processive movement, which suggests that KLP-6 requires additional factors for activation, highlighting the common and diverse mechanisms underlying motor activation. They also identified a coiled-coil domain crucial for the dimerization and processive movement of UNC-104. Overall, these biochemical and single-molecule assays were well performed, and their data support their statements. The manuscript is also clearly written, and these results will be valuable to the field.

      Thank you very much!

      Ideally, the authors can add some in vivo studies to test the physiological relevance of their in vitro findings, given that the lab is very good at worm genetic manipulations. Otherwise, the authors should speculate the in vivo phenotypes in their Discussion, including E412K mutation in UNC-104, CC2 deletion of UNC-104, D458A in KLP-6.

      1. We have shown the phenotype of the unc-104(E412K) mutation in C. elegans (Niwa et al., Cell Rep, 2016) and described it in the Discussion (p. 14, lines 3-4). The mutant worm showed overactivation of UNC-104-dependent axonal transport, which is consistent with our biochemical data showing that UNC-104(1-653)(E412K) is prone to form a dimer and is more active than wild type.

      2. It has been shown that L640F mutation induces a loss of function phenotype in C. elegans (Cong et al., 2021). The amount of axonal transport is reduced in unc-104(L640F) mutant worms. L640 is located within the CC2 domain. To show the importance of CC2-dependent dimerization in the axonal transport in vivo, we biochemically investigated the impact of L640F mutation.

      We introduced L640F into UNC-104(1-653)(E412K) and performed SEC analysis. The result shows that UNC-104(1-653)(E412K,L640F) failed to form stable dimers despite the release of autoinhibition (new Figure S8). This result strongly suggests the importance of the CC2 domain in axonal transport in vivo. We discuss this result in the revised manuscript (p. 13, lines 6-8).

      3. Regarding KLP-6(D458A), a genetic analysis using genome editing is needed, and we would like to reserve it for a future study. We speculate that the D458A mutation could lead to an increase in transport activity in vivo, similar to unc-104(E412K). This is because a previous study has shown that wild-type KLP-6 was largely localized in the cell body, while KLP-6(D458A) was enriched at the cell periphery in N2A cells (Wang et al., 2022). We described it in the Discussion (p. 14, lines 13-14).

      While beyond the scope of this study, can the author speculate on the candidate for an additional regulator to activate KLP-6 in C. elegans?

      The heterodimeric mechanoreceptor complex comprising LOV-1 and PKD-2 is a potential candidate for regulating KLP-6 dimerization. We speculate that its heterodimeric nature is suited to enhancing KLP-6 dimerization. On the other hand, it is noteworthy that KLP-6 can undergo activation in Neuro 2a cells upon the release of autoinhibition (Wang et al., 2022). This observation implies that additional factors, which are not present in Sf9 cells, may be able to induce dimerization. Post-translational modifications would be one of the candidates. We discussed it on p. 14, lines 7-14.

      The authors discussed the differences between their porcine brain MTs and Chlamydomonas axonemes in UNC-104 assays. However, the authors did not really retest UNC-104 on axonemes after more than two decades, thereby not excluding other possibilities.

      We think that comparing the different conditions used in different studies is essential for the advancement of the field of molecular motors. Therefore, we performed new single-molecule assays using Chlamydomonas axonemes and compared the results with those for brain MTs (Fig. S6). Just as observed in the study by Tomishige et al., we were unable to observe processive runs of UNC-104(1-653) on Chlamydomonas axonemes (Fig. S6A). Furthermore, we found that the landing rate of UNC-104(1-653) on Chlamydomonas axonemes was markedly lower than that on purified porcine microtubules (Fig. S6B).

      Reviewer #1 (Recommendations For The Authors):

      More discussion as suggested above would improve the manuscript.

      We have improved our manuscript as described above.

      Reviewer #2 (Public Review):

      The Kinesin superfamily motors mediate the transport of a wide variety of cargos which are crucial for cells to develop into unique shapes and polarities. Kinesin-3 subfamily motors are among the most conserved and critical classes of kinesin motors which were shown to be self-inhibited in a monomeric state and dimerized to activate motility along microtubules. Recent studies have shown that different members of this family are uniquely activated to undergo a transition from monomers to dimers.

      Niwa and colleagues study two well-described members of the kinesin-3 superfamily, unc104 and KLP6, to uncover the mechanism of monomer to dimer transition upon activation. Their studies reveal that although Unc104 and KLP6 are both self-inhibited monomers, their propensities for forming dimers are quite different. The authors relate this difference to a region in the molecules called CC2 which has a higher propensity for forming homodimers. Unc104 readily forms homodimers if its self-inhibited state is disabled while KLP6 does not.

      The work suggests that although mechanisms for self-inhibited monomeric states are similar, variations in the kinesin-3 dimerization may present a unique form of kinesin-3 motor regulation with implications on the forms of motility functions carried out by these unique kinesin-3 motors.

      Thank you very much!

      Reviewer #2 (Recommendations For The Authors):

      The work is interesting but the process of making constructs and following the transition from monomers to dimers seems to be less than logical and haphazard. Recent crystallographic studies for kinesin-3 have shown the fold and interactions for all domains of the motor leading to the self-inhibited state. The mutations described in the manuscript leading to disabling of the monomeric self-inhibited state are referenced but not logically explained in relation to the structures. Many of the deletion constructs could also present other defects that are not presented in the mutations. The above issues prevent wide audience access to understanding the studies carried out by the authors.

      We appreciate this comment. We improved it as described below.

      Suggestions: Authors should present schematic, or structural models for the self-inhibited and dimerized states. The conclusions of the papers should be related to those models. The mutations should be explained with regard to these models and that would allow the readers easier access. Improving access to the readers in and outside the motor field would truly improve the impact of the manuscript on the field.

      The structural models illustrating the autoinhibited state have been included in new Figure S4, accompanied by an explanation of the correlation between the mutations and these structures in the figure legend. Additionally, schematic models outlining the dimerization process of both UNC-104 and KLP-6 have been provided in Figure S9 to enhance reader comprehension of the process.

      Reviewer #3 (Public Review):

      In this work, Kita et al., aim to understand the activation mechanisms of the kinesin-3 motors KLP-6 and UNC-104 from C. elegans. As with many other motor proteins involved in intracellular transport processes, KLP-6 and UNC-104 motors suppress their ATPase activities in the absence of cargo molecules. Relieving the autoinhibition is thus a crucial step that initiates the directional transport of intracellular cargo. To investigate the activation mechanisms, the authors make use of mass photometry to determine the oligomeric states of the full-length KLP-6 and the truncated UNC-104(1-653) motors at sub-micromolar concentrations. While full-length KLP-6 remains monomeric, the truncated UNC-104(1-653) displays a sub-population of dimeric motors that is much more pronounced at high concentrations, suggesting a monomer-to-dimer conversion. The authors push this equilibrium towards dimeric UNC-104(1-653) motors solely by introducing a point mutation into the coiled-coil domain and ultimately unleashing a robust processivity of the UNC-104 dimer. The authors find that the same mechanistic concept does not apply to the KLP-6 kinesin-3 motor, suggesting an alternative activation mechanism of the KLP-6 that remains to be resolved. The present study encourages further dissection of the kinesin-3 motors with the goal of uncovering the main factors needed to overcome the 'self-inflicted' deactivation.

      Thank you very much!

      Reviewer #3 (Recommendations For The Authors):

      126-128: It is surprising that surface-attachment does not really activate the full-length KLP6 motor (v=48 ± 42 nm/s). Can the authors provide an example movie of the gliding assay for the FL KLP6 construct? Gliding assays are done by attaching motors via their sfGFP to the surface using anti-GFP antibodies. Did the authors try to attach the full-length KLP-6 motor directly to the surface? If the KLP-6 motor sticks to the surface via its (inhibitory) C-terminus, this attachment would be expected to activate the motor in the gliding assay, ideally approaching the in vivo velocities of the activated motor.

      We have included an example kymograph showing the gliding assay of KLP-6FL (Fig. S1A). When we directly attached KLP-6FL to the surface, the velocity was 0.15 ± 0.02 µm/sec (Fig. S1B), which is similar to the velocity of KLP-6(1-390). While the velocity observed in the direct-attachment condition is much higher than that observed in the GFP-mediated condition, it remains considerably slower than in vivo velocities. Firstly, we think this is because dimerization of KLP-6 is not induced by surface attachment. Previous studies have shown that monomeric proteins are generally slower than dimeric proteins in the gliding assay (Tomishige et al., 2002). These findings are consistent with our observation that KLP-6 remains monomeric even when autoinhibition is released. Secondly, the in vitro velocity of motors is generally slower than the in vivo velocity.

      156-157: It seems that the GCN4-mediated dimerization induces aggregation of the KLP6 motor domains as seen in the fractions under the void volume in Figure 3B (not seen with the Sf9 expressed full-length constructs, see Figure 1B). Also, the artificially dimerized motor construct does not fully recapitulate the in vivo velocity of UNC-104. Did the authors analyze the KLP-6(1-390)LZ with mass photometry and is it the only construct that is expressed in E. coli?

      The KLP-6::LZ protein is not aggregating. We have noticed that DNA and RNA from E. coli exist in the void fraction and occasionally trap recombinant kinesin-3 proteins there. To effectively remove these nucleic acids from our protein samples, we employed streptomycin sulfate during purification (Liang et al., Electrophoresis, 2009). Please see "Purification of recombinant proteins" in Methods. In the size exclusion chromatography analysis, we observed that KLP-6(1-393)LZ predominantly eluted in the dimer fraction (new Figure 3). Subsequently, we reanalyzed the motor's motility using a total internal reflection fluorescence (TIRF) assay, as shown in the revised Figure 3. Even after these efforts, the velocity did not change significantly. The velocity of KLP-6LZ is about 0.3 µm/sec, while that of cellular KLP-6::GFP is 0.7 µm/sec (Morsci and Barr, 2011). A similar phenomenon, "slower velocity in vitro", has been observed in other motor proteins.

      169: In Wang et al., (2022) the microtubule-activated ATPase activities of the mutants were measured in vitro as well, with the relative activities of the motor domain and the D458A mutant being very similar. The D458A mutation is introduced into the full-length motor in Wang et al., while in the present work, the mutation is introduced into the truncated KLP-6(1-587) construct. Can the authors explain their reasoning for the latter?

      (1) Kinesins are microtubule-stimulated ATPases; i.e., the ATPase activity is induced by binding to a microtubule.

      (2) Previous studies have shown that the one-dimensional movement of the monomeric motor domain of kinesin-3 depends on the ATPase activity even when the movement does not show clear plus-end directionality (Okada et al., Science, 1998).

      (3) While KLP-6(1-587) does not bind to microtubules, both KLP-6(1-390) (= the monomeric motor domain) and KLP-6(1-587)(D458A) similarly bind to microtubules and show one-dimensional diffusion on microtubules (Fig. 4E and S2B).

      Therefore, the similar ATPase activities of the motor domain (= KLP-6(1-390)) and KLP-6(D458A) observed by Wang et al. arise because both proteins similarly associate with microtubules and hydrolyze ATP on them, which is consistent with our observation. On the other hand, because KLP-6(wild type) cannot efficiently bind to microtubules, its ATPase activity is low.

      Can the authors compare the gliding velocities of the KLP-6(1-390)LZ vs KLP-6(1-587) vs KLP-6(1-587)(D458A) constructs to make sure that the motors are similarly active?

      We conducted a comparative analysis of the gliding velocities of KLP-6(1-390), KLP-6(1-587), and KLP-6(1-587)(D458A) (Fig. S1C). We used KLP-6(1-390) instead of KLP-6(1-390)LZ, in line with the protein used by Wang et al. Both KLP-6(1-587) and KLP-6(1-587)(D458A) exhibited activity levels comparable to that of KLP-6(1-390). The data suggest that the motor domains of all recombinant proteins are similarly active.

      Please note that, unlike in the full-length condition (Fig. 1D, S1A and S1B), attachment to the surface using the anti-GFP antibody can activate KLP-6(1-587). The data suggest that, due to the absence of coverage by the MBS and MATH domains (Wang et al., Nat. Commun., 2022), the motor domain of KLP-6(1-587) to some extent permits direct binding to microtubules under gliding assay conditions.

      Are the monomeric and dimeric UNC-104(1-653) fractions in Figure 5B in equilibrium? Did the authors do a re-run of the second peak of UNC-104(1-653) (i.e. the monomeric fraction with ~100 kDa) to assess if the monomeric fraction re-equilibrates into a dimer-monomer distribution?

      We conducted a re-run of the second peak of UNC-104(1-653) and verified its re-equilibration into a distribution of dimers and monomers after being incubated for 72 hours at 4°C (Fig. S5).

      UNC-104 appears to have another predicted coiled-coil region around ~800 aa (e.g. by NCoils) that would correspond to the CC3 in the mammalian homolog KIF1A. This raises the question if the elongated UNC-104(1-800) would dimerize more efficiently than UNC-104(1-653) (authors highlight the sub-population of dimerized UNC-104(1-653) at low concentrations in Figure 5C) and if this dimerization alone would suffice to 'match' the UNC-104(1-653)E412K mutant (Figure 5D). Did the authors explore this possibility? This would mean that dimerization does not necessarily require the release of autoinhibition.

      We have tried to purify UNC-104(1-800) and full-length UNC-104 using the baculovirus system. Unfortunately, the expression levels of UNC-104(1-800) and full-length UNC-104 were too low to perform in vitro assays, even though codon-optimized vectors were used. Instead, we analyzed full-length human KIF1A. We found that full-length KIF1A is mostly monomeric, not dimeric (please see Author response image 1). This property is similar to that of UNC-104(1-653) (Figure 5A-C). Therefore, we think CC3 does not strongly affect dimerization of KIF1A, and probably not that of its ortholog UNC-104. Moreover, a recent study has shown that the CC2 domain, but not other CC domains, forms a stable dimer in the case of KIF1A (Hummel and Hoogenraad, JCB, 2021). Given the similarity in the sequences of KIF1A and UNC-104, we anticipate that the CC2 domain of UNC-104 contributes significantly to dimerization, potentially more than other CC domains. We explicitly describe this in the Discussion in the revised manuscript.

      Author response image 1.

      Upper left, A representative result of size exclusion chromatography obtained from the analysis of full-length human KIF1A fused with sfGFP.

      Upper right, A schematic drawing showing the structure of KIF1A fused with sfGFP and a result of SDS-PAGE recovered from SEC analysis. Presumable dimer and monomer peaks are indicated.

      Lower left, Presumable dimer fractions in SEC were collected and analyzed by mass photometry. The result confirms that the fraction contains a considerable amount of dimeric KIF1A.

      Lower right, Presumable monomer fractions were collected and analyzed by mass photometry. The result confirms that the fraction mainly consists of monomeric KIF1A.

      Note that these results obtained from full-length KIF1A protein are similar to those of UNC-104(1-653) protein shown in Figure 5A-C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors describe a method to decouple the mechanisms supporting pancreatic progenitor self-renewal and expansion from feed-forward mechanisms promoting their differentiation. The findings are important because they have implications beyond a single subfield. The strength of evidence is solid in that the methods, data and analyses broadly support the claims with only minor weaknesses.

      We are grateful for the substantial effort that the reviewers put into reading our manuscript and providing such detailed feedback. We have strived to address, as much as possible, all comments and criticisms. Thanks to the feedback, we believe that we now have a significantly improved manuscript. Below, there is a point-by-point response.

      Reviewer #1 (Public Review)

      In this manuscript, the authors are developing a new protocol that aims at expanding pancreatic progenitors derived from human pluripotent stem cells under GMP-compliant conditions. The strategy is based on hypothesis-driven experiments that come from knowledge derived from pancreatic developmental biology.

      The topic is of major interest in the view of the importance of amplifying human pancreatic progenitors (both for fundamental purposes and for future clinical applications). There is indeed currently a major lack of information on efficient conditions to reach this objective, despite major recurrent efforts by the scientific community.

      Using their approach that combines stimulation of specific mitogenic pathways and inhibition of retinoic acid and specific branches of the TGF-beta and Wnt pathways, the authors claim to be able (in a highly robust and reproducible manner) to amplify in 10 passages the number of pancreatic progenitors (PP) 2,000-fold, which is really an impressive breakthrough.
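
      For scale, the 2,000-fold expansion over 10 passages quoted above can be converted into an average per-passage fold change. The sketch below assumes, for illustration only, that the fold change is the same at every passage; the function name is hypothetical and nothing here is taken from the manuscript's data.

```python
import math

def per_passage_fold(total_fold, passages):
    """Average fold change per passage, assuming a constant rate."""
    if total_fold <= 0 or passages <= 0:
        raise ValueError("total_fold and passages must be positive")
    return total_fold ** (1.0 / passages)

fold = per_passage_fold(2000, 10)
# Equivalent number of population doublings per passage.
doublings = math.log2(fold)
print(f"{fold:.2f}-fold per passage, {doublings:.2f} doublings per passage")
```

      Under this constant-rate assumption, a 2,000-fold total expansion corresponds to slightly more than one population doubling per passage.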

      The work is globally well-performed and quite convincing. I have however some technical comments mainly related to the quantification of pancreatic progenitor amplification and to their differentiation into beta-like cells following amplification.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to specific comments and criticisms.

      Reviewer #1 (Recommendations For The Authors)

      Figure 1:

      Panel A: What is exactly counted in Fig. 1A? Is it the number of PP (as indicated in the title) or the total number of cells? If it is PPs, was it done following PDX1/NKX6.1/SOX9 staining and FACS quantification? This question applies to a number of Figures and the authors should be clear on this point.

      We now define ‘PP cells’ as ‘PP-containing cells’ (PP cells) the first time we use the term in the RESULTS section.

      Panel D: I do not understand the source of TGFb1, GDF11, FGF18, PDGFA. Which cell type(s) express such factors in culture? I was not convinced that the signals are produced by PP and act through an autocrine loop. I have the same type of questions for the receptors: PDGFR on the second page of the results; RARs and RXRs on the third page.

      We refer to these factors/receptors as components of a tentative autocrine loop. We agree we do not prove it and we now comment on this in the discussion section.

      Figure 2:

      FACS plots are very difficult to analyze for two reasons: I do not understand the meaning of the y axes (PDX1/SOX9). Does that mean that 100% of the cells were PDX1+/SOX9+? The authors should show the separated FACS plots. More importantly, the x axes indicate that NKX6.1 FACS staining is very weak. This is by far different from what can be read in publications performing the same types of experiments (publications by Millman, Otonkoski...as examples). How was quantification performed when it is so difficult to properly define positive vs negative populations? It is necessary to present proper "negative controls" for FACS experiments and to clearly indicate how positive versus negative cells were defined.

      We now explain the gating strategy better in the results section; all controls are included in Figure S2.

      Figure 3:

      What is the exact "phenotype" of the cells that incorporated EdU: It would be really instructive to add PDX1/NKX6.1/SOX9 staining on top of EdU. I am also surprised that 20% of the cells stain positive for Annexin V. This is a huge fraction. Does that mean that many cells (20%) are dying and if the case, how amplification can take place under such deleterious conditions?

      This is an interesting mechanistic point but performing these experiments would delay the publication of the final manuscript for too long. These assays were done at p3 in order to catch CINI cells that do not expand in most cases. It is important to note that cell death also appears higher in CINI cells. It is likely that the combination of these effects results in reproducible expansion under C5. We comment on the possibilities in the discussion section.

      Figure 4:

      On FACS plots the intensity at the single cell level (see x-axis of the figure) of the NKX6.1 staining is found to increase in Fig. 4G by 50- to 100-fold when compared to Fig. 4E. Is it expected? This should be discussed in the text. Do the authors observe the same increase by immunocytochemistry?

      The apparent difference is actually 10-fold (from 2 × 10² to 2 × 10³). We think that the most likely reason for this apparent increase is that at p0 we typically used very few cells for the FC in order to keep as many as possible for the subsequent expansion. If we had used more, we would be able to also detect cells with higher expression. As we mention in the bioinformatics analysis, NKX6 expression does increase with passaging and therefore it is also possible that at least part of this increase is real. However, we don’t have suitable data (same number of cells analyzed at each passage) to address this in a reliable manner.

      Figure 5

      Previous data from the scientific literature indicate that in vitro, by default, PP gives rise to duct-like cells. This is a bit described in the result section and supplementary figures taking into account the expression of transcription factors. However the data are not clearly explained and described in quite a qualitative manner. They should appear in a quantitative fashion (and the main figures), adding additional duct cell markers such as Carbonic anhydrase, SPP1, CFTR, and others. I assume that the authors can easily use their transcriptomic data to produce a Figure to be described and discussed in detail.

      We think it can be misleading to use such markers (other than TFs, and the latter only as a collective) because specific markers of terminal differentiation are more often than not expressed during development in multipotent progenitors, the most conspicuous example being CPA1. To illustrate the point, we plotted the expression values of a panel of duct genes in isolated human fetal progenitors, using the RNA Seq data of Ramond et al. (2017), together with their expression in p0 PP and ePP cells from all three different procedures (please see below). All raw RNA Seq data were processed together to enable direct comparison. According to the analysis of Ramond et al., the A population corresponds to MPCs, C to early endocrine progenitors (EP), D to late endocrine progenitors and, by inference and gene expression pattern, B to BPs. Expression levels of all these markers were very similar, suggesting that these markers cannot be used to distinguish between duct cells and progenitor cells. Importantly, SC-islets derived from either dPP or ePP cells express extremely low and similar levels of KRT19, a marker of duct cells. This latter information is now included in the last part of the results (Figure S7).

      Author response image 1.

      Fig. 7:

      The figure is a bit disappointing for 2 reasons. In A and B, the quality of INS, GCG, and SST staining is really poor. In E, GSIS is really difficult to interpret. They should not be presented as stimulatory indexes. The authors should present independently: INS content; INS secretion at low glucose; INS secretion at high glucose; INS secretion with KCL. Finally, the authors should indicate that glucose poorly (around 2 fold) activates insulin/C-Pept secretion in their stem-cell-derived islets.

      We disagree with the quality assessment of the immunofluorescence. Stimulation indexes are also used very widely but we now provide data for actual C-peptide secretion normalized for DNA content of the SC-islets. For technical reasons we do not have normalized C-peptide secretion for human islets. However, we provide a direct comparison to the stimulation index of human islets assayed under the same conditions (2.7 mM glucose / 16.7 mM glucose / 16.7 mM glucose + 30 mM KCl) without presenting SC-islets separately and tweaking the glucose basal (lowering) and stimulation (increasing) levels to inflate the stimulation index. This is unfortunately common. In any case, we do not claim an improvement in the differentiation conditions and our S5-S7 steps may not be optimal but this is not the subject of this work.

      Reviewer #2 (Public Review)

      Summary

      The paper presents a novel approach to expand iPSC-derived pdx1+/nkx6.1+ pancreas progenitors, making them potentially suitable for GMP-compatible protocols. This advancement represents a significant breakthrough for diabetes cell replacement therapies, as one of the current bottlenecks is the inability to expand PP without compromising their differentiation potential. The study employs a robust dataset and state-of-the-art methodology, unveiling crucial signaling pathways (eg TGF, Notch...) responsible for sustaining pancreas progenitors while preserving their differentiation potential in vitro.

      Strengths

      This paper has strong data, guided omics technology, clear aims, applicability to current protocols, and beneficial implications for diabetes research. The discussion on challenges adds depth to the study and encourages future research to build upon these important findings.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to general comments and criticisms.

      Weaknesses

      The paper does have some weaknesses that could be addressed to improve its overall clarity and impact. The writing style could benefit from simplification, as certain sections are explained in a convoluted manner and are difficult to follow; in some instances, redundancy is evident. Furthermore, the legends accompanying figures should be self-explanatory, ensuring that readers can easily understand the presented data without needing to search the rest of the paper for information.

      We have simplified the text in several places and removed redundancies, particularly in the discussion. We revisited the figure legends and made minor corrections to increase clarity. However, regarding the figure legends, we think that adding the interpretation of the results would be redundant to the main text.

      The culture conditions employed in the study might benefit from more systematic organization and documentation, making them easier to follow.

      There is a comparative Table (Table S1) where all conditions are summarized. We refer to this Table every time that we introduce a new condition. We also have a Table (Table S4) which presents all different media and components used in the differentiation procedure.

      Another important aspect is the functionality of the expanded cells after differentiation. While the study provides valuable insights into the expansion of pancreas progenitors in vitro and does the basic tests to measure their functionality after differentiation the paper could be strengthened by exploring the behavior and efficacy of these cells deeper, and in an in vivo setting.

      This will be done in a future study where we will also introduce a number of modifications in S5-S7.

      Quantifications for immunofluorescence (IF) data should be displayed.

      We have not conducted quantifications of IFs because FC is much more objective and accurate. We have not conducted FC for CDX2 and AFP because all other data strongly favor C6 anyway. It should be noted that CDX2 and AFP expression is generally not addressed at all presumably because it raises uncomfortable questions and, to our knowledge, we are the first to address this so exhaustively.

      Some claims made in the paper may come across as somewhat speculative.

      We have now indicated so where applicable.

      Additionally, while the paper discusses the potential adaptability of the method to GMP-compatible protocols, there is limited elaboration on how this transition would occur practically or any discussion of the challenges it might entail.

      We have now added a paragraph discussing this in the discussion section.

      Reviewer #2 (Recommendations For The Authors)

      Related to Figure 1:

      • Unclear if CINI or SB431542 + CINI was used (first paragraph of results...)

      The paragraph was unclear and has now been rewritten.

      • Was the differentiation to PP similar between the different attempts? A basic QC for each Stem Cell technology differentiation would be good to include.

      We added (Figure 1B) a comparison of expression data of general genes (QC) in PP cells showing very comparable patterns of expression. Some of these PP cells went on to expand and most did not but there is no apparent correlation of this with the gene expression data.

      • qPCR data - relative fold? over what condition? (indicate on axis label)

      We added a label as well as an explanation on p0 values in the figure legend.

      • FGF18/ PDGFA - worth including background in pancreas development as in the other factors.

      Background information has been added.

      • Bioinformatics is a bit biased with a few genes selected - what are the DEGs / top enriched pathways? Maybe worth showing a volcano plot of the DEGs for example.

      We have done all these standard analyses but we think that they did not contribute anything else useful to the study with the exception of pointing to the finding that the TGFb pathway is negatively correlated with expansion, and this is included in the study. The ‘unbiased’ analysis that the reviewer suggests did not turn up anything else useful to exploit for the expansion. This does not mean that our approach is biased – in our view it is hypothesis-driven. As we also write in the manuscript, if in a certain pathway a key gene fails to be expressed, the pathway will not show up in any GO or GSEA analyses. However, the pathway will still be regulated. The RA and FGF18 cases clearly illustrate this. We realize that these analyses have become a standard but we think that they are not the only way to approach genomics data and these approaches did not offer much in the context of this study.

      • The E2F part is very speculative

      The pathway came up as a result of ‘unbiased’ GSEA analyses. However, we do agree and rephrased.

      • The authors claim ' the negative correlation of TGFb signalling with expansion retrospectively justifies the use of A83 '. However, p0 is not treated with A83 - how can they tell that there is a correlation between TGFb signalling and expansion?

      The correlation came from the RNA Seq data analysis during expansion. We have rephrased slightly to convey the message more clearly.

      • Typo: the TGFbeta inhibitor name is misspelled (A3801)

      Corrected

      • Page 5 - last paragraph - Table S3? (isn't it referring to S2?)

      Table S2 is the list of the regulated genes and Table S3 is the list of the regulated signaling pathway components. Since both are relevant here, we now refer to both.

      • In the text Figure 2G should read Figure 1G (page 7, end of 1st paragraph).

      Corrected

      • 'Autocrine loop' existence – speculative

      Added the phrase ‘we speculated’. We refer to this only as a tentative interpretation. We also elaborate in the discussion now.

      Related to Figure 2:

      • I am not sure if I would refer to chemical "activation/inhibition" of pathways as 'gain/loss of function'. Maybe this term is more adequate for genetic modifications.

      For genetic manipulations, these terms are (supposed to be) accompanied by the adjective ‘genetic’ but to avoid misinterpretations we changed the terms to activation and inhibition as suggested.

      • It would be good to include a summary of the different conditions as a schematic in one of the figures, to make it very clear to the reader what the conditions are.

      We tried this in an early version of the manuscript but, in our view, it was adding complexity rather than simplifying things. The problem is that the Table cannot be integrated into any single figure: in Figure 2, for example, it would be too early, in Figure 4 it would be too late, and so on. All conditions show up in detail in Table S1.

      • Nkx6.1 - is the image representative? It looks like Nkx6.1 decreases over the passages.

      We do mention in the text that ‘… even though expansion (in C5) appeared to somewhat reduce the number of NKX6.1+ cells’ (Figure 2E-G). As we mentioned, this was one of the reasons to continue with other conditions (C6-C8).

      • Upregulation of AFP/ CDX2 is a bit concerning - the IF for C5 p5 shows a high proportion of CDX2+ cells (Fig S2I). perhaps it would be good to quantify the IF.

      It was concerning – this is why we then tested conditions C6-8. Since it is C6 that we propose at the end, it would be, in our view, extraneous to quantify CDX2 in C5.

      • How do C5/C1/C0 compare to CINI?

      We now remind the reader in the results section that CINI was not reproducible - so any other comparison would be extraneous.

      Related to Figure 3:

      • There is a 'Lore Ipsum' label above B

      Corrected

      Related to Figure 4:

      • It is good that AFP expression is reduced at p10, but there seems to be a high proportion of AFP at p5. IF/FACS should be quantified.

      We think that this would not add significantly since there are several other criteria, particularly the increase of the PDX1+/SOX9+/NKX6.1+ that clearly show that the C6 condition is preferable. Further elaboration of C6 could use such additional criteria. We comment on CDX2 / AFP in the discussion.

      • CDX2 should be quantified by IF / FACS.

      We think that this would not add significantly since there are several other criteria, particularly the increase of the PDX1+/SOX9+/NKX6.1+ that clearly show that the C6 condition is preferable. Further elaboration of C6 could use such additional criteria. We comment on CDX2 / AFP in the discussion.

      • Karyotype analysis is good but not very precise when analyzing genetic micro alterations... what does a low-pass sequencing of the expanding lines look like? Are there any micro-deletions in the expanding lines?

      This is an unusual request. Microdeletions may occur at any point – during passaging of hPS cells, differentiation as well as expansion – but such data are so far not shown in publications, and reasonably so in our opinion. Thus, we have not done this analysis but it certainly would be appropriate in a clinical setting as part of QC.

      • Data supporting that the cells can be cryopreserved and recovered with >85% survival rate is not provided.

      We now provide data for the C6-mediated expansion (Figure 4J). The freezing procedure was developed during the time we were testing C5 and we don’t have sufficient data to show reliably the survival of the cells during C5 expansion. Thus, we have now removed the reference in the C5 part of the manuscript.

      Related to Figure 5:

      • Figure 5C - perhaps worth commenting on the different pathways that are enriched when cells undergo expansion and show some of the genes that are up/down regulated.

      This is indeed of interest but since it will not address any specific question in the context of this work (eg is the endocrine program repressed?) and since it would not be followed by additional experiments we think that it would burden the manuscript unnecessarily. The data are accessible for any type of analysis through the GEO database.

      • Figure S5D shows in vitro clustering away from in vivo PP - it would be good to explain how in vitro generated PP differs from their in vivo counterparts instead of restricting the comparison to the in vitro protocol.

      We have added a possible interpretation of this observation in the results section and discuss how one could properly approach this comparison.

      • Quantification of Fig5F should be included. Is GP2 expression detectable by IF at p5 too?

      We have quantified GP2 expression by FC at p10 but not at earlier stages. We now include the FC data in Fig5F.

      • Validation of Fig5G by qPCR would be good. PDX1 did not seem reduced by IF in Figure 4.

      The purpose of Fig5G is to compare the expression of the same genes across different expansion approaches. Therefore, in our view, qPCRs would not be appropriate since we do not have samples from the other approaches. We did not claim a reduction in PDX1 expression.

      • How can the authors explain the NGN3 expression at PP?

      In our view, differentiation is a dynamic process and not all cells are synchronized at the same cell type; this is true in vivo and in vitro. Sc-RNA Seq data indeed show a small population of cells at PP that are NEUROG3+ (our unpublished data). We have now included this in the discussion.

      Related to Figure 6:

      • How do the different lines differ? Any statistical comparison between lines?

      There is a paragraph dealing with the comparison of PP and ePP cells (p5 and p10) from different lines at the level of gene expression and the data are in Figure S6A-G. Then there is a paragraph addressing this at the level of PDX1/SOX9/NKX6.1 expression by FC. We have now expanded and rewritten the latter to include statistical comparisons across PPs from different lines at p0, p5 and p10.

      Related to Figure 7:

      • Mention the use of micropatterned

      Micropatterned wells - not really correct. They use Aggrewells, micropatterned plates are something else.

      We changed ‘micropatterned wells’ into ‘microwells’.

      • Figure 7D, those are qPCR data. The label is inconsistent, why did they call it fold induction instead of fold change? Also, not sure if plotting the fold change to hPSC is the best here.

      We use fold change when comparing the expression of the same gene at different passages but fold induction when comparing to its expression in hPS cells. We made sure it is also explained in the figure legends.
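      The distinction between the two terms can be sketched as two ratios that differ only in their reference sample. This is a minimal illustration, not the analysis code used for the figures: the function names and the expression values are hypothetical, and taking p0 as the reference passage for fold change is an assumption.

```python
def _ratio(value: float, reference: float) -> float:
    """Return value / reference, rejecting a non-positive reference."""
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return value / reference


def fold_change(expr_at_passage: float, expr_at_reference_passage: float) -> float:
    """Same gene, later passage relative to a reference passage (assumed p0)."""
    return _ratio(expr_at_passage, expr_at_reference_passage)


def fold_induction(expr_in_sample: float, expr_in_hps_cells: float) -> float:
    """Same gene, differentiated sample relative to undifferentiated hPS cells."""
    return _ratio(expr_in_sample, expr_in_hps_cells)


# Hypothetical normalized expression values for one gene (arbitrary units).
hps, p0, p5 = 0.5, 2.0, 6.0
print("fold change p5 vs p0:", fold_change(p5, p0))
print("fold induction p5 vs hPS:", fold_induction(p5, hps))
```

      The same measurement therefore yields different numbers under the two labels, which is why each axis label needs to state its reference.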

      • Absolute values should be shown for the GSIS to determine basal insulin secretion. Also, sequential stimulation to address if the cells are able to respond to multiple glucose stimulations.

      We include now the secreted amounts of human C-peptide under the different conditions (Figure S7) normalized for cell numbers using their DNA content for the normalization. The many parameters we have used suggest that dPP and ePP SC-islets are very similar. If we were claiming a better S5-S7 procedure, such an assay would have been necessary but in this context, we think it is not absolutely necessary.

      • In vivo data would have strengthened the story. It is not clear if, in vivo, the cells will behave as the nonexpanded iPSC-derived beta cells.

      We agree and these studies are under way but we do not expect to complete them soon. We feel that it is important that this work appears sooner rather than later.

      Reviewer #3 (Public Review)

      Summary:

      In this work, Jarc et al. describe a method to decouple the mechanisms supporting progenitor self-renewal and expansion from feed-forward mechanisms promoting their differentiation.

      The authors aimed at expanding pancreatic progenitor (PP) cells, strictly characterized as PDX1+/SOX9+/NKX6.1+ cells, for several rounds. This required finding the best cell culture conditions that allow sustaining PP cell proliferation along cell passages, while avoiding their further differentiation. They achieve this by comparing the transcriptome of PP cells that can be expanded for several passages against the transcriptome of unexpanded (just differentiated) PP cells.

      The optimized culture conditions enabled the selection of PDX1+/SOX9+/NKX6.1+ PP cells and their consistent, 2000-fold, expansion over ten passages and 40-45 days. Transcriptome analyses confirmed the stabilization of PP identity and the effective suppression of differentiation. These optimized culture conditions consisted of substituting the Vitamin A containing B27 supplement with a B27 formulation devoid of vitamin A (to avoid retinoic acid (RA) signaling from an autocrine feed-forward loop), substituting A38-01 with the ALK5 II inhibitor (ALK5i II) that targets primarily ALK5, supplementation of medium with FGF18 (in addition to FGF2) and the canonical Wnt inhibitor IWR-1, and cell culture on vitronectin-N (VTN-N) as a substrate instead of Matrigel.
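      To put the reported figures in perspective, the arithmetic behind a 2000-fold expansion over ten passages and 40-45 days can be sketched as follows. This is a back-of-the-envelope calculation that assumes steady exponential growth and ignores cell loss at passaging; the function names are illustrative and are not taken from the manuscript.

```python
import math


def per_passage_fold(total_fold: float, passages: int) -> float:
    """Geometric-mean fold expansion per passage."""
    return total_fold ** (1.0 / passages)


def doubling_time_days(total_fold: float, days: float) -> float:
    """Average population doubling time under steady exponential growth."""
    return days / math.log2(total_fold)


total_fold = 2000  # cumulative expansion quoted in the text
passages = 10      # number of passages quoted in the text

print("fold per passage:", round(per_passage_fold(total_fold, passages), 2))
for days in (40, 45):
    print(days, "days -> doubling time (days):",
          round(doubling_time_days(total_fold, days), 1))
```

      Under these assumptions the numbers correspond to roughly a 2.1-fold increase per passage and an average doubling time of about 3.6 to 4.1 days.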

      Strengths:

      The strength of this work relies on a clever approach to identify cell culture modifications that allow expansion of PP cells (once differentiated) while maintaining, if not reinforcing, PP cell identity. Along the work, it is emphasized that PP cell identity is associated with the co-expression of PDX1, SOX9, and NKX6.1. The optimized protocol is unique (among the other datasets used in the comparison shown here) in inducing a strong upregulation of GP2, a unique marker of human fetal pancreas progenitors. Importantly GP2+ enriched hPS cell-derived PP cells are more efficiently differentiating into pancreatic endocrine cells (Aghazadeh et al., 2022; Ameri et al., 2017).

      The unlimited expansion of PP cells reported here would allow scaling-up the generation of beta cells, for the cell therapy of diabetes, by eliminating a source of variability derived from the number of differentiation procedures to be carried out when starting at the hPS cell stage each time. The approach presented here would allow the selection of the most optimally differentiated PP cell population for subsequent expansion and storage. Among other conditions optimized, the authors report a role for Vitamin A in activating retinoic acid signaling in an autocrine feed-forward loop, and the supplementation with FGF18 to reinforce FGF2 signaling.

      This is a relevant topic in the field of research, and some of the cell culture conditions reported here for PP expansion might have important implications in cell therapy approaches. Thus, the approach and results presented in this study could be of interest to researchers working in the field of in vitro pancreatic beta cell differentiation from hPSCs. Table S1 and Table S4 are clearly detailed and extremely instrumental to this aim.

      We thank the reviewer for the positive assessment. Below we provide a point-by-point response to general comments and criticisms.

      Weaknesses

      The authors strictly define PP cells as PDX1+/SOX9+/NKX6.1+ cells, and this phenotype was convincingly characterized by immunofluorescence, RT-qPCR, and FACS analysis along the work. However, broadly defined PDX1+/SOX9+/NKX6.1+ could include pancreatic multipotent progenitor cells (MPC, defined as PDX1+/SOX9+/NKX6.1+/PTF1A+ cells) or pancreatic bipotent progenitors (BP, defined as PDX1+/SOX9+/NKX6.1+/PTF1A-) cells. It has been indeed reported that Nkx6.1/Nkx6.2 and Ptf1a function as antagonistic lineage determinants in MPC (Schaffer, A.E. et al. PLoS Genet 9, e1003274, 2013), and that the Nkx6/Ptf1a switch only operates during a critical competence window when progenitors are still multipotent and can be uncoupled from cell differentiation. It would be important to define whether culturing PDX1+/SOX9+/NKX6.1+ PP (as defined in this work) in the best conditions allowing cell expansion is reinforcing either an MPC or BP phenotype. Data from Figure S2A (last paragraph of page 7) suggests that PTF1A expression is decreased in C5 culture conditions, thus more homogeneously keeping BP cells in this media composition. However, on page 15, 2nd paragraph it is stated that "the strong upregulation of NKX6.2 in our procedure suggested that our ePP cells may have retracted to an earlier PP stage". Evaluating the co-expression of the previously selected markers with PTF1A (or CPA2), or the more homogeneous expression of novel BP markers described, such as DCDC2A (Scavuzzo et al. Nat Commun 9, 3356, 2018), in the different culture conditions assayed would shed more light on this relevant aspect.

      This is certainly an interesting point. The RNA Seq data suggest that ePP cells resemble BP cells rather than MPCs and that this occurs during expansion. We have now added a new paragraph in the results section to illustrate this and added graphs of CPA2, PTF1A and DCDC2A expression during expansion in Figure 5, S5 as well as data in Table S5. In summary, we favor the interpretation that expanded cells are close but not identical to the BP identity and refer to that in the discussion. We have also amended the statement on page 15 that the strong upregulation of NKX6.2 in our procedure suggested that our ePP cells may have retracted to an earlier PP stage.

      In line with the previous comment, it would be extremely insightful if the authors could characterize or at least discuss a potential role for YAP underlying the mechanistic effects observed after culturing PP in different media compositions. It is well known that the nuclear localization of the co-activator YAP broadly promotes cell proliferation, and it is a key regulator of organ growth during development. Importantly in this context, it has been reported that TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors and disruption of this interaction arrests the growth of the embryonic pancreas (Cebola, I. et al. Nat Cell Biol 17, 615-26, 2015). More recently, it has also been shown that a cell-extrinsic and intrinsic mechanotransduction pathway mediated by YAP acts as gatekeeper in the fate decisions of BP in the developing pancreas, whereby nuclear YAP in BPs allows proliferation in an uncommitted fate, while YAP silencing induces EP commitment (Mamidi, A. et al. Nature 564, 114-118, 2018; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). This mechanism was further exploited recently to improve the in vitro pancreatic beta cell differentiation protocol (Hogrebe et al., Nature Protocols 16, 4109-4143, 2021; Hogrebe et al, Nature Biotechnology 38, 460-470, 2020). Thus, YAP in the context of the findings described in this work could be a key player underlying the proliferation vs differentiation decisions in PP.

      We do refer to these publications now and refer to the YAP pathway in the introduction and results sections as well as in the discussion. We have not investigated more because the kinetics of the different components of the pathway are complex and do not give an indication of whether the pathway becomes more or less active – please see below.

      Author response image 2.

      Regarding the improvements made in the PP cell culture medium composition to allow expansion while avoiding differentiation, some of the claims should be better discussed and contextualized with current state-of-the-art differentiation protocols. As an example, the use of ALK5 II inhibitor (ALK5i II) has been reported to induce EP commitment from PP, while RA was used to induce PP commitment from the primitive gut tube cell stage in recently reported in vitro differentiation protocols (Hogrebe et al., Nature Protocols 16, 4109-4143, 2021; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). In this context, and to the authors' knowledge, is Vitamin A (triggering autocrine RA signaling) usually included in the basal media formulations used in other recently reported state-of-the-art protocols? If so, at which stages? Would it be advisable to remove it?

      These points and our views are now included in the discussion.

      In this line also, the supplementation of cell culture media with the canonical Wnt inhibitor IWR-1 is used in this work to allow the expansion of PP while avoiding differentiation. A role for Wnt pathway inhibition during endocrine differentiation using IWR1 has been previously reported (Sharon et al. Cell Reports 27, 2281-2291.e5, 2019). In that work, Wnt inhibition in vitro causes an increase in the proportion of differentiated endocrine cells. It would be advisable to discuss these previous findings with the results presented in the current work. Could Wnt inhibition have different effects depending on the differential modulation of the other signaling pathways?

      These points are now included in the discussion together with the points above.

      Reviewer #3 (Recommendations For The Authors)

      Recommendations for improving the writing and presentation and minor comments on the text and figures:

      • In the Introduction (page 3, line 1) it is stated: "Diabetes is a global epidemic affecting > 9% of the global population and its two main forms result from .....". The authors could rephrase/remove "global" repeated twice.

      Corrected

      • On page 4 of the introduction, in the context of "Unlimited expansion of PP cells in vitro will require disentangling differentiation signals from proliferation/maintenance signals. Several pathways have been implicated in these processes..." the authors are advised to consider mentioning the YAP mediated mechanisms as another key aspect underlying MPC phenotype (Cebola, I. et al. Nat Cell Biol 17, 615-26, 2015) and the BP to endocrine progenitor (EP) commitment (Mamidi, A. et al. Nature 564, 114-118, 2018; Rosado-Olivieri et al. Nature Communications 10, 1464, 2019). This should be better discussed in the context of the Weaknesses mentioned in the Public Review. It would be worth considering adding effectors and other molecules involved in YAP and Hippo pathway signaling to Table S3.

      We have added the role of the Hippo/YAP pathway in the introduction and mentioned in the results the finding that components of the pathway are generally not regulated, except for two that are now added in Table S3.

      • In page 4, paragraph 3, near "and SB431542, another general (ALK4/5/7) TGFβ inhibitor", consider removing "another". SB431542 is the same inhibitor mentioned in the other protocols at the beginning of the paragraph.

      The paragraph is rewritten because it was not clear – we used A83-01 and not SB431542. Other approaches had used SB431542.

      • Page 5, Table S2 is cited after Table S3, please consider reordering.

      In fact, both S2 and S3 are relevant there, therefore we quote both now.

      • Page 8, 2nd paragraph, near "Expression of both AFP and CDX2 increased transiently upon expansion, at p5 (Figure S2H-J)." How do you explain results in FigS2C, D and FigS2E (AFP/CDX2)? RT-qPCR data does not suggest transient downregulation.

      AFP and CDX2 were – wrongly – italicized in the quoted passage. Therefore, in one case we refer to the protein and in the other to the transcript levels. We corrected this and added the qualifier ‘appeared’. The difference is most likely due to translational regulation but we did not elaborate since we do not know. In any case, we have used the less favorable but more robust gene expression levels as the main criterion.

      • Page 9, end of 2nd paragraph, Figure 5A is cited but it looks like this should be Figure 4A.

      Corrected

      • Page 9, 3rd paragraph, when stating "C5 ePP cells of the same passage no..." please replace "no" with a number or a suitable abbreviation.

      Corrected

      • Page 9, 3rd paragraph. Expressing the values in the Y axis in a consistent manner for FigS2B-D and FigS4A would make a comparison easier.

      We strive to keep sections autonomous so that the reader would not have to flip between figures and sections – this is why we think that figure S4A is preferable as it is; it is a direct comparison of C6 to C5 for the different markers and has the additional advantage that one needs not to include p0 levels.

      • Page 9, 3rd paragraph. Green dots in FigS4A stand for p5 cells? if so, shouldn't these average 1 for all assayed genes?

      No, because the baseline (average 1) is the C5 expression at the corresponding passage no. We changed the y-axis label; hopefully it is clearer now.

      • Page 10 3rd paragraph, please include color labels in Fig. 5G.

      The different colors here correspond to the different expansion procedures that are compared. The samples are labelled on the x axis.

      • Page 10 3rd paragraph, Figure 6G is cited but it looks like this should be Figure 5G.

      Corrected

      • Page 11, 1st paragraph, at "TF genes such as FOXA2 and RBJ remained comparable", please double check if "RBJ" should be "RBPJ".

      Corrected

      • Page 11, end of 1st paragraph, when stating "Of note, expression of PTF1A was also undetectable in all ePP cells (Table S5)", is PTF1A expression level close to 1000 (which units?) in Table S5 considered undetectable?

      This statement regarding ‘undetectable PTF1A expression’ refers to expanded PP cells (ePP), not PP cells at p0. For the latter, expression is indeed close to 1000 in normalized RNA-sequence counts as mentioned in the Table legend.

      • Page 11, 4th paragraph, "In summary, the comparative transcriptome analyses suggested that our C6 expansion procedure is more efficient at strengthening the PP identity". In the context of comments made in the Public Review, more accuracy needs to be put when defining PP identity. Are these MPC or BP?

      The RNA Seq data suggest that expansion promotes an MPC → BP transition. We have added a paragraph in the corresponding results section and comment in the discussion.

      • Page 15, 2nd paragraph, the sentence "expression of PTF1A, recently shown to promote endocrine differentiation of hPS cells (Miguel-Escalada et al., 2022)" is confusing. Please double-check sentence syntax and reference. Does PTF1A expression "promote" or "create epigenetic competence" for endocrine differentiation?

      Its role is in the MPCs and it prepares the epigenetic landscape to allow for duct and endocrine specification later, thus it ‘creates epigenetic competence’. The paper was cited out of context and we have now corrected it.

      Additional recommendations by the Reviewing Editor:

      An insufficient number of experimental repetitions have been used for the following data: (Figure 1A, n = 2; Figures 2B-D, p10, n = 2; Figures 6A and B, VTN-N, n = 1).

      This is true but we do not draw quantitative conclusions from these data or use them for comparisons.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their thoughtful evaluation of our manuscript. We considered all the comments and prepared the revised version. The following are our responses to the reviewers’ comments. All references, including those in the original manuscript are included at the end of this point-by-point response.

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer inquires whether the microbial species described in this article are ubiquitously associated with Drosophila. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, yeasts such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These statements have been incorporated into our revised manuscript (lines 391-397). Nevertheless, the term “core” in the manuscript and title may lead to misunderstanding, as this generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we replaced “core” with “key,” a term that is more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species detected in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions would be needed, such as artificially introducing wild flies onto fresh bananas in the laboratory. Nevertheless, the microbes potentially originate from wild flies, as supported by the literature cited in our response to Weakness 1).

      Alternative sources of microbes also merit consideration. For example, microbes may have been introduced to unfermented bananas by penetration through peel injuries (lines 1300-1301). In addition, they could be introduced by insects other than flies, given that rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. An explanation of these possibilities has been incorporated into the DISCUSSION (lines 414-427) of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. For the traps where adult flies were caught, we identified the species of the drosophilids as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We added these descriptions to MATERIALS AND METHODS (lines 511-512 and 560-562) and DISCUSSION (lines 378-379).

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of the genes showed low expression levels. The RNA-seq data of the yeast-fed larvae are shown in Author response table 1. While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this may be due to intra-sample variability rather than the difference in the nutritional conditions. Similar expression profiles were observed in the bacteria-fed larvae as well (data not shown). Therefore, it is difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences in the gene expression profiles of the larvae fed on different microbial species within bacteria or fungi, or between those fed on bacteria and those fed on fungi. For example, the gene expression profiles of larvae fed on the various supportive microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on various non-supportive microbes.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria. Thus, it is challenging to discuss the potential differential impacts of yeast and bacteria on larval growth, if any.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supportive microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h of feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which were obtained independently, were combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisks in the label highlight “LAB + AAB” or “LAB” samples clustered separately from the other samples in those conditions; “**” indicates a sample in a “LAB + AAB” condition (Lactiplantibacillus plantarum + Acetobacter orientalis), and “*” indicates a sample in a “LAB” condition (Leuconostoc mesenteroides). Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; Sa. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; BY4741, Saccharomyces cerevisiae BY4741 strain.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with the microbiota in food samples. We found that adult flies and early-stage foods, as well as larvae and late-stage foods, harbored similar microbial species (Figure 1F). Additionally, previous studies examining the gut microbiota in wild adult flies have detected microbes belonging to the same species or taxa as those isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we have cited the study by Dodge et al., 2023 in our revised manuscript and discussed the possibility that predominant microbes in adult flies may show a propensity for colonization (lines 410-413).

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). Based on our RNA-seq data, this is unlikely. The aforementioned study demonstrated that seven protease genes are upregulated through Imd pathway stimulation by a bacterium that promotes larval growth. In our RNA-seq analysis, these seven genes did not exhibit consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response table 2A; Le. mes + A. ori in Author response table 2B). Rather, they tended to be upregulated in the presence of non-supportive microbes (St. bac or Pi. klu in Author response table 2A; La. pla in Author response table 2B).

      Author response table 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes colonize the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al., 2018), rather than larval stages. Developing larvae continually consume substrates that have already undergone microbial fermentation and are abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We have mentioned this point in the DISCUSSION of the revised manuscript (lines 408-410).

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be a subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second-instar stage. This observation implies that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We mentioned a previously reported interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 433-436). LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of lactic acid to AAB culture plates, and the co-inoculation of AAB with LAB mutant strains defective in lactate production to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations. The explanation of the universality of our findings has been included in the revised DISCUSSION (lines 391-397). We have also added descriptions of the implications of compositional shifts occurring in the adult microbiota (lines 404-413), possible inoculation routes of different microbes (lines 414-427), and hypotheses on the mechanism of larval growth promotion by yeasts (lines 469-476), all of which could be the focus of our future studies.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This duration was determined from pilot experiments. A shorter collection time resulted in a lower likelihood of obtaining traps visited by adult flies, whereas a longer collection time caused overcrowding of larvae as well as deaths of adults from drowning in the liquid seeping out of the fruits. These procedural details have been included in the MATERIALS AND METHODS section of the revised manuscript (lines 523-526).

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We have included these possibilities in the DISCUSSION section of the revised manuscript (lines 417-421).

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestion regarding shifts in the adult microbiome. We have included in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages or after adult eclosion (lines 404-413).

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used?

      In this metabolomic analysis, LC-MS/MS with triple quadrupole MS monitors the formation of fragment ions from precursor ions specific to each target compound. The use of PFPP columns, which provide excellent separation of amino acids and nucleobases, allows chromatographic peaks of many structural isomers to be separated into independent peaks. In addition, all measured compounds are compared with data from a standard library to confirm retention time agreement. Structural isomers were separated either by retention time on the column or by compound-specific MRM signals (in fact, leucine and isoleucine have both unique MRM channels and column separations). Detailed MRM conditions are identical to those in the previously published study (Oka et al., 2017). These details have been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 810-824).

      Were standard curves produced?

      Since relative quantification of metabolite amounts was performed in this study, no standard curve was generated to determine absolute concentrations. However, a standard compound of known concentration (single point) was measured to confirm retention time and relative area values.

      Were internal, deuterated controls used?

      Deuterium-labeled internal standards were not used in this study, because it is not realistic to obtain deuterium-labeled versions of all target compounds when a large number of compounds are measured. However, an internal standard (L-methionine sulfone) was added to the extraction solvent to calculate the recovery rate. This has been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 824-825).
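
      As a rough illustration of how relative quantification with a single internal standard can work in general, the Python sketch below corrects a peak area for internal-standard recovery and expresses it relative to a reference sample. The function names, the expected internal-standard area, and all numbers are hypothetical and are not taken from the study; the exact data processing used by the authors is not described in this response.

```python
def recovery_rate(observed_is_area: float, expected_is_area: float) -> float:
    """Fraction of the internal standard recovered after extraction."""
    if expected_is_area <= 0:
        raise ValueError("expected_is_area must be positive")
    return observed_is_area / expected_is_area


def relative_abundance(sample_area: float, sample_recovery: float,
                       reference_area: float, reference_recovery: float) -> float:
    """Recovery-corrected peak area of a sample relative to a reference sample."""
    corrected_sample = sample_area / sample_recovery
    corrected_reference = reference_area / reference_recovery
    return corrected_sample / corrected_reference


# Hypothetical peak areas, for illustration only (not data from the study).
rec_sample = recovery_rate(observed_is_area=8000.0, expected_is_area=10000.0)
rec_reference = recovery_rate(observed_is_area=10000.0, expected_is_area=10000.0)
fold = relative_abundance(4000.0, rec_sample, 1000.0, rec_reference)
print(f"recovery = {rec_sample:.2f}, relative abundance = {fold:.2f}")
```

      Because only ratios of peak areas are used, this kind of calculation yields relative amounts between samples rather than absolute concentrations, which is consistent with the absence of a standard curve.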

      Reviewer #1 (Recommendations For The Authors):

      Additional comments 1. The authors should do a better job of presenting their data. It took me quite a while to understand the protocol of Figure 1. Panel 1A, B, C could be improved. For instance, 1A suggests that flies are transferred to the lab while this is in fact the banana trap. Indicate 'Banana trap colonized by flies' rather than 'wild-type flies in the trap'. 1C: should indicate that the food suspension comes from the banana trap. 1B,D,D: do not use pale color as legend. Avoid the use of indices in Figure 2 (Y1 rather than Y₁). Grey colors are difficult to distinguish in Figure 2. Etc. It is a pain for reviewers that figure legends are on the verso of each figure and not just below.

      We thank the reviewer for the detailed suggestions to improve the clarity and comprehensibility of our figures. We have improved the figures according to the suggestions. As for the figure legends, we have placed them below each respective figure whenever possible.

      1. Clarify in the text if 'sample' means food substratum or flies/larvae (ex. line 116 and elsewhere).

      We have revised the word “sample” throughout our manuscript and eliminated the confusion.

      1. Line 170 - clarify what you mean by fermented food.

      We have replaced the “fermented larval foods” with “fermented bananas” in our revised manuscript (line 165).

      1. Line 199 - what is the meaning of 'stocks'.

      We have replaced the “stocks” with “strains” (line 195).

      1. Line 320 - explain more clearly what the yeast-conditioned banana-agar plate and cell suspension supernatant are, and what the goals of using these media are. This will help in understanding the subsequent text.

      We have added a supplemental figure illustrating the sample preparation for the metabolomic analysis (Figure S6), with the following legend describing the procedure (lines 1335-1346): “Sample preparation process for the metabolomic analysis. We suspected that the supportive live yeast cells may release critical nutrients for larval growth, whereas the non-supportive yeasts may not. To test this possibility, we made three distinct sample preparations of individual yeast strains (yeast cells, yeast-conditioned banana-agar plates, and cell suspension supernatants). Yeast cells were for the analysis of intracellular metabolites, whereas yeast-conditioned banana-agar plates and cell suspension supernatants were for that of extracellular metabolites. The samples were prepared according to the following procedure. Yeasts were grown on banana-agar plates for 2 days at 25°C, and then scraped from the plates to obtain “yeast cells.” Next, the remaining yeasts on the resultant plates were thoroughly removed, and a portion from each plate was cut out (“yeast-conditioned banana agar”). In addition, we suspended yeast cells from the agar plates into sterile PBS, followed by centrifugation and filtration to eliminate the yeast cells, to prepare “cell suspension supernatants.””

      1. Figure 5 is difficult to understand. Provide more explanation. Consider moving the 'all metabolites panel' to Supp. Better explain what this holidic medium is.

      The holidic medium is a medium that has been commonly used in the Drosophila research community, which contains ~40 known nutrients, and supports the larval development to pupariation (Piper et al., 2014; Piper et al., 2017). We have introduced this explanation to the RESULTS section of the manuscript (lines 322-327). However, the scope of our research reaches beyond the analysis of the holidic medium components, because feeding the holidic medium alone causes a significant delay in larval growth, suggesting a lack of nutritional components (Piper et al., 2014). Thus, we believe the "All Metabolites" panels should be placed alongside the corresponding “The holidic medium components” panels.

      1. I could not access Figure 6 when downloading the PDF. The page is white and an error message appears - it is problematic to review a paper lacking a figure.

      We regret any inconvenience caused, perhaps due to a system error. Please refer to the Author response image 2, which is identical to Figure 6 of our original manuscript.

      Author response image 2.

      Supportive yeasts facilitate larval growth by providing nutrients, including branched-chain amino acids, by releasing them from their cells (Figure 6 from the original manuscript). (A and B) Growth of larvae feeding on yeasts on banana agar supplemented with leucine and isoleucine. (A) The mean percentage of the live/dead individuals in each developmental stage. n=4. (B) The percentage of larvae that developed into second instar or later stages. The “Not found” population in Figure 6A was omitted from the calculation. Each data point represents data from a single tube. Unique letters indicate significant differences between groups (Tukey-Kramer test, p < 0.05). (C) The biosynthetic pathways for leucine and isoleucine with S. cerevisiae gene names are shown. The colored dots indicate enzymes that are conserved in the six isolated species, while the white dots indicate those that are not conserved. Abbreviations of genera are given in the key in the upper right corner. LEU2 is deleted in BY4741. (D-G) Representative image of Phloxine B-stained yeasts. The right-side images are expanded images of the boxed areas. The scale bar represents 50 µm. (H) Summary of this study. H. uvarum is predominant in the early-stage food and provides Leu, Ile, and other nutrients that are required for larval growth. In the late-stage food, AAB directly provides nutrients, while LAB and yeasts indirectly contribute to larval growth by enabling the stable larva-AAB association. The host larva responds to the nutritional environment by dramatically altering gene expression profiles, which leads to growth and pupariation. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; Pi. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; GF, germ-free.
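
      The per-tube percentage described for panel B (larvae that reached second instar or later, with the “Not found” population omitted from the calculation) can be sketched in Python as follows. The stage labels, the set of stages counted as “second instar or later,” and the example counts are hypothetical and chosen only for illustration; how live and dead individuals were tallied in the actual analysis is not specified here.

```python
# Stages counted as "second instar or later" (labels are hypothetical).
LATER_STAGES = {"L2", "L3", "Pupa"}


def percent_second_instar_or_later(counts: dict[str, int]) -> float:
    """Percentage of recovered individuals at second instar or later.

    Individuals recorded as "Not found" are excluded from the denominator.
    """
    found = {stage: n for stage, n in counts.items() if stage != "Not found"}
    total = sum(found.values())
    if total == 0:
        raise ValueError("no individuals were recovered from this tube")
    later = sum(n for stage, n in found.items() if stage in LATER_STAGES)
    return 100.0 * later / total


# Hypothetical counts for a single tube, for illustration only.
tube = {"L1": 6, "L2": 9, "L3": 3, "Not found": 2}
print(f"{percent_second_instar_or_later(tube):.1f}% reached second instar or later")
```

      Excluding “Not found” individuals from the denominator means the percentage reflects only larvae whose stage could be scored, so tubes with many unrecovered larvae rest on smaller sample sizes.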

      1. Line 323 - Consider rewriting this sentence (too long, explain what the holidic medium is and why this is interesting). "In the yeast-conditioned banana-agar plates, which were anticipated to contain yeast-derived nutrients, many well-known nutrients included in a chemically defined synthetic (holidic) medium for Drosophila melanogaster (Piper et al., 2014, 2017) were not increased compared to the sterile banana-agar plates; instead, they exhibited drastic decreases irrespective of the yeast species."

      We thank the reviewer for the suggestion to improve the readability of our manuscript. We have rewritten the sentence in the revised manuscript (lines 320-328) as follows: “The yeast-conditioned banana-agar plates were expected to contain yeast-derived nutrients. On the contrary, the result revealed a depletion of various metabolites originally present in the sterile banana agar (Figure 5A). This result prompted us to focus on the metabolites in the chemically defined (holidic) medium for Drosophila melanogaster (Piper et al., 2014; Piper et al., 2017). This medium contains ~40 known nutrients, and supports the larval development to pupariation, albeit at half the rate compared to that on a yeast-containing standard laboratory food (Piper et al., 2014; Piper et al., 2017). Therefore, the holidic medium could be considered to contain the minimal essential nutrients required for larval growth. Our analysis indicated a substantial reduction of these known nutrients in the yeast-conditioned plates compared to their original quantities (Figure 5B).”

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      1. It should be clearly shown (or stated) that isolated microbes, such as H. uvarum and Pa. agglomerans, are indigenous microbes in wild Drosophila melanogaster in their outdoor sampling.

      We thank the reviewer for the suggestions. Addressing the presence of the isolated microbes within wild D. melanogaster adults is important, but is not feasible with our data for the following reasons. Our microbiota analysis of adults was conducted using pooled individuals of multiple Drosophila species, rather than using D. melanogaster exclusively. Moreover, the microbial isolation and the analysis of adult microbiota were carried out in two independent samplings (Figures 1A and 1E in the original manuscript, respectively). As a result, the microbial species detected in the adults were slightly different from those isolated from the food samples collected in the previous sampling. Nevertheless, it is worth noting that H. uvarum dominated in 2 out of the 3 adult samples, constituting >80% of the fungal composition. Pantoea agglomerans was not detected in the adults, although Enterobacterales accounted for >59% in 2 out of the 3 samples. Therefore, these isolated microbial species, or at least their phylogenetically related species, are presumed to be indigenous to wild D. melanogaster.

      If the reviewer’s suggestion was to state the dominance of H. uvarum and Pantoea agglomerans in early-stage foods, we have added a supplemental figure showing the species-level microbial compositions corresponding to Figure 1B of the original manuscript (Figure S1), and further revised the manuscript (lines 180-186).

      1. The reviewer supposes that the indigenous microbes of flies may differ from what they usually eat. In this study, the authors use banana-based food, but is it justified in terms of the natural environment of the places where those microbes were isolated? In other words, did sampled wild flies eat bananas outside the laboratory at Kyoto University?

      Drosophila spp. inhabit human residential areas and feed on various fermented fruits and vegetables. In the areas surrounding Kyoto University, they can be found in garbage in residential dwellings as well as supermarkets. In this regard, fruits are natural food sources of wild Drosophila in the area.

      Among various fruits, bananas were selected for the following two reasons. Firstly, bananas have been commonly used in previous Drosophila studies as a trap bait or a component of Drosophila food (Anagnostou et al., 2010; Stamps et al., 2012; Consuegra et al., 2020). Secondly, and rather practically, bananas can be obtained in Japan all year at a relatively low cost. Previous studies have used various fruits such as grapes (Quan and Eisen, 2018), figs (Pais et al., 2018), and raspberries (Cho and Rohlfs, 2023). However, these fruits are only available during limited seasons and are more expensive per volume than bananas. Thus, they were not practical for our study, which required large amounts of fruit-based culture media. We have included a brief explanation regarding this point in MATERIALS AND METHODS (lines 514-518).

      1. In Fig. 6B, the Leu and Ile experiment, is the added amount of those amino acids appropriate in the context that they mention "...... supportive yeasts had concentrations of both leucine and isoleucine that were at least four-fold higher than those of non-supportive yeasts"?

      We acknowledge that the supplementation should ideally be carried out in a quantity equivalent to the difference between the amounts released by supportive and non-supportive species. However, achieving this has been highly challenging. Previous studies determined the amount of amino acid supplementation by quantifying their concentration in the bacteria-conditioned media (Consuegra et al., 2020; Henriques et al., 2020). However, we found that quantifying the exact concentrations of the amino acids is not feasible with our yeasts. As shown in Figure 5B in the original manuscript, the amino acid contents were markedly reduced in the yeast-conditioned banana agar compared to the agar without yeasts, presumably because of the uptake by the yeasts. Thus, the amino acids released from yeast cells on the banana-agar plate are not expected to accumulate in the medium. As this reviewer pointed out, in the cell suspension supernatants of the supportive yeasts, concentrations of both leucine and isoleucine were at least four-fold higher compared to those of non-supportive yeasts (Figures 5G-H in the original submission). However, this measurement does not give the absolute amount of either amino acid available for larvae. Given these constraints, we opted for the amino acid concentrations in the holidic medium, which support larval growth under axenic conditions (Piper et al., 2014). We also showed that the supplementation of the amino acids at that concentration to the banana-agar plate was not detrimental to larval growth (Figures 6A-B in the original manuscript). These rationales have been included in the revised ‘Developmental progression with BCAA supplementation’ section in MATERIALS AND METHODS of our manuscript (lines 840-847).

      1. In addition to the above, it can be included other amino acids or nutrients as control experiments.

      As mentioned in our manuscript (lines 365-368), we did supplement other amino acids, lysine and asparagine, which failed to rescue the larval growth.

      1. In the experiment of Fig. 2E, how about examining larval development using heat-killed LAB or yeast with live AAB? The reviewer speculates that one possibility is that AAB needs nutrients from LAB.

      We did not feed larvae with heat-killed LAB and live AAB for the following reasons. LAB grows very poorly on banana agar compared to yeasts, and preparation of LAB required many banana-agar plates even when we fed live bacteria to larvae. Adding dead LAB to banana-agar tubes would require far more plates, which makes this preparation impractical. Furthermore, heat-killing may not allow the investigation of the contribution of heat-unstable or volatile compounds.

      As for the reviewer's suggestion regarding the addition of heat-killed yeast with AAB, heat-killed yeast itself promotes larval growth, as shown in Figures 4G and 4H in the original manuscript, so the contribution of yeast cannot be examined using this method.

      Recommendations for improving the writing and presentation.

      1. It would be good to mention that during sample collection, other insects (other than Drosophila species) were not found in the food if this is true.

      Insects other than Drosophila spp. were found in several traps in the sampling shown in Figures 1C-F. These insects, rove beetles (Staphylinidae) and sap beetles (Nitidulidae), seemed to share a niche with Drosophila in nature. Therefore, we believe that the presence of these insects did not interfere with our goal of obtaining larval food samples. We added these descriptions and explanations to MATERIALS AND METHODS (lines 527-531).

      1. There are many different kinds of bananas. It should be mentioned the detailed information.

      We have included the information on the banana in the MATERIALS AND METHODS section (line 622).

      1. Concerning the place of sample collection, detailed longitude, and latitude information can be provided (this is easily obtained from Google Maps). When the collection was performed should also be mentioned. This may suggest the environment of the "wild flies" they collected.

      We added a table listing the dates of our collections, along with the longitude and latitude of each sampling place (Table S1A).

      1. The reviewer could not find how the authors conducted heat killing of yeast.

      We added the following procedure to the ‘Quantification of larval development’ section in MATERIALS AND METHODS (lines 680-688). “When feeding heat-killed yeasts to larvae, yeasts were added to the banana-agar tubes and subsequently heated as following procedures. The yeasts were revived from frozen stocks on banana-agar plates, incubated at 25°C, and then streaked on fresh agar plates. After 2-day incubation, yeast cells were scraped from the plates and suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions were added to banana-agar tubes prepared as described, and after centrifugation at 3,000 x g for 5 min, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts, which compensates for the reduced amount due to their inability to proliferate. The tubes were subsequently heated at 80°C for 30 min before adding germ-free larvae.”
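
      The dose described in this procedure can be checked with a short calculation. The sketch below is a back-of-envelope estimate only: it assumes the suspension volume is roughly equal to the PBS volume (the cells themselves add volume, so the real dose is somewhat lower), and the function name is hypothetical. The implied live-yeast dose is derived from the stated ~50x ratio and is not a figure given in the response.

```python
def cells_per_tube_mg(cell_mass_mg, pbs_volume_ul, aliquot_ul):
    """Approximate wet cell mass delivered by one aliquot of suspension.

    Assumption: suspension volume ~= PBS volume, which slightly
    overestimates the dose because the cells add volume of their own.
    """
    return cell_mass_mg / pbs_volume_ul * aliquot_ul


# Values taken from the procedure: 400 mg cells in 500 uL PBS, 125 uL per tube.
heat_killed_mg = cells_per_tube_mg(400.0, 500.0, 125.0)

# The response states this is ~50x the live-yeast dose, so the live dose
# implied by that ratio (an inference, not a reported number) is:
implied_live_mg = heat_killed_mg / 50.0

print(f"heat-killed dose per tube: ~{heat_killed_mg:.0f} mg")
print(f"implied live dose per tube: ~{implied_live_mg:.0f} mg")
```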

      1. The reviewer prefers that all necessary information on how to see figures be provided in figure legends. For example, an explanation of some abbreviations is missing.

      We carefully re-examined the figure legends and added necessary information.

      1. Many of the figures are not kind to readers, i.e., one needs to refer to the legends and main text very frequently. Adding subheadings (titles) to each figure may help.

      We added subheadings to our figures to improve the comprehensibility.

      Reviewer #3 (Recommendations For The Authors):

      I have some minor questions/suggestions about the manuscript that, if addressed, may increase the clarity and quality of the work.

      1. Please, when referring to microbial species in the abbreviated form, use only the first letter of the genus. For example, P. agglomerans should be used, not Pa. agglomerans.

      We are concerned about the potential confusion caused by using only the first letter of genera, since several genera mentioned in our work share the first letters, such as P (Pichia and Pantoea), S (Starmerella, Saccharomyces, and Saccharomycopsis), or L (Lactiplantibacillus and Leuconostoc). Therefore, we used only the unabbreviated form of the above seven genera in our revised manuscript. We have also made every effort to avoid abbreviations in our figures and tables, but found it necessary to retain two-letter abbreviations when spaces are particularly limiting.

      1. In lines 294-298, how exactly was the experiment where yeasts were killed by anti-fungal agents performed? If these agents killed the yeast, how was the microbial growth on plates required to have biomass for fly inoculation obtained? Please, clarify this section.

      The yeasts were grown on normal banana-agar plates before the addition onto the anti-fungal agents-containing banana agar. We added the following procedure to MATERIALS AND METHODS (lines 689-695). “When feeding yeasts on banana agar supplemented with antifungal agents, the yeasts were individually grown on normal banana agar twice before being suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions was introduced onto the anti-fungal agents (10 mL/L 10% p-hydroxybenzoic acid in 70% ethanol and 6 mL/L propionic acid, following the concentration described in Kanaoka et al., 2023)-containing banana agar in 1.5 mL tubes. After centrifugation, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts.”

      1. In lines 557-558, please clarify how rDNA copy numbers can be calculated in this way.

      Considering the results of the ITS and 16S sequencing analysis, it was highly likely that rDNAs from bananas and Drosophila were amplified along with microbial rDNA in this qPCR. To estimate the microbial rDNA copy number, we assumed that the proportion of microbial rDNA within the total amplification products remains consistent between the qPCR and the corresponding sequencing analysis, because the template DNA samples and amplified regions were shared between the analyses. Based on this, the copy number of microbial rDNA was estimated by multiplying the qPCR results with the microbial rDNA ratio observed in the ITS or 16S sequencing analysis of each sample. This methodology has been detailed in the MATERIALS AND METHODS section (lines 609-615).
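
      The estimate described above amounts to scaling the total qPCR copy number by the microbial share of the sequencing reads. A minimal sketch follows, assuming, as the response does, that this share is the same in the qPCR products and in the sequencing library; the function name and the example numbers are hypothetical and are not data from the study.

```python
def microbial_rdna_copies(total_copies_qpcr, microbial_reads, total_reads):
    """Estimate microbial rDNA copies from a qPCR result.

    total_copies_qpcr: copies measured by qPCR (microbial + banana + fly rDNA).
    microbial_reads / total_reads: microbial share of the ITS or 16S reads
    obtained from the same template DNA.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= microbial_reads <= total_reads:
        raise ValueError("microbial_reads must lie between 0 and total_reads")
    microbial_fraction = microbial_reads / total_reads
    return total_copies_qpcr * microbial_fraction


# Hypothetical sample: 1e6 copies by qPCR, 2,500 of 10,000 reads are microbial.
estimate = microbial_rdna_copies(1.0e6, 2_500, 10_000)
print(f"estimated microbial rDNA copies: {estimate:.0f}")
```

      The estimate is only as good as the shared-proportion assumption; amplification bias that differs between the qPCR and the library preparation would shift it.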

      1. In lines 609-611, how did you check for cells left from the previous day? Microscopy? Or do you mean that if there was liquid still in the sample you would not add more bacterial cultures? Please, clarify.

      We observed with the naked eye from outside the tubes to determine if additional AAB should be introduced. Since we placed AAB on the banana agar in a lump, we examined whether the lumps were gone or not. We have added these procedures in MATERIALS AND METHODS (lines 671-673).

      1. In Figure 2A, it is hard to differentiate between the gray tones. Please, improve this.

      We have distinguished the plots for different conditions by changing the shape of the markers on the graphs.

      1. In the legend of Figure 4, line 1101, I believe the panel letters are incorrect.

      We have corrected the manuscript (lines 1241-1242) from “heat-killed yeasts on banana agar (H and I) or live yeasts on a nutritionally rich medium (J and K)” to “heat-killed yeasts on banana agar (G and H) or live yeasts on a nutritionally rich medium (I and J).”

      1. In Figure S1, authors showed that bananas that were not inoculated still had detectable rDNA signal. Is this really because bacteria can penetrate the peel? Or could this be the “reagent microbiome”? Alternatively, could these microbes have been introduced during sample prep, such as cutting the bananas?

      The detection of rDNA in bananas that were not inoculated with microbes was unlikely to be due to microbial contamination during experimental manipulation. The reviewer pointed out the possibility that the “reagent microbiome”, presumably the microbes in PBS, was detected in the uninoculated bananas. This seems unlikely, considering that the PBS was sterilized by autoclaving before use. To ensure that no viable microbe was left in the autoclaved PBS, we applied a portion of the PBS onto a banana-agar plate and confirmed that no colony formed after incubation for a few days. DNA derived from dead microbes might be present in the PBS, but the PBS-added bananas were incubated for 4 days, so it is also unlikely that a detectable amount of DNA remained until sample collection. Furthermore, we believe that no contamination occurred during sample preparation. Banana peels were treated with 70% ethanol before being removed extremely carefully to avoid touching the fruit inside. All tools were sterilized before use. Taking all of these into account, we speculate that the microbes were already present in the bananas before peeling. We added the details of the sample preparation processes in MATERIALS AND METHODS (lines 518-521 and 540).

      Other major revisions

      1. We deposited our yeast genome annotation data in the DDBJ Annotated/Assembled Sequences database, and the accession numbers have been added to the ‘Data availability’ section in MATERIALS AND METHODS (lines 868-873).

      2. The bacterial composition data in Figure 1B was corrected, because in the original version, the data for Place 3 and Place 4 was plotted in reverse. The original and revised plots are shown side by side in Author response image 3. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p5, lines 117-120).

      Author response image 3.

      Comparison of the original and revised version of bacterial composition graph in Figure 1B. Comparison of the original (left) and revised (right) version of the graph at the bottom of Figure 1B, which shows the result of bacterial composition analysis. The color key, which is unmodified, is placed below the revised version.

      1. The plot data and labels in the RNA-seq result heatmaps (Figures 3A and 4C) have been corrected. In these figures, row Z-scores of log2(TPM + 1) were to be plotted, as indicated by the key in each figure. However, in the original version, row Z-scores of TPM were erroneously plotted. Thus, Figures 3A and 4C of the original version have been replaced with the correct plots, and the original and revised plots are shown side by side in Author response images 4A and 4B. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 222-226 and p9, lines 277-281).
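
      The difference between the two quantities named in this correction can be illustrated with a small sketch. It assumes the row Z-score uses the sample standard deviation (n - 1), which is a common heatmap default but is not stated in the response; the TPM values are hypothetical.

```python
import math
import statistics


def row_zscores(values):
    """Z-score one heatmap row: (x - row mean) / row sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption here
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def log2_tpm_zscores(tpm_row):
    """Row Z-scores of log2(TPM + 1), the quantity named in the figure key."""
    return row_zscores([math.log2(t + 1.0) for t in tpm_row])


# Hypothetical expression of one gene across four samples.
tpm_row = [1.0, 3.0, 7.0, 1023.0]

raw_z = row_zscores(tpm_row)       # Z-scores of TPM (the erroneous version)
log_z = log2_tpm_zscores(tpm_row)  # Z-scores of log2(TPM + 1)

print("Z of TPM:          ", [round(z, 3) for z in raw_z])
print("Z of log2(TPM + 1):", [round(z, 3) for z in log_z])
```

      With a single high value in the row, the Z-scores of raw TPM compress the three low samples into nearly the same colour, whereas the log-transformed version keeps them distinguishable.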

      Author response image 4.

      Comparison of the original and revised version of Figures 3A and 4C. (A and B) Comparison of the original (left) and revised (right) version of Figures 3A (A) or 4C (B).

      1. The keys in the original Figures 3D and 4F indicate that log2(fold change) was used to plot all data. However, when plotting the data from the previous study (Zinke et al., 2002), their “fold change value” was used. We have corrected the keys, plots, and legend of Figure 3D to reflect the different nature of the data from our RNA-seq analysis and those from microarray analysis by Zinke et al. The original and revised plots are shown side by side in Author response image 5. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 228-230 and p9, lines 277-284).

      Author response image 5.

      Comparison of the original and revised version of Figures 3D and 4F. (A and B) Comparison of the original (left) and revised (right) version of Figures 3D (A) or 4F (B).

      1. The labels in Figure S5C and S5D (Figure S4C and S4D in the original version) have been corrected (they are "Pichia kluyveri > Supportive" and "Starmerella bacillaris > Supportive" rather than "Non-support. > H. uva" and "Non-support. > K. hum"). Additionally, we have reintroduced the circle indicating the number of “dme04070: Phosphatidylinositol signaling system” DEGs in Figure S5D, which was missing in Figure S4D of the original version. The original and revised figures are shown in Author response image 6.

      Author response image 6.

      Comparison of the original and revised version of Figures S5C and S5D. (A and B) Comparison of the original (left) and revised (right) versions of Figures S5C (A) or S5D (B). The original figures corresponding to the aforementioned figures were Figures S4C and S4D, respectively.

      1. The "Fermentation stage" column in Table 1, which indicated whether each microbe was considered an early-stage microbe or a late-stage microbe, has been removed to avoid confusion. This is because some of the microbes (Hanseniaspora uvarum, Pichia kluyveri, and Pantoea agglomerans) were employed in both of the feeding experiments using the microbes detected from the early-stage foods (Figures 2A, 2B, S2A, and S2B) and those from the late-stage foods (Figures 2C, 2D, S2C, and S2D).

      2. The leftmost column in Table S7 has been edited to indicate species names rather than “Sample IDs,” because the IDs were not used anywhere else in the paper.

      References

      Chandler, J. A., Lang, J., Bhatnagar, S., Eisen, J. A. and Kopp, A. (2011). Bacterial communities of diverse Drosophila species: Ecological context of a host-microbe model system. PLoS Genetics 7, e1002272.

      Chandler, J. A., Eisen, J. A. and Kopp, A. (2012). Yeast communities of diverse Drosophila species: Comparison of two symbiont groups in the same hosts. Applied and Environmental Microbiology 78, 7327–7336.

      Cho, H. and Rohlfs, M. (2023). Transmission of beneficial yeasts accompanies offspring production in Drosophila—An initial evolutionary stage of insect maternal care through manipulation of microbial load? Ecology and Evolution 13, e10184.

      Consuegra, J., Grenier, T., Akherraz, H., Rahioui, I., Gervais, H., da Silva, P. and Leulier, F. (2020). Metabolic Cooperation among Commensal Bacteria Supports Drosophila Juvenile Growth under Nutritional Stress. iScience 23, 101232.

      Dodge, R., Jones, E. W., Zhu, H., Obadia, B., Martinez, D. J., Wang, C., Aranda-Díaz, A., Aumiller, K., Liu, Z., Voltolini, M., et al. (2023). A symbiotic physical niche in Drosophila melanogaster regulates stable association of a multi-species gut microbiota. Nat Commun 14, 1557.

      Erkosar, B., Storelli, G., Mitchell, M., Bozonnet, L., Bozonnet, N. and Leulier, F. (2015). Pathogen Virulence Impedes Mutualist-Mediated Enhancement of Host Juvenile Growth via Inhibition of Protein Digestion. Cell Host & Microbe 18, 445–455.

      Hanson, M. A. and Lemaitre, B. (2020). New insights on Drosophila antimicrobial peptide function in host defense and beyond. Current Opinion in Immunology 62, 22–30.

      Henriques, S. F., Dhakan, D. B., Serra, L., Francisco, A. P., Carvalho-Santos, Z., Baltazar, C., Elias, A. P., Anjos, M., Zhang, T., Maddocks, O. D. K., et al. (2020). Metabolic cross-feeding in imbalanced diets allows gut microbes to improve reproduction and alter host behaviour. Nat Commun 11, 4236.

      Oka, M., Hashimoto, K., Yamaguchi, Y., Saitoh, S., Sugiura, Y., Motoi, Y., Honda, K., Kikko, Y., Ohata, S., Suematsu, M., et al. (2017). Arl8b is required for lysosomal degradation of maternal proteins in the visceral yolk sac endoderm of mouse embryos. Journal of Cell Science jcs.200519.

      Pais, I. S., Valente, R. S., Sporniak, M. and Teixeira, L. (2018). Drosophila melanogaster establishes a species-specific mutualistic interaction with stable gut-colonizing bacteria. PLOS Biology 16, e2005710.

      Piper, M. D. W., Blanc, E., Leitão-Gonçalves, R., Yang, M., He, X., Linford, N. J., Hoddinott, M. P., Hopfen, C., Soultoukis, G. A., Niemeyer, C., et al. (2014). A holidic medium for Drosophila melanogaster. Nature Methods 11, 100–105.

      Piper, M. D. W., Soultoukis, G. A., Blanc, E., Mesaros, A., Herbert, S. L., Juricic, P., He, X., Atanassov, I., Salmonowicz, H., Yang, M., et al. (2017). Matching Dietary Amino Acid Balance to the In Silico-Translated Exome Optimizes Growth and Reproduction without Cost to Lifespan. Cell Metab 25, 610–621.

      Quan, A. S. and Eisen, M. B. (2018). The ecology of the drosophila-yeast mutualism in wineries. PLOS ONE 13, e0196440.

      Solomon, G. M., Dodangoda, H., McCarthy-Walker, T. T., Ntim-Gyakari, R. R. and Newell, P. D. (2019). The microbiota of Drosophila suzukii influences the larval development of Drosophila melanogaster. PeerJ 7, e8097.

      Zinke, I., Schütz, C. S., Katzenberger, J. D., Bauer, M. and Pankratz, M. J. (2002). Nutrient control of gene expression in Drosophila: microarray analysis of starvation and sugar-dependent response. The EMBO Journal 21, 6162–6173.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their thoughtful comments and constructive suggestions. Point-by-point responses to comments are given below:

      Reviewer #1 (Recommendations For The Authors):

      This manuscript provides an important case study for in-depth research on the adaptability of vertebrates in deep-sea environments. Through analysis of the genomic data of the hadal snailfish, the authors found that this species may have entered and fully adapted to extreme environments only in the last few million years. Additionally, the study revealed the adaptive features of hadal snailfish in terms of perceptions, circadian rhythms and metabolisms, and the role of ferritin in high-hydrostatic pressure adaptation. Besides, the reads mapping method used to identify events such as gene loss and duplication avoids false positives caused by genome assembly and annotation. This ensures the reliability of the results presented in this manuscript. Overall, these findings provide important clues for a better understanding of deep-sea ecosystems and vertebrate evolution.

      Reply: Thank you very much for your positive comments and encouragement.

      However, there are some issues that need to be further addressed.

      1. L119: Please indicate the source of any data used.

      Reply: Thank you very much for the suggestion. All data sources used are indicated in Supplementary file 1.

      1. L138: The demographic history of hadal snailfish suggests a significant expansion in population size over the last 60,000 years, but the results only show some species, do the results for all individuals support this conclusion?

      Reply: Thank you for this suggestion. The estimated demographic history of the hadal snailfish reveals a significant population increase over the past 60,000 years for all individuals. The corresponding results have been incorporated into Figure 1-figure supplements 8B.

      Author response image 1.

      (B) Demographic history for 5 hadal snailfish individuals and 2 Tanaka’s snailfish individuals inferred by PSMC. A generation time of one year was assumed for Tanaka’s snailfish and three years for hadal snailfish.

      1. Figure 1-figure supplements 8: Is there a clear source of evidence for the generation time of 1 year chosen for the PSMC analysis?

      Reply: We apologize for the inclusion of an incorrect generation time in Figure 1-figure supplements 8. It is important to note that different generation times do not change the shape of the PSMC curve, they only shift the curve along the axis. Due to the absence of definitive evidence regarding the generation time of the hadal snailfish, we have referred to Wang et al., 2019, assuming a generation time of one year for Tanaka snailfish and three years for hadal snailfish. The generation time has been incorporated into the main text (lines 516-517): “The generation time of one year for Tanaka snailfish and three years for hadal snailfish.”.
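
      The point that the generation time only shifts the curve can be made concrete with a small sketch. It assumes that the per-generation mutation rate is held fixed, so that only the conversion from generations to years changes; the time points are hypothetical and the function name is illustrative.

```python
import math


def generations_to_years(times_in_generations, generation_time_years):
    """Convert a time axis from generations to years."""
    return [t * generation_time_years for t in times_in_generations]


# Hypothetical time points of a demographic curve, in generations.
times_gen = [1e3, 1e4, 1e5, 1e6]

years_g1 = generations_to_years(times_gen, 1.0)  # one-year generation time
years_g3 = generations_to_years(times_gen, 3.0)  # three-year generation time

# On a log10 time axis, every point moves by the same amount, log10(3),
# so the curve keeps its shape and is only displaced along the axis.
log_shift = [math.log10(b) - math.log10(a) for a, b in zip(years_g1, years_g3)]
print([round(s, 4) for s in log_shift])
```

      The absolute dates read off the curve, such as the timing of a population expansion, do scale with the assumed generation time, which is why the assumption matters for the 60,000-year estimate.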

      1. L237: Transcriptomic data suggest that the greatest changes in the brain of hadal snailfish compared to Tanaka's snailfish, what functions these changes are specifically associated with, and how these functions relate to deep-sea adaptation.

      Reply: Thank you for this suggestion. Through comparative transcriptome analysis, we identified 3,587 up-regulated genes and 3,433 down-regulated genes in the brains of hadal snailfish compared to Tanaka's snailfish. Subsequently, we conducted Gene Ontology (GO) functional enrichment analysis on the differentially expressed genes, revealing that the up-regulated genes were primarily associated with cilium, DNA repair, protein binding, ATP binding, and microtubule-based movement. Conversely, the down-regulated genes were associated with membranes, GTP-binding, proton transmembrane transport, and synaptic vesicles, as shown in the following table (Supplementary file 15). Previous studies have shown that high hydrostatic pressure induces DNA strand breaks and damage, and the DNA repair-related genes up-regulated in the brain may help hadal snailfish overcome these challenges.

      Author response table 1.

      GO enrichment of expression up-regulated and down-regulated genes in hadal snailfish brain.

      We have added new results (Supplementary file 15) and descriptions to show the changes in the brains of hadal snailfish (lines 250-255): “Specifically, there are 3,587 up-regulated genes and 3,433 down-regulated genes in the brain of hadal snailfish compared to Tanaka snailfish, and Gene Ontology (GO) functional enrichment analyses revealed that up-regulated genes in the hadal snailfish are associated with cilium, DNA repair, and microtubule-based movement, while down-regulated genes are enriched in membranes, GTP-binding, proton transmembrane transport, and synaptic vesicles (Supplementary file 15).”

      1. L276: What is the relationship between low bone mineralization and deep-sea adaptation, and can low mineralization help deep-sea fish better adapt to the deep sea?

      Reply: Thank you for this suggestion. The hadal snailfish exhibits lower bone mineralization compared to Tanaka's snailfish, which may have facilitated its adaptation to the deep sea. On one hand, this reduced bone mineralization could have contributed to the hadal snailfish's ability to maintain neutral buoyancy without excessive energy expenditure. On the other hand, the lower bone mineralization may have also rendered their skeleton more flexible and malleable, enhancing their resilience to high hydrostatic pressure. Accordingly, we added the following new descriptions (lines 295-300): “Nonetheless, micro-CT scans have revealed shorter bones and reduced bone density in hadal snailfish, from which it has been inferred that this species has reduced bone mineralization (M. E. Gerringer et al., 2021); this may be a result of lowering density by reducing bone mineralization, allowing to maintain neutral buoyancy without expending too much energy, or it may be a result of making its skeleton more flexible and malleable, which is able to better withstand the effects of HHP.”

      1. L293: The abbreviation HHP was mentioned earlier in the article and does not need to be abbreviated here.

      Reply: Thank you for the correction. We have corrected the word. Line 315.

      1. L345: It should be "In addition, the phylogenetic relationships between different individuals clearly indicate that they have successfully spread to different trenches about 1.0 Mya".

      Reply: Thank you for the correction. We have corrected the word. Line 374.

      1. It is curious what functions are associated with the up-regulated and down-regulated genes in all tissues of hadal snailfish compared to Tanaka's snailfish, and what functions have hadal snailfish lost in order to adapt to the deep sea?

      Reply: Thank you for this suggestion. We added a description of this finding in the results section (lines 337-343): “Next, we identified 34 genes that are significantly more highly expressed in all organs of hadal snailfish in comparison to Tanaka’s snailfish and zebrafish, while only seven genes were found to be significantly more highly expressed in Tanaka’s snailfish using the same criterion (Figure 5-figure supplements 1). The 34 genes are enriched in only one GO category, GO:0000077: DNA damage checkpoint (Adjusted P-value: 0.0177). Moreover, five of the 34 genes are associated with DNA repair.” This suggests that up-regulated genes in all tissues in hadal snailfish are associated with DNA repair in response to DNA damage caused by high hydrostatic pressure, whereas down-regulated genes do not show enrichment for a particular function.

      Overall, the functions lost in hadal snailfish adapted to the deep sea are mainly related to the effects of the dark environment, which can be summarized as follows (lines 375-383): “The comparative genomic analysis revealed that the complete absence of light had a profound effect on the hadal snailfish. In addition to the substantial loss of visual genes and loss of pigmentation, many rhythm-related genes were also absent, although some rhythm genes were still present. The gene loss may not only come from relaxation of natural selection, but also for better adaptation. For example, the grpr gene copies are absent or down-regulated in hadal snailfish, which could in turn increased their activity in the dark, allowing them to survive better in the dark environment (Wada et al., 1997). The loss of gpr27 may also increase the ability of lipid metabolism, which is essential for coping with short-term food deficiencies (Nath et al., 2020).”

      Reviewer #2 (Recommendations For The Authors):

      I have pointed out some of the examples that struck me as worthy of additional thought/writing/comments from the authors. Any changes/comments are relatively minor.

      Reply: Thank you very much for your positive comments on this work.

      For comparative transcriptome analyses, reads were mapped back to reference genomes and TPM values were obtained for gene-level count analyses. 1:1 orthologs were used for differential expression analyses. This is indeed the only way to normalize counts across species, by comparing the same gene set in each species. Differential expression statistics were run in DEseq2. This is a robust way to compare gene expression across species and where fold-change values are reported (e.g. Fig 3, creatively by coloring the gene name) the values are best-practice.

      In other places, TPM values are reported (e.g. Fig 2D, Fig 4C, Fig 5A, Fig 4-Fig supp 4) to illustrate expression differences within a tissue across species. The comparisons look robust, although it is not made clear how the values were obtained in all cases. For example, in Fig 2D the TPM values appear to be from eyes of individual fish, but in Fig 4C and 5A they must be some kind of average? I think that information should be added to the figure legends.

      Of note: TPM values are sensitive to the shape of the RNA abundance distribution from a given sample: A small number of very highly expressed genes might bias TPM values downward for other genes. From one individual to another or from one species to another, it is not obvious to me that we should expect the same TPM distribution from the same tissues, making it a challenging metric for comparison across samples, and especially across species. An alternative measure of RNA abundance is normalized counts that can be output from DEseq2. See:

      Zhao, Y., Li, M.C., Konaté, M.M., Chen, L., Das, B., Karlovich, C., Williams, P.M., Evrard, Y.A., Doroshow, J.H. and McShane, L.M., 2021. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository. Journal of translational medicine, 19(1), pp.1-15.

      If the authors would like to keep the TPM values, I think it would be useful for them to visualize the TPM value distribution that the numbers were derived from. One way to do this would be to make a violin plot for species/tissue and plot the TPM values of interest on that. That would give a visualization of the ranked value of the gene within the context of all other TPM values. A more highly expressed gene would presumably have a higher rank in context of the specific tissue/species and be more towards the upper tail of the distribution. An example violin plot can be found in Fig 6 of:

      Burns, J.A., Gruber, D.F., Gaffney, J.P., Sparks, J.S. and Brugler, M.R., 2022. Transcriptomics of a Greenlandic Snailfish Reveals Exceptionally High Expression of Antifreeze Protein Transcripts. Evolutionary Bioinformatics, 18, p.11769343221118347.

      Alternatively, a comparison of TPM and normalized count data (heatmaps?) would be of use for at least some of the reported TPM values to show whether the different normalization methods give comparable outputs in terms of differential expression. One reason for these questions is that DEseq2 uses normalized counts for statistical analyses, but values are expressed as TPM in the noted figures (yes, TPM accounts for transcript length, but can still be subject to distribution biases).
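
      The sensitivity of TPM to the abundance distribution described in these comments can be seen in a toy calculation. The sketch below uses hypothetical counts and gene lengths; it only illustrates the arithmetic of TPM and is not derived from the data in the manuscript.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Three hypothetical genes of equal length (1 kb each).
lengths_kb = [1.0, 1.0, 1.0]

sample_a = [100, 100, 100]
# Genes 1 and 2 have identical raw counts in sample B; only gene 3 changes.
sample_b = [100, 100, 800]

tpm_a = tpm(sample_a, lengths_kb)
tpm_b = tpm(sample_b, lengths_kb)

print("sample A TPM:", [round(v) for v in tpm_a])
print("sample B TPM:", [round(v) for v in tpm_b])
```

      Because TPM values must sum to one million within each sample, the single highly expressed gene in sample B lowers the TPM of the two unchanged genes, which is the bias the reviewer describes.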

      Reply: Thank you for your suggestions. Following them, we modified Fig 2D, Fig 4C, Fig 4-Fig supp 4, and Fig 5-Fig supp 1. In the differential expression analyses, DESeq2 normalized counts can be obtained only for one-to-one orthologues of hadal snailfish and Tanaka's snailfish, so we now show the DESeq2 normalized counts in Fig 2D, Fig 4C, Fig 4-Fig supp 4, and Fig 5-Fig supp 1. For Fig 5A, because the fthl27 genes have undergone a specific copy-number expansion in hadal snailfish, we instead visualized the ranking of all fthl27 genes across tissues with violin plots in Fig 5-Fig supp 2.
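
      For readers unfamiliar with the normalized counts referred to here, the sketch below illustrates the median-of-ratios idea on which DESeq2's size factors are based. It is a simplified pure-Python illustration with hypothetical counts, not a call to DESeq2 itself, and it omits everything else DESeq2 does (dispersion estimation, testing, and so on).

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1,
# and only the last gene truly differs between the samples.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [10, 500],
]

factors = size_factors(counts)
normalized = normalize(counts, factors)
print("size factors:", [round(f, 3) for f in factors])
print("normalized:  ", [[round(v, 1) for v in row] for row in normalized])
```

      Because the size factor is a median over genes, the one strongly changed gene does not distort the scaling of the others, in contrast to a sum-based measure such as TPM.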

      Author response image 2.

      (D) Log10-transformed DESeq2 normalized counts (COUNTDESEQ2) of vision-related genes in the eyes of hadal snailfish and Tanaka's snailfish. * represents genes significantly downregulated in hadal snailfish (corrected P < 0.05).

      Author response image 3.

      (C) Deletion of one copy of grpr and down-regulated expression of the other copy in hadal snailfish. The relative positions of genes on chromosomes are indicated by arrows, with arrows to the right representing the forward strand and arrows to the left representing the reverse strand. The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue. * represents tissues in which grpr-1 was significantly down-regulated in hadal snailfish (corrected P < 0.05).

      Author response image 4.

      Expression of the vitamin D related genes in various tissues of hadal snailfish and Tanaka's snailfish. The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue.

      Author response image 5.

      (B) Expression of the ROS-related genes in different tissues of hadal snailfish and Tanaka's snailfish. The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue.

      Author response image 6.

      Ranking of the expression of individual copies of fthl27 gene in hadal snailfish and Tanaka's snailfish in various tissues showed that all copies of fthl27 in hadal snailfish have high expression. The gene expression presented is the average of TPM in all replicate samples from each tissue.

      Line 96: Which BUSCOs? In the methods it is noted that the actinopterygii_odb10 BUSCO set was used. I think it should also be noted here so that it is clear which BUSCO set was used for completeness analysis. It could even be informally the ray-finned fish BUSCOs or Actinopterygii BUSCOs.

      Reply: Thank you for this suggestion. We used the Actinopterygii_odb10 database and added the BUSCO set to the main text as follows (lines 92-95): “The new assembly filled 1.26 Mb of gaps that were present in our previous assembly and have a much higher level of genome continuity and completeness (with complete BUSCOs of 96.0 % [Actinopterygii_odb10 database]) than the two previous assemblies.”

      Lines 102-105: The medaka genome paper proposes the notion that the ancestral chromosome number between medaka, tetraodon, and zebrafish is 24. There may be other evidence of that too. Some of that evidence should be cited here to support the notion that sticklebacks had chromosome fusions to get to 21 chromosomes rather than scorpionfish having chromosome fissions to get to 24. Here's the medaka genome paper:

      Kasahara, M., Naruse, K., Sasaki, S., Nakatani, Y., Qu, W., Ahsan, B., Yamada, T., Nagayasu, Y., Doi, K., Kasai, Y. and Jindo, T., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature, 447(7145), pp.714-719.

      Reply: Thank you for your great suggestion. Accordingly, we modified the sentence and added the citation as follows (lines 100-105): “We noticed that there is no major chromosomal rearrangement between hadal snailfish and Tanaka’s snailfish, and chromosome numbers are consistent with the previously reported MTZ-ancestor (the last common ancestor of medaka, Tetraodon, and zebrafish) (Kasahara et al., 2007), while the stickleback had undergone several independent chromosomal fusion events (Figure 1-figure supplements 4).”

      Line 161-173: "Along with the expression data, we noticed that these genes exhibit a different level of relaxation of natural selection in hadal snailfish (Figure 2B; Figure 2-figure supplements 1)." With the above statement and evidence, the authors are presumably referring to gene losses and differences in expression levels. I think that since gene expression was not measured in a controlled way it may not be a good measure of selection throughout. The reported genes could be highly expressed under some other condition, selection intact. I find Fig2-Fig supp 1 difficult to interpret. I assume I am looking for regions where Tanaka’s snailfish reads map and Hadal snailfish reads do not, but it is not abundantly clear. Also, other measures of selection might be good to investigate: accumulation of mutations in the region could be evidence of relaxed selection, for example, where essential genes will accumulate fewer mutations than conditional genes or (presumably) genes that are not needed at all. The authors could complete a mutational/SNP analysis using their genome data on the discussed genes if they want to strengthen their case for relaxed selection. Here is a reference (from Arabidopsis) showing these kinds of effects:

      Monroe, J.G., Srikant, T., Carbonell-Bejerano, P., Becker, C., Lensink, M., Exposito-Alonso, M., Klein, M., Hildebrandt, J., Neumann, M., Kliebenstein, D. and Weng, M.L., 2022. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature, 602(7895), pp.101-105.

      Reply: Thank you for pointing out this important issue. Following your suggestion, we have removed the mention of the down-regulation of some visual genes in the eyes of hadal snailfish and the results of the original Fig2-Fig supp 1 that were based on reads mapping to confirm whether the genes were lost or not. To investigate the potential relaxation of natural selection in the opn1sw2 gene in hadal snailfish, we conducted precise gene structure annotation. Our findings revealed that the opn1sw2 gene is pseudogenized in hadal snailfish, indicating a relaxation of natural selection. We have included this result in Figure 2-figure supplements 1.

      Author response image 7.

      Pseudogenization of opn1sw2 in hadal snailfish. The deletion changed the protein’s sequence, causing its premature termination.

      Accordingly, we have toned down the related conclusions in the main text as follows (lines 164-173): “We noticed that the lws gene (long wavelength) has been completely lost in both hadal snailfish and Tanaka’s snailfish; rh2 (central wavelength) has been specifically lost in hadal snailfish (Figure 2B and 2C); sws2 (short wavelength) has undergone pseudogenization in hadal snailfish (Figure 2-figure supplements 1); while rh1 and gnat1 (perception of very dim light) is both still present and expressed in the eyes of hadal snailfish (Figure 2D). A previous study has also proven the existence of rhodopsin protein in the eyes of hadal snailfish using proteome data (Yan, Lian, Lan, Qian, & He, 2021). The preservation and expression of genes for the perception of very dim light suggests that they are still subject to natural selection, at least in the recent past.”

      Line 161-170: What tissue were the transcripts derived from for looking at expression level of opsins? Eyes?

      Reply: Thank you for your suggestions. The transcripts used to observe the expression levels of optic proteins were obtained from the eye.

      Line 191: What does tmc1 do specifically?

      Reply: Thank you for this suggestion. The tmc1 gene encodes transmembrane channel-like protein 1, involved in the mechanotransduction process in sensory hair cells of the inner ear that facilitates the conversion of mechanical stimuli into electrical signals used for hearing and homeostasis. We added functional annotations for the tmc1 in the main text (lines 190-196): “Of these, the most significant upregulated gene is tmc1, which encodes transmembrane channel-like protein 1, involved in the mechanotransduction process in sensory hair cells of the inner ear that facilitates the conversion of mechanical stimuli into electrical signals used for hearing and homeostasis (Maeda et al., 2014), and some mutations in this gene have been found to be associated with hearing loss (Kitajiri, Makishima, Friedman, & Griffith, 2007; Riahi et al., 2014).”

      Line 208: "it is likely" is a bit proscriptive

      Reply: Thank you for this suggestion. We rephrased the sentence as follows (lines 213-215): “Expansion of cldnj was observed in all resequenced individuals of the hadal snailfish (Supplementary file 10), which provides an explanation for the hadal snailfish breaks the depth limitation on calcium carbonate deposition and becomes one of the few species of teleost in hadal zone.”

      Line 199: maybe give a little more info on exactly what cldnj does? e.g. "cldnj encodes a claudin protein that has a role in tight junctions through calcium independent cell-adhesion activity" or something like that.

      Reply: Thank you for this suggestion. We have added functional annotations for the cldnj to the main text (lines 200-204): “Moreover, the gene involved in lifelong otolith mineralization, cldnj, has three copies in hadal snailfish, but only one copy in other teleost species, encodes a claudin protein that has a role in tight junctions through calcium independent cell-adhesion activity (Figure 3B, Figure 3C) (Hardison, Lichten, Banerjee-Basu, Becker, & Burgess, 2005).”

      Lines 199-210: Paragraph on cldnj: there are extra cldnj genes in the hadal snailfish, but no apparent extra expression. Could the authors mention that in their analysis/discussion of the data?

      Reply: Thank you for your suggestions. Despite not observing significant changes in cldnj expression in the brain tissue of hadal snailfish compared to Tanaka's snailfish, it is important to consider that the brain may not be the primary site of cldnj expression. Previous studies in zebrafish have consistently shown expression of cldnj in the otocyst during the critical early growth phase of the otolith, with a lower level of expression observed in the zebrafish brain. However, due to the unavailability of otocyst samples from hadal snailfish in our current study, our findings do not provide confirmation of any additional expression changes resulting from cldnj amplification. Consequently, it is crucial to conduct future comprehensive investigations to explore the expression patterns of cldnj specifically in the otocyst of hadal snailfish. Accordingly, we added a discussion of this result in the main text (lines 209-214): “In our investigation, we found that the expression of cldnj was not significantly up-regulated in the brain of the hadal snailfish than in Tanaka’s snailfish, which may be related to the fact that cldnj is mainly expressed in the otocyst, while the expression in the brain is lower. However, due to the immense challenge in obtaining samples of hadal snailfish, the expression of cldnj in the otocyst deserves more in-depth study in the future.”

      Lines 225-231: I wonder whether low expression of a circadian gene might be a time of day effect rather than an evolutionary trait. Could the authors comment?

      Reply: Thank you for your suggestions. Previous studies have shown that the grpr gene is expressed relatively consistently in the mouse suprachiasmatic nucleus (SCN) throughout the day (Figure 4-figure supplements 1), and we therefore hypothesize that the low expression of the grpr-1 gene in hadal snailfish is an evolutionary trait. We have modified this result in the main text (lines 232-242): “In addition, in the teleosts closely related to hadal snailfish, there are usually two copies of grpr encoding the gastrin-releasing peptide receptor; we noticed that in hadal snailfish one of them is absent and the other is barely expressed in brain (Figure 4C), whereas a previous study found that the grpr gene in the mouse suprachiasmatic nucleus (SCN) did not fluctuate significantly during a 24-hour light/dark cycle and had a relatively stable expression (Pembroke, Babbs, Davies, Ponting, & Oliver, 2015) (Figure 4-figure supplements 1). It has been reported that grpr deficient mice, while exhibiting normal circadian rhythms, show significantly increased locomotor activity in dark conditions (Wada et al., 1997; Zhao et al., 2023). We might therefore speculate that the absence of that gene might in some way benefit the activity of hadal snailfish under complete darkness.”

      Author response image 8.

      (B) Expression of the grpr in a 24-hour light/dark cycle in the mouse suprachiasmatic nucleus (SCN). Data source with http://www.wgpembroke.com/shiny/SCNseq.

      Line 253: What is gpr27? G protein coupled receptor?

      Reply: We apologize for the ambiguous description. Gpr27 is a G protein-coupled receptor, belonging to the family of cell surface receptors. We introduced gpr27 in the main text as follows (lines 270-273): “Gpr27 is a G protein-coupled receptor, belonging to the family of cell surface receptors, involved in various physiological processes and expressed in multiple tissues including the brain, heart, kidney, and immune system.”

      Line 253: Fig4 Fig supp 3 is a good example of pseudogenization!

      Reply: Thank you very much for your recognition.

      Line 279: What is bglap? It regulates bone mineralization, but what specifically does that gene do?

      Reply: We apologize for the ambiguous description. The bglap gene encodes a highly abundant bone protein secreted by osteoblasts that binds calcium and hydroxyapatite and regulates bone remodeling and energy metabolism. We introduced bglap in the main text as follows (lines 300-304): “The gene bglap, which encodes a highly abundant bone protein secreted by osteoblasts that binds calcium and hydroxyapatite and regulates bone remodeling and energy metabolism, had been found to be a pseudogene in hadal fish (K. Wang et al., 2019), which may contribute to this phenotype.”

      Line 299: Introduction of another gene without providing an exact function: acaa1.

      Reply: We apologize for the ambiguous description. The acaa1 gene encodes acetyl-CoA acetyltransferase 1, a key regulator of fatty acid β-oxidation in the peroxisome, which plays a controlling role in fatty acid elongation and degradation. We introduced acaa1 in the main text as follows (lines 319-324): “In regard to the effect of cell membrane fluidity, relevant genetic alterations had been identified in previous studies, i.e., the amplification of acaa1 (encoding acetyl-CoA acetyltransferase 1, a key regulator of fatty acid β-oxidation in the peroxisome, which plays a controlling role in fatty acid elongation and degradation) may increase the ability to synthesize unsaturated fatty acids (Fang et al., 2000; K. Wang et al., 2019).”

      Fig 5 legend: The DCFH-DA experiment is not an immunofluorescence assay. It is better described as a redox-sensitive fluorescent probe. Please take note throughout.

      Reply: Thank you for pointing out our mistakes. We corrected the wording at lines 1048 and 1151 as follows: “ROS levels were confirmed by redox-sensitive fluorescent probe using DCFH-DA molecular probe in 293T cell culture medium with or without fthl27-overexpression plasmid added with H2O2 or FAC for 4 hours.”

      Line 326: Manuscript notes that ROS levels in transfected cells are "significantly lower" than the control group, but there is no quantification or statistical analysis of ROS levels. In the methods, I noticed the mention of flow cytometry, but do not see any data from that experiment. Proportion of cells with DCFH-DA fluorescence above a threshold would be a good statistic for the experiment... Another could be average fluorescence per cell. Figure 5B shows some images with green dots and it looks like more green in the "control" (which could better be labeled as "mock-transfection") than in the fthl27 overexpression, but this could certainly be quantified by flow cytometry. I recommend that data be added.

      Reply: Thank you for your suggestions. We apologize for the error in the main text: we used a fluorescence microscope, not a flow cytometer, to observe fluorescence in our experiments. We have corrected it in the methods section as follows (lines 651-653): “ROS levels were measured using a DCFH-DA molecular probe, and fluorescence was observed through a fluorescence microscope with an optional FITC filter, with the background removed to observe changes in fluorescence.” Meanwhile, we processed the images with ImageJ to obtain the respective mean fluorescence intensities (MFI) and found that the MFI of the fthl27-overexpression cells was lower than that of the control group, which indicates that the ROS levels of the fthl27-overexpression cells were significantly lower than those of the control group. MFI has been added to Figure 5B.

      Author response image 9.

      ROS levels were confirmed by redox-sensitive fluorescent probe using DCFH-DA molecular probe in 293T cell culture medium with or without fthl27-overexpression plasmid added with H2O2 or FAC for 4 hours. Images are merged from bright field images with fluorescent images using ImageJ, while the mean fluorescence intensity (MFI) is also calculated using ImageJ. Green, cellular ROS. Scale bars equal 100 μm.
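
      The two statistics discussed in this exchange, mean fluorescence intensity and the proportion of cells above a fluorescence threshold, can be sketched as follows. This is only an illustration under stated assumptions: the response does not describe the ImageJ steps, so the background subtraction, the function names, and the pixel values below are hypothetical.

```python
def mean_fluorescence_intensity(pixels, background=0.0):
    """Mean of background-subtracted pixel values, clipped at zero.

    pixels is a list of rows of green-channel intensities; background is an
    assumed constant offset, not a value reported by the authors.
    """
    values = [max(p - background, 0.0) for row in pixels for p in row]
    return sum(values) / len(values)


def fraction_above_threshold(per_cell_intensities, threshold):
    """Share of cells whose intensity exceeds a chosen threshold."""
    n_above = sum(1 for v in per_cell_intensities if v > threshold)
    return n_above / len(per_cell_intensities)


# Hypothetical 2x3 green-channel images for mock-transfected and overexpressing cells.
mock = [[40, 60, 80], [20, 100, 60]]
overexpression = [[20, 30, 40], [10, 50, 30]]

print(mean_fluorescence_intensity(mock, background=10))
print(mean_fluorescence_intensity(overexpression, background=10))

# Hypothetical per-cell intensities, as segmentation or flow cytometry would give.
print(fraction_above_threshold([5, 12, 30, 45], threshold=20))
```

      A single mean per image gives no measure of variability, so a significance claim would additionally need replicate images or per-cell values for each condition.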

      Regarding the ROS experiment: Transfection of HEK293T cells should be reasonably straightforward, and the experiment was controlled appropriately with a mock transfection, but some additional parameters are still needed to help interpret the results. Those include: Direct evidence that the transfection worked, like qPCR, western blots (is the fthl27 tagged with an antigen?), coexpression of a fluorescent protein. Then transfection efficiency should be calculated and reported.

      Reply: Thank you for your suggestions. To assess the success of the transfection, we randomly selected a subset of fthl27-transfected HEK293T cells for transcriptome sequencing. This approach allowed us to examine the gene expression profiles and confirm the efficacy of the transfection process. As control samples, we obtained transcriptome data from two untreated HEK293T cells (SRR24835259 and SRR24835265) from NCBI. Subsequently, we extracted the fthl27 gene sequence of the hadal snailfish, along with 1,000 bp upstream and downstream regions, as a separate scaffold. This scaffold was then merged with the human genome to assess the expression levels of each gene in the three transcriptome datasets. The results demonstrated that the fthl27 gene exhibited the highest expression in fthl27-transfected HEK293T cells, while in the control group, the expression of the fthl27 gene was negligible (TPM = 0). Additionally, the expression patterns of other highly expressed genes were similar to those observed in the control group, confirming the successful fthl27 transfection. These findings have been incorporated into Figure 5-figure supplements 3.

      Author response image 10.

      (B) Reads depth of fthl27 gene in fthl27-transfected HEK293T cells and 2 untreated HEK293T cells (SRR24835259 and SRR24835265) transcriptome data. (C) Expression of each gene in the transcriptome data of fthl27-transfected HEK293T cells and 2 untreated HEK293T cells (SRR24835259 and SRR24835265), where the genes shown are the 4 most highly expressed genes in each sample.

      Lines 383-386: expression of DNA repair genes is mentioned, but not shown anywhere in the results?

      Reply: Thank you for your suggestions. Accordingly, we added a description of this finding in the results section (lines 337-343): “Next, we identified 34 genes that are significantly more highly expressed in all organs of hadal snailfish in comparison to Tanaka’s snailfish and zebrafish, while only seven genes were found to be significantly more highly expressed in Tanaka’s snailfish using the same criterion (Figure 5-figure supplements 1). The 34 genes are enriched in only one GO category, GO:0000077: DNA damage checkpoint (Adjusted P-value: 0.0177). Moreover, five of the 34 genes are associated with DNA repair.” We also added this information to Figure 5-figure supplements 1C.

      Author response image 11.

      (C) Genes were significantly more highly expressed in all tissues of the hadal snailfish compared to Tanaka's snailfish, and 5 genes (purple) were associated with DNA repair.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #3 comment

      1) One suggestion for improvement is to consider incorporating the results from Figure S9 into the main Figure 6, which would enhance readers' comprehension.

      We appreciate your valuable feedback. Based on the reviewer’s suggestion, we have incorporated the results from Figure S9 into the main Figure 6, as shown below. The manuscript and figure legends have also been modified accordingly.

      Author response image 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly per the following comments:

      1. Pro1153Leu is extremely common in the general population (allele frequency in gnomAD is 0.5). Further discussion is warranted to justify the possibility that this variant contributes to a phenotype documented in 1.5-3% of the population. Is it possible that this variant is tagging other rare SNPs in the COL11A1 locus, and could any of the existing exome sequencing data be mined for rare nonsynonymous variants?

      One possible avenue for future work is to return to any existing exome sequencing data to query for rare variants at the COL11A1 locus. This should be possible for the USA MO case-control cohort. Any rare nonsynonymous variants identified should then be subjected to mutational burden testing, ideally after functional testing to diminish any noise introduced by rare benign variants in both cases and controls. If there is a significant association of rare variation in AIS cases, then they should consider returning to the other cohorts for targeted COL11A1 gene sequencing or whole exome sequencing (whichever approach is easier/less expensive) to demonstrate replication of the association.

      Response: Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 KB LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (see the table below). Two of them (NM_080629.2:c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD=25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
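
      The figures quoted in the response can be checked with a short calculation. This sketch assumes that the "4-5 individuals" estimate comes from linearly scaling the two carriers of predicted-deleterious variants from the 625 reviewed exomes to the full discovery cohort. That reading is an interpretation of the text, not something the response states, and the helper name is hypothetical.

```python
# Numbers quoted in the response.
EXOMES_REVIEWED = 625
DISCOVERY_CASES = 1358
CARRIERS_ANY_RARE_MISSENSE = 6
CARRIERS_PREDICTED_DELETERIOUS = 2


def expected_carriers(observed, sampled, cohort):
    """Scale an observed carrier count linearly from a subsample to the whole cohort."""
    return observed / sampled * cohort


# Share of the discovery cases that had exome data, as a percentage.
fraction_reviewed = EXOMES_REVIEWED / DISCOVERY_CASES
print(round(fraction_reviewed * 100))

# Expected carriers in the full cohort under linear scaling.
print(expected_carriers(CARRIERS_PREDICTED_DELETERIOUS, EXOMES_REVIEWED, DISCOVERY_CASES))
print(expected_carriers(CARRIERS_ANY_RARE_MISSENSE, EXOMES_REVIEWED, DISCOVERY_CASES))
```

      Under this assumption the two carriers scale to about 4.3 individuals, in line with the stated 4-5, whereas scaling all six carriers of rare missense variants would give about 13.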

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We did conduct a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2. COL11A1 p.Pro1335Leu is pursued as a direct candidate susceptibility locus, but the functional validation involves both: (a) a complementation assay in mouse GPCs, Figure 5; and (b) cultured rib cartilage cells from Col11a1-Ad5 Cre mice (Figure 4). Please address the following:

      2A. Is Pro1335Leu a loss of function, gain of function, or dominant negative variant? Further rationale for modeling this change in a Col11a1 loss of function cell line would be helpful.

      Response: Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, we have noted that spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

      2B. Expression appears to be augmented compared WT in Fig 5B, but there is no direct comparison of WT with variant.

      Response: Expression of the mutant (from the lentiviral expression vector) is increased compared to wildtype. We observed this effect in repeated experiments. Sequencing confirmed that the mutant and wildtype constructs differed only at the position of the rs3753841 SNP. At this time, we cannot explain the difference in expression levels. Nonetheless, even when the variant COL11A1 is relatively overexpressed, it fails to suppress MMP3 expression as observed for the wildtype form.

      2C. How do the authors know that their complementation data in Figure 5 are specific? Repetition of this experiment with an alternative common nonsynonymous variant in COL11A1 (such as rs1676486) would be helpful as a comparison with the expectation that it would be similar to WT.

      Response: We agree that testing an allelic series throughout COL11A1 could be informative, but we have shifted our resources toward in vivo experiments that we believe will ultimately be more informative for deciphering the mechanistic role of COL11A1 in MMP3 regulation and spine deformity.

      2D. The y-axes of histograms in panel A need attention and clarification. What is meant by power? Do you mean fold change?

      Response: Power is directly comparable to fold change but allows comparison of absolute expression levels between different genes.

      2E. Figure 5: how many technical and biological replicates? Confirm that these are stated throughout the figures.

      Response: Thank you for pointing out this oversight. This information has been added throughout.

      3. Figure 2: What does the gross anatomy of the IVD look like? Could the authors address this by showing an H&E of an adjacent section of the Fig. 2 A panels?

      Response: Panel 2 shows H&E staining. Perhaps the reviewer is referring to the WT and Pax1 KO images in Figure 3? We have now added H&E staining of WT and Pax1 KO IVD as supplemental Figure 3E to clarify the IVD anatomy.

      4. Page 9: "Cells within the IVD were negative for Pax1 staining ..." There seems to be specific PAX1 expression in many cells within the IVD, which is concerning if this is indeed a supposed null allele of Pax1. This data seems to support that the allele is not null.

      Response: We have now added updated images for the COL11A1 and PAX1 staining to include negative controls in which we omitted primary antibodies. As can be seen, there is faint autofluorescence in the PAX1 negative control that appears to explain the “specific staining” referred to by the reviewer. These images confirm that the allele is truly a null.

      5. There is currently a lack of evidence supporting the claim that "Col11a1 is positively regulated by Pax1 in mouse spine and tail". Therefore, it is necessary to conduct further research to determine the direct regulatory role of Pax1 on Col11a1.

      Response: We agree with the reviewer and have clarified that Pax1 may have either a direct or indirect role in Col11a1 regulation.

      6. There is no data linking loss of COL11A1 function and spine defects in the mouse model. Furthermore, due to the absence of P1335L point mutant mice, it cannot be confirmed whether P1335L can actually cause AIS, and the pathogenicity of this mutation cannot be directly verified. These limitations need to be clearly stated and discussed. A Col11a1 mouse mutant called chondrodysplasia (cho) was shown to be perinatal lethal with severe endochondral defects (https://pubmed.ncbi.nlm.nih.gov/4100752/). This information may help contextualize this study.

      Response: We partially agree with the reviewer. Spine defects are reported in the cho mouse (for example, please see reference 36 Hafez et al). We appreciate the suggestion to cite the original Seegmiller et al 1971 reference and have added it to the manuscript.

      7. A recent article (PMID37462524) reported mutations in COL11A2 associated with AIS and functionally tested in zebrafish. That study should be cited and discussed as it is directly relevant for this manuscript.

      Response: We agree with the reviewer that this study provides important information supporting a role for loss of function in type XI collagen in spinal deformity. Language to this effect has been added to the manuscript and this study is now cited in the paper.

      8. Please reconcile the following result on page 10 of the results: "Interestingly, the AIS-associated gene Adgrg6 was amongst the most significantly dysregulated genes in the RNA-seq analysis (Figure 3c). By qRT-PCR analysis, expression of Col11a1, Adgrg6, and Sox6 were significantly reduced in female and male Pax1-/- mice compared to wild-type mice (Figure 3d-g)." In Figure 3f, the downregulation of Adgrg6 appears to be modest so how can it possibly be highlighted as one of the most significantly downregulated transcripts in the RNAseq data?

      Response: By “significant” we were referring to the P-value significance in RNAseq analysis, not in absolute change in expression. This language was clearly confusing, and we have removed it from the manuscript.

      9. It is incorrect to refer to the primary cell culture work as growth plate chondrocytes (GPCs), instead, these are primary costal chondrocyte cultures. These primary cultures have a mixture of chondrocytes at differing levels of differentiation, which may change differentiation status during the culturing on plastic. In sum, these cells are at best chondrocytes, and not specifically growth plate chondrocytes. This needs to be corrected in the abstract and throughout the manuscript. Moreover, on page 11 these cells are referred to as costal cartilage, which is confusing to the reader.

      Response: Thank you for pointing out these inconsistencies. We have changed the manuscript to say “costal chondrocytes” throughout.

      Minor points

      • On page 10 of the Results: "These data support a mechanistic link between Pax1 and Col11a1, and the AIS-associated genes Gpr126 and Sox6, in affected tissue of the developing tail." qRT-PCR validation of Sox6, although significant, appears to be very modestly downregulated in KO. Please soften this statement in the text.

      Response: We have softened this statement.

      • Have you got any information about how the immortalized (SV40) costal cartilage affected chondrogenic differentiation? The expression of SV40 seemed to stimulate Mmp13 expression. Do these cells still make cartilage nodules? Some feedback on this process and how it affects the nature of the culture would be appreciated.

      Response: The “+ or –” in Figure 5 refers to Ad5-cre. Each experiment was performed in SV40-immortalized costal chondrocytes. We have removed SV40 from the figure and have clarified the legend to say “qRT-PCR of human COL11A1 and endogenous mouse Mmp3 in SV40 immortalized mouse costal chondrocytes transduced with the lentiviral vector only (lanes 1,2), human WT COL11A1 (lane 3), or COL11A1P1335L.” Otherwise, we absolutely agree that understanding Mmp13 regulation during chondrocyte differentiation is important. We plan to study this using in vivo systems.

      • Figure 1: is the average Odds ratio, can this be stated in the figure legend?

      Response: We are not sure what is being asked here. The “combined odds ratio” is calculated as a weighted average of the log of the odds.
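      As an illustration of what a weighted average on the log scale can look like, the sketch below pools per-cohort odds ratios with inverse-variance weights. The weighting scheme, the function name `combined_odds_ratio`, and the example inputs are assumptions made for illustration; the response does not state which weights were used, and "log of the odds" is read here as the log odds ratio.

```python
import math

def combined_odds_ratio(odds_ratios, standard_errors):
    """Pool per-cohort odds ratios as a weighted average of their logs.

    odds_ratios: per-cohort odds ratios (hypothetical inputs).
    standard_errors: standard errors of the matching log odds ratios.
    Inverse-variance weights are an assumption, not the authors' stated method.
    """
    log_ors = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_log_or = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Exponentiate to return to the odds-ratio scale.
    return math.exp(pooled_log_or), pooled_se
```

      With equal standard errors the pooled value reduces to the geometric mean of the cohort odds ratios, which is why the average is taken on the log scale rather than on the odds ratios themselves.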

      • A more consistent use of established nomenclature for mouse versus human genes and proteins is needed.

      Human: GENE/PROTEIN

      Mouse: Gene/PROTEIN

      Response: Thank you for pointing this out. The nomenclature has been corrected throughout the manuscript.

      • There is no Figure 5c, but a reference to results in the main text. Please reconcile.

      • There is no Figure 5-figure supplement 5a, but there is a reference to it in the main text. Please reconcile.

      Response: Figure references have been corrected.

      • Please indicate dilutions of all antibodies used when listed in the methods.

      Response: Antibody dilutions have been added where missing.

      • On page 25, there is a partial sentence missing information in the Histologic methods; "#S36964 Invitrogen, CA, USA)). All images were taken..."

      Response: We apologize for the error. It has been removed.

      • Table 1: please define all acronyms, including cohort names.

      Response: We apologize for the oversight. The legend to the Table has been updated with definitions of all acronyms.

      • Figure 2: Indicate that blue staining is DAPI in panel B. Clarify that "-ab" as an abbreviation is primary antibody negative.

      Response: A color code for DAPI and COL11A1 staining has been added and “-ab” is now defined.

      • Page 4: ADGRG6 (also known as GPR126)...the authors set this up for ADGRG6 but then use GPR126 in the manuscript, which is confusing. For clarity, please use the gene name Adgrg6 consistently, rather than alternating with Gpr126.

      Response: Thank you for pointing this out. GPR126 has now been changed to ADGRG6 throughout the manuscript.

      • REF 4: Richards, B.S., Sucato, D.J., Johnston C.E. Scoliosis, (Elsevier, 2020). Is this a book, can you provide more clarity in the Reference listing?

      Response: Thank you for pointing this out. This reference has been corrected.

      • While isolation was addressed, the methods for culturing rat cartilage endplate and costal chondrocytes are poorly described and should be given more text.

      Response: Details about the cartilage endplate and costal chondrocyte isolation and culture have been added to the Methods.

      • Page 11: 1st paragraph, last sentence "These results suggest that Mmp3 expression"... this sentence needs attention. As written, I am not clear what the authors are trying to say.

      Response: This sentence has been clarified and now reads “These results suggest that Mmp3 expression is negatively regulated by Col11a1 in mouse costal chondrocytes.”

      • Page 13: line 4 from the bottom, "ECM-clearing"? This is confusing; do you mean ECM-degrading?

      Response: Yes and thank you. We have changed to “ECM-degrading”.

      • Please use version numbers for RefSeq IDs: e.g. NM_080629.3 instead of NM_080629

      Response: This change has been made in the revised manuscript.

      • It would be helpful for readers if the ethnicity of the discovery case cohort was clearly stated as European ancestry in the Results main text.

      Response: “European ancestry” has been added at first description of the discovery cohort in the manuscript.

      • Avoid using the term "mutation" and use "variant" instead.

      Response: Thank you for pointing this out. “Variant” is now used throughout the manuscript.

      • Define error bars for all bar charts throughout and include individual data points overlaid onto bars.

      Response: Thank you. Error bars are now clarified in the Figure legends.

    2. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print in response to the following comments:

      1. Pro1153Leu is extremely common in the general population (allele frequency in gnomAD is 0.5). Further discussion is warranted to justify the possibility that this variant contributes to a phenotype documented in 1.5-3% of the population. Is it possible that this variant is tagging other rare SNPs in the COL11A1 locus, and could any of the existing exome sequencing data be mined for rare nonsynonymous variants?

      One possible avenue for future work is to return to any existing exome sequencing data to query for rare variants at the COL11A1 locus. This should be possible for the USA MO case-control cohort. Any rare nonsynonymous variants identified should then be subjected to mutational burden testing, ideally after functional testing to diminish any noise introduced by rare benign variants in both cases and controls. If there is a significant association of rare variation in AIS cases, then they should consider returning to the other cohorts for targeted COL11A1 gene sequencing or whole exome sequencing (whichever approach is easier/less expensive) to demonstrate replication of the association.

      Response: Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R² > 0.6) with the top SNP rs3753841. We next reviewed rare (MAF ≤ 0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 kb LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (see table below). Two of them (NM_080629.2:c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD = 25.7; GERP = 5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
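
      The "4-5 individuals" estimate above can be reproduced with simple proportional scaling. The sketch below assumes that the two carriers of predicted-deleterious variants among the 625 screened exomes are representative of all 1,358 cases; the function name `expected_carriers` is hypothetical and this may not be exactly how the authors derived the figure.

```python
def expected_carriers(carriers_observed, exomes_screened, total_cases):
    """Scale the carrier rate seen in the screened exomes up to the full cohort.

    Assumes the screened exomes are a representative sample of all cases.
    """
    carrier_rate = carriers_observed / exomes_screened
    return carrier_rate * total_cases

# Values taken from the response: 2 carriers in 625 exomes, 1,358 cases overall.
estimate = expected_carriers(2, 625, 1358)
screened_fraction = 625 / 1358  # share of the cohort with exome data
print(f"expected carriers: {estimate:.2f}, screened fraction: {screened_fraction:.0%}")
```

      Under this assumption the estimate falls between four and five carriers, in line with the range quoted in the response.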

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We did conduct a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKAT-O P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      1. COL11A1 p.Pro1335Leu is pursued as a direct candidate susceptibility locus, but the functional validation involves both: (a) a complementation assay in mouse GPCs, Figure 5; and (b) cultured rib cartilage cells from Col11a1-Ad5 Cre mice (Figure 4). Please address the following:

      2A. Is Pro1335Leu a loss of function, gain of function, or dominant negative variant? Further rationale for modeling this change in a Col11a1 loss of function cell line would be helpful.

      Response: Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

      2B. Expression appears to be augmented compared to WT in Fig 5B, but there is no direct comparison of WT with variant.

      Response: Expression of the mutant (from the lentiviral expression vector) is increased compared to wildtype. We observed this effect in repeated experiments. Sequencing confirmed that the mutant and wildtype constructs differed only at the position of the rs3753841 SNP. At this time, we cannot explain the difference in expression levels. Nonetheless, even when the variant COL11A1 is relatively overexpressed, it fails to suppress MMP3 expression as observed for the wildtype form.

      2C. How do the authors know that their complementation data in Figure 5 are specific? Repetition of this experiment with an alternative common nonsynonymous variant in COL11A1 (such as rs1676486) would be helpful as a comparison with the expectation that it would be similar to WT.

      Response: We agree that testing an allelic series throughout COL11A1 could be informative, but we have shifted our resources toward in vivo experiments that we believe will ultimately be more informative for deciphering the mechanistic role of COL11A1 in MMP3 regulation and spine deformity.

      2D. The y-axes of histograms in panel A need attention and clarification. What is meant by power? Do you mean fold change?

      Response: Power is directly comparable to fold change but allows comparison of absolute expression levels between different genes.

      2E. Figure 5: how many technical and biological replicates? Confirm that these are stated throughout the figures.

      Response: Thank you for pointing out this oversight. This information has been added throughout.

      1. Figure 2: What does the gross anatomy of the IVD look like? Could the authors address this by showing an H&E of an adjacent section of the Fig. 2 A panels?

      Response: Panel 2 shows H&E staining. Perhaps the reviewer is referring to the WT and Pax1 KO images in Figure 3? We have now added H&E staining of WT and Pax1 KO IVD as supplemental Figure 3E to clarify the IVD anatomy.

      1. Page 9: "Cells within the IVD were negative for Pax1 staining ..." There seems to be specific PAX1 expression in many cells within the IVD, which is concerning if this is indeed a supposed null allele of Pax1. This data seems to support that the allele is not null.

      Response: We have now added updated images for the COL11A1 and PAX1 staining to include negative controls in which we omitted primary antibodies. As can be seen, there is faint autofluorescence in the PAX1 negative control that appears to explain the “specific staining” referred to by the reviewer. These images confirm that the allele is truly a null.

      1. There is currently a lack of evidence supporting the claim that "Col11a1 is positively regulated by Pax1 in mouse spine and tail". Therefore, it is necessary to conduct further research to determine the direct regulatory role of Pax1 on Col11a1.

      Response: We agree with the reviewer and have clarified that Pax1 may have either a direct or indirect role in Col11a1 regulation.

      1. There is no data linking loss of COL11A1 function and spine defects in the mouse model. Furthermore, due to the absence of P1335L point mutant mice, it cannot be confirmed whether P1335L can actually cause AIS, and the pathogenicity of this mutation cannot be directly verified. These limitations need to be clearly stated and discussed. A Col11a1 mouse mutant called chondrodysplasia (cho) was shown to be perinatal lethal with severe endochondral defects (https://pubmed.ncbi.nlm.nih.gov/4100752/). This information may help contextualize this study.

      Response: We partially agree with the reviewer. Spine defects are reported in the cho mouse (for example, please see reference 36 Hafez et al). We appreciate the suggestion to cite the original Seegmiller et al 1971 reference and have added it to the manuscript.

      1. A recent article (PMID37462524) reported mutations in COL11A2 associated with AIS and functionally tested in zebrafish. That study should be cited and discussed as it is directly relevant for this manuscript.

      Response: We agree with the reviewer that this study provides important information supporting loss of function in type XI collagen in spinal deformity. Language to this effect has been added to the manuscript and this study is now cited in the paper.

      1. Please reconcile the following result on page 10 of the results: "Interestingly, the AIS-associated gene Adgrg6 was amongst the most significantly dysregulated genes in the RNA-seq analysis (Figure 3c). By qRT-PCR analysis, expression of Col11a1, Adgrg6, and Sox6 were significantly reduced in female and male Pax1-/- mice compared to wild-type mice (Figure 3d-g)." In Figure 3f, the downregulation of Adgrg6 appears to be modest so how can it possibly be highlighted as one of the most significantly downregulated transcripts in the RNAseq data?

      Response: By “significant” we were referring to the P-value significance in RNAseq analysis, not in absolute change in expression. This language was clearly confusing, and we have removed it from the manuscript.

      1. It is incorrect to refer to the primary cell culture work as growth plate chondrocytes (GPCs), instead, these are primary costal chondrocyte cultures. These primary cultures have a mixture of chondrocytes at differing levels of differentiation, which may change differentiation status during the culturing on plastic. In sum, these cells are at best chondrocytes, and not specifically growth plate chondrocytes. This needs to be corrected in the abstract and throughout the manuscript. Moreover, on page 11 these cells are referred to as costal cartilage, which is confusing to the reader.

      Response: Thank you for pointing out these inconsistencies. We have changed the manuscript to say “costal chondrocytes” throughout.

      Minor points

      • On page 10 of the Results: "These data support a mechanistic link between Pax1 and Col11a1, and the AIS-associated genes Gpr126 and Sox6, in affected tissue of the developing tail." qRT-PCR validation of Sox6, although significant, appears to be very modestly downregulated in KO. Please soften this statement in the text.

      Response: We have softened this statement.

      • Have you got any information about how the immortalized (SV40) costal cartilage affected chondrogenic differentiation? The expression of SV40 seemed to stimulate Mmp13 expression. Do these cells still make cartilage nodules? Some feedback on this process and how it affects the nature of the culture would be appreciated.

      Response: The “+ or –” in Figure 5 refers to Ad5-cre. Each experiment was performed in SV40-immortalized costal chondrocytes. We have removed SV40 from the figure and have clarified the legend to say “qRT-PCR of human COL11A1 and endogenous mouse Mmp3 in SV40 immortalized mouse costal chondrocytes transduced with the lentiviral vector only (lanes 1,2), human WT COL11A1 (lane 3), or COL11A1P1335L.” Otherwise, we absolutely agree that understanding Mmp13 regulation during chondrocyte differentiation is important. We plan to study this using in vivo systems.

      • Figure 1: is the average Odds ratio, can this be stated in the figure legend?

      Response: We are not sure what is being asked here. The “combined odds ratio” is calculated as a weighted average of the log of the odds.

      • A more consistent use of established nomenclature for mouse versus human genes and proteins is needed.

      Human: GENE/PROTEIN

      Mouse: Gene/PROTEIN

      Response: Thank you for pointing this out. The nomenclature has been corrected throughout the manuscript.

      • There is no Figure 5c, but a reference to results in the main text. Please reconcile.

      • There is no Figure 5-figure supplement 5a, but there is a reference to it in the main text. Please reconcile.

      Response: Figure references have been corrected.

      • Please indicate dilutions of all antibodies used when listed in the methods.

      Response: Antibody dilutions have been added where missing.

      • On page 25, there is a partial sentence missing information in the Histologic methods; "#S36964 Invitrogen, CA, USA)). All images were taken..."

      Response: We apologize for the error. It has been removed.

      • Table 1: please define all acronyms, including cohort names.

      Response: We apologize for the oversight. The legend to the Table has been updated with definitions of all acronyms.

      • Figure 2: Indicate that blue staining is DAPI in panel B. Clarify that "-ab" as an abbreviation is primary antibody negative.

      Response: A color code for DAPI and COL11A1 staining has been added and “-ab” is now defined.

      • Page 4: ADGRG6 (also known as GPR126)...the authors set this up for ADGRG6 but then use GPR126 in the manuscript, which is confusing. For clarity, please use the gene name Adgrg6 consistently, rather than alternating with Gpr126.

      Response: Thank you for pointing this out. GPR126 has now been changed to ADGRG6 throughout the manuscript.

      • REF 4: Richards, B.S., Sucato, D.J., Johnston C.E. Scoliosis, (Elsevier, 2020). Is this a book, can you provide more clarity in the Reference listing?

      Response: Thank you for pointing this out. This reference has been corrected.

      • While isolation was addressed, the methods for culturing rat cartilage endplate and costal chondrocytes are poorly described and should be given more text.

      Response: Details about the cartilage endplate and costal chondrocyte isolation and culture have been added to the Methods.

      • Page 11: 1st paragraph, last sentence "These results suggest that Mmp3 expression"... this sentence needs attention. As written, I am not clear what the authors are trying to say.

      Response: This sentence has been clarified and now reads “These results suggest that Mmp3 expression is negatively regulated by Col11a1 in mouse costal chondrocytes.”

      • Page 13: line 4 from the bottom, "ECM-clearing"? This is confusing; do you mean ECM-degrading?

      Response: Yes and thank you. We have changed to “ECM-degrading”.

      • Please use version numbers for RefSeq IDs: e.g. NM_080629.3 instead of NM_080629

      Response: This change has been made in the revised manuscript.

      • It would be helpful for readers if the ethnicity of the discovery case cohort was clearly stated as European ancestry in the Results main text.

      Response: “European ancestry” has been added at first description of the discovery cohort in the manuscript.

      • Avoid using the term "mutation" and use "variant" instead.

      Response: Thank you for pointing this out. “Variant” is now used throughout the manuscript.

      • Define error bars for all bar charts throughout and include individual data points overlaid onto bars.

      Response: Thank you. Error bars are now clarified in the Figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Positive comments:

      We appreciate the positive comments of the editor and reviewers. The editor noted that the paper presents a “technological advance” that has enabled “important insights about the brain circuits through which the cerebellum could participate in social interactions.” Reviewer 1 thought this was a “timely and important study with solid evidence for correlative conclusions” and that the experiments were “technically challenging” and “well-performed”. Reviewer 2 stated that the finding of correlated activity between the regions is “interesting as non-motor functions of the cerebellum are relatively little explored.” They also thought “that the data are presented clearly, and the manuscript is well-written”. Reviewer 3 mentioned that “this approach can be useful for many neuroscientists”. We thank the editors and all the reviewers for these comments.

      Reviewer #1 (Public Review)

      While the novelty of the device is strongly emphasized, I find that its value is somewhat diminished by the wire-free device developed by the same group as it should thus be possible to perform calcium imaging wire-free and electrophysiological recording via a single conventional cable (or also via wireless headstages).

      While it would be potentially possible to use a wire-free Miniscope in parallel with a wired electrophysiology recording system, this would result in a larger footprint on the animal’s head, more than a gram in increased weight due to an added LiPo battery, a larger electrophysiology head-stage, and limited recording length due to a battery capacity of around 20 minutes. Our main goal for the development of the E-scope platform was to develop an expandable electrophysiology recording board that would work with all previously built UCLA Miniscopes while also streamlining the integration of power and data into the coaxial cable connection already familiar to hundreds of labs using Miniscopes. The vast majority of Miniscope experiments are done using wired systems and we aimed to support the expansion of those systems instead of requiring a more substantial switch to using wire-free Miniscopes.

      The role of the identified network activations in social interactions is not touched upon.

      We agree with the reviewer that we have not discovered a causal role for the co-modulated activity patterns we have observed. As these causal experiments will require the development of real-time techniques for blocking socially evoked changes in firing rate in cerebellum and ACC, we are currently planning experiments to address causality. These results will be described in a future publication.

      Reviewer #1 (Recommendations for the Authors):

      Please provide the number of recorded mice.

      The number is now provided in the revised manuscript.

      If the recorded areas (cerebellar cortex, DN, and ACC) are part of the same circuit regulating social interactions, it would be nice to get insights into the directionality of the circuit. The authors favor the possibility that during social behavior, cerebellar efferences indirectly influence ACC activities (as in Figure 4A), however, no evidence is presented to support this interpretation. ACC activities might also indirectly influence PC firing. It may be possible to get insights into this by comparing the timing of neuronal activity in the different areas with respect to social onset.

      For this study, we mainly focused on the output of the cerebellar circuit to the cortex, as previous work shows that the dentate nucleus projects to the thalamus, which in turn projects to ACC and other cortical regions (Badura et al., eLife, 2018; Kelly et al., Nat. Neurosci., 2020). The temporal resolution of calcium imaging is limited (the rise time of calcium events with genetically-encoded indicators takes hundreds of milliseconds), so it is insufficient to precisely assess the relative onset timing of the two regions. Our work certainly does not rule out cortical influences on PC firing.

      Reviewer #2 (Public Review)

      However, the causal relationship is far from established with the methods used, leaving it unclear if these two brain regions are similarly engaged by the behavior or if they form a pathway/loop.

      As indicated in our response to Reviewer #1’s similar critique, the goal of the presented study is to demonstrate the feasibility and capabilities of this novel device. This new tool will allow us to conduct a comprehensive and rigorous study to assess the causal role of the interactions between the cerebellum and ACC in social behavior (as well as other behaviors). These experiments are being designed now.

      Reviewer #2 (Recommendations for the Authors):

      It is unclear what is entirely unique about the E-scope. It seems that its advance is simply a common cable that allows interfacing with both devices (lighter weight than two cables is stated in the Discussion). Is this really an advance? What are its limitations? E.g., how close can the recording sites be to one another? How can it be configured for any other extracellular recording approach (tetrodes, 64-channel arrays, or Neuropixels)?

      In our experience, multiple lines of wires tethered to different head-mounted devices on an animal significantly impact its behavior. Therefore, one of the major advantages of the UCLA Miniscope Platform is the use of a single, flexible coaxial cable to minimize the impact of tethering on behavior. The E-Scope platform builds on top of this work by incorporating electrophysiology recording capabilities into this single, flexible coaxial cable. Additionally, the electrophysiology recording hardware is backwards compatible with all previously built UCLA Miniscopes and can run through open-source and commercial commutators already used in Miniscope experiments.

      The available bandwidth within the shared single coaxial cable can handle megapixel Miniscope imaging along with the maximum data output of a 32-channel Intan Ephys IC. The E-Scope platform presented here runs the Intan Ephys IC at 20 kSps for all 32 channels instead of the maximum 30 kSps due to microcontroller speed limitations, but this could be overcome by using a faster microcontroller or clock, or by slightly reducing the total number of electrodes sampled. Finally, the E-Scope was designed to support any electrode types supported by the Intan Ephys IC. This includes up to 32 channels of passive probes such as single electrodes, tetrodes, silicon probes, and flexible multi-channel arrays but does not include Neuropixels, as Neuropixels use custom active electronics on the probe to multiplex, sample, and serialize electrophysiology data.
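
      To give a sense of scale for the electrophysiology stream, the sketch below computes the raw data rate for the two sampling rates mentioned above. It assumes 16-bit samples and ignores any protocol or framing overhead; the sample width and the function name `ephys_data_rate_mbps` are assumptions for illustration and are not stated in the response.

```python
def ephys_data_rate_mbps(channels, sample_rate_hz, bits_per_sample=16):
    """Raw payload rate in megabits per second, ignoring protocol overhead.

    bits_per_sample=16 is an assumed ADC sample width.
    """
    return channels * sample_rate_hz * bits_per_sample / 1e6

# 32 channels at the rate used by the E-Scope and at the stated maximum.
used_rate = ephys_data_rate_mbps(32, 20_000)
max_rate = ephys_data_rate_mbps(32, 30_000)
print(f"20 kSps: {used_rate:.2f} Mbit/s, 30 kSps: {max_rate:.2f} Mbit/s")
```

      Under these assumptions the electrophysiology payload is on the order of 10 to 15 Mbit/s in both cases, and the response attributes the 20 kSps setting to microcontroller speed rather than to cable bandwidth.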

      The authors only analyzed simple spikes in PCs for social-related activity. What about complex spikes? Is this correlated with ACC activity?

      Complex spikes were detectable to the extent that we were able to define that the recorded cell was a PC, but because these cells were recorded in freely behaving mice, accurate complex spike detection was not reliable enough to be used for further correlational analyses.

      The data is sampled in the two regions (cerebellum and ACC) at very different rates (imaging is much slower than electrophysiology; ephys data was binned). How does this affect the correlation plots?

      We generated firing rate maps for the cerebellar neural activity using a bin size that matched the sampling frequency of calcium imaging (see Methods): to study the relationship between the electrophysiology and calcium imaging data, we binned the spike trains using 33 ms bins to match the calcium imaging sampling rate (~30 Hz). This limits the temporal resolution available for calculating fine-scale correlations, but the correlations that we report are on a behaviorally relevant temporal scale. The finer temporal resolution of the electrophysiology data can, however, still be used to examine the relationship between cerebellar output and specific social behavior epochs in more detail.
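
      The binning step described above can be sketched as follows. This is a minimal pure-Python illustration, assuming spike times in seconds measured from the start of the recording; the function name `bin_spike_train` and the example spike times are hypothetical, and the authors' actual pipeline is described in their Methods.

```python
def bin_spike_train(spike_times_s, duration_s, bin_width_s=0.033):
    """Count spikes in consecutive bins of bin_width_s seconds.

    Returns (counts, rates_hz). A trailing partial bin is dropped.
    """
    n_bins = int(duration_s // bin_width_s)
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t // bin_width_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the binned window
            counts[idx] += 1
    rates_hz = [c / bin_width_s for c in counts]
    return counts, rates_hz

# Hypothetical spike times (seconds) over a 1 s recording.
counts, rates_hz = bin_spike_train([0.01, 0.02, 0.04, 0.10], duration_s=1.0)
```

      Each bin then lines up with roughly one imaging frame, so the binned spike counts and the calcium traces can be compared sample by sample.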

      For the correlation analysis, over what time frame was the activity relationship examined? How was this duration determined?

      Author response image 1.

      The main criterion for the time frame used in the correlation analysis was the behavioral timescale of social interaction [see figure above for the number of social (red) and object (blue) interaction bouts (a), their duration (b) and coefficient of variation (CV) (c)]. Overall, the activity relationship time frame was based on the average duration of the social interactions (~3 sec). Periods of 3.8 sec before and 5.8 sec after interaction onset were used for this analysis. Accordingly, the cross-correlograms were constructed using a maximum lag length of 5 sec. In the article we reported the correlation at lag 0.
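
      A cross-correlogram of two equally sampled traces, evaluated up to a maximum lag, can be sketched as below. This version computes a Pearson correlation on the overlapping samples at each lag, which is one common normalization and an assumption here; the function name `cross_correlogram` and the toy signals are hypothetical and do not reproduce the authors' analysis.

```python
import math

def _pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

def cross_correlogram(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag] (in samples).

    x and y must have equal length, greater than max_lag + 1.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = _pearson(a, b)
    return result

# Toy example: y is x delayed by two samples, so the peak sits at lag +2.
x = [0, 1, 0, 2, 0, 3, 0, 1, 0, 0]
y = [0, 0] + x[:-2]
cc = cross_correlogram(x, y, max_lag=3)
```

      With 33 ms bins, a 5 sec maximum lag corresponds to roughly 150 samples in each direction, and the value reported in the article corresponds to the entry at lag 0.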

      The relationship between the cerebellum and ACC seems unconvincing. If two brain regions are similarly engaged by the behavior, wouldn't they have a high correlation? Is the activity in one region driving the other?

      We reference studies showing an indirect anatomical and functional connection between the cerebellum and the ACC or prefrontal cortex (Badura et al., eLife, 2018). Also, as stated in the introduction, the ACC is a recognized brain area in studies of social behavior. In the results, we stated that the increase in correlations among groups of neurons that are similarly engaged during a specific epoch of the social interaction was an expected finding. What was not expected was that there would be no difference in the distribution of correlations when the social epochs were removed, suggesting that intrinsic connectivity does not drive a difference in correlations.

      However, since there is a cerebello-cortical loop, further study will be needed to understand which area initiates this type of activity during social behavior.

      • In the figures, the color-coded scale bars should be labeled as z-scores (confusing without them).

      • In Figure 4, the color differences for Soc-ACC, Soc+ACC and SocNS ACC should be more striking as it is hard to tell them apart because they are all similar shades of blue-gray.

      We thank the reviewer for their suggestions for improving the figures. We have incorporated these changes in Figures 2 and 3, along with their figure supplements. Graphs in Figure 4D-G have been edited to make the lines more visible to the reader.

      Reviewer #3 (Public Review)

      However, a mouse weighs between 20 and 40 g, so that an implant of 4.5 g is still quite considerable. It can be expected that this has an impact on the behavior and, possibly, the well-being of the animals. Whether this is the case or not, is not really addressed in this study.

      The weight of the E-Scope (4.5 g) is near the maximum that is tolerated by animals in our experience. We therefore acclimated the mouse to the weight with dummy scopes of increasing weights over a 7-10 day period. During this period, we observed the animal to have normal exploratory behavior. Specifically, there is no change in the sociability of the animals (Figure 2A) and animals cover the large arena (48 x 48 cm, Figure 2H).

      Overall, the description of animal behavior is rather sparse. The methods state only that stranger age-matched mice were used, but do not state their gender. The nature of the social interactions was not described. Was there aggressive behavior, sexual approach and/or intercourse? Did the stranger mice attack/damage the E-Scope? Were the interactions comparable (using which parameters?) with and without E-Scope attached? It is not even described what the authors define as an "interaction bout" (Figure 2A). The number of interaction bouts is counted per 7 minutes, I presume? This is not specified explicitly.

      As mentioned in the methods section of the original version of our manuscript, all the target mice were age-matched “male” mice. As per the reviewer’s suggestion, we have now added to the manuscript that before any of our social interaction behavioral experiments, aggressive or agitated mice were removed after assessing their behavior in the arena during habituation. For all trials, all mice were introduced for the first time.

      We also mention in the methods section of our manuscript that social behaviors were evaluated by proximity between the subject mouse and the novel target mouse (2 cm from the body, head, or base of tail). From our recordings, we did not observe any aggressive, mounting, or other dominance behavior over the E-Scope subject mouse during the 7 minutes of social interaction assessment. Social interaction bouts in Figure 2A show the average number of social interaction bouts during the recording time. This has now been expanded upon in our revised manuscript.

      It would be very insightful if the authors would describe which events they considered to be action potentials, and which not. Similarly, the raw traces of Figure 1E are declared to be single-unit recordings of Purkinje cells. Partially due to the small size of the traces (invisible in print and pixelated in the digital version), I have a hard time recognizing complex spikes and simple spikes in these traces. This is a bit worrisome, as the authors declare the typical duration of the pause in simple spike firing after a complex spike to be 20-100 ms. In my experience, such long pauses are rare in this region, and definitely not typical. In the right panel of Figure 1A, an example of a complex spike-induced pause is shown. This pause is around 15 ms, so not typical according to the text, and starts only around 4 ms after the complex spike, which should not be the case and suggests either a misalignment of the figure or the detection of complex spike spikelets as simple spikes, while the abnormally long pause suggests that the authors fail to detect a lot of simple spikes. The authors could provide more confidence in their data by including more raw data, making explicit how they analyzed the signals, and by reporting basic statistics of firing properties (like rate, cv or cv2, pause duration). In this respect, Figure 2 - figure supplement 3 shows quite a large percentage of cells to have either a very low or a very high firing rate.

      We now provide a better example of simple spikes and complex spikes in Fig 1E and have corrected our comment in the body of the manuscript. The previous version of the SS x CS cross-correlation histogram in Figure 1G, as the reviewer mentions, was not the best example because of the detected CS spikelets. However, the detection of CS spikelets has little impact on the interpretation of the results. We have replaced this figure with a better example of the SS x CS cross-correlation histogram.

      The number of Purkinje cells recorded during social interactions is quite low: only 11 cells showed a modulation in their spiking activity (unclear whether in complex spikes, simple spikes or both). During object interaction, only 4 cells showed a significant modulation. Unclear is whether the latter 4 are a subset of the former 11, or whether "social cells" and "object cells" are different categories. Having so few cells, and with these having different types of modulation, the group of cells for each type of modulation is really small, going down to 2 cells/group. It is doubtful whether meaningful interpretation is possible here.

      While the number of neurons is not as high as those reported for other regions, the number presented depicts the full range of responses to social behavior. It is extremely difficult to obtain stable neurons in freely behaving, socially interacting animals, and only a handful of neurons could be recorded in each animal. Among these recorded neurons, only a subset responds to social interactions, further reducing the numbers. The results, however, are consistent among cell types, and the direction of modulation fits with the inhibitory connectivity between PCs and DN neurons. To our knowledge, we are the first group to publish neuronal activity of PC and DN neurons from freely behaving mice during social behavior.

      Neural activity patterns observed during social interaction do not necessarily relate specifically to social interaction, but can also occur in a non-social context. The authors control this by comparing social interactions with object interactions, but I miss a direct comparison between the two conditions, both in terms of behavior (now only the number of interactions is counted, not their duration or intensity), and in terms of neural activity. There is some analysis done on the interaction between movement and cerebellar activity (Figure 2 - figure supplement 4), but it is unclear to what extent social interactions and movements are separated here. It would already help to indicate the social interactions in the plots with trajectories (e.g., Fig. 2H), for example with social interaction-related movements in red and the rest of the trajectories in black.

      We have updated the social interaction plots in Figure 2H in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Increase the number of cerebellar neurons that are recorded.

      Due to the difficulty of the experiment and the low yield which we get for cerebellar recordings, substantially increasing the number of neurons will require many more experiments which are not feasible at this time.

      Include more raw data and make the analysis procedure more insightful with illustrations of intermediate steps.

      We have included a more thorough description of the analysis in the methods section of the revised manuscript.

      Provide a better description of the behavior.

      We have increased the level of detail regarding the mouse behavior in the Results and Methods sections. This includes a more detailed description of the parameters we used to analyze the social interaction.

    1. Author Response

      We are grateful to the reviewers for their positive feedback and for their comments and suggestions on the manuscript. Reviewer 1 has indicated two weaknesses and Reviewer 2 has none. With this provisional reply, we address the two concerns of Reviewer 1:

      1) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs. This is not a risk-free extrapolation. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. We wish to clarify that only a few of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test the DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). Considering the action on multiple side chains with different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      2) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up another important point. We analysed the aa-tRNA bound elongation factor structures from both bacteria (PDB id: 1TTT) and mammals (PDB id: 5LZS) and found that the amino acid binding site is highly conserved, with the side chain of the amino acid projecting outward. Modelling of a D-amino acid in the same site shows serious clashes, indicating D-chiral rejection during aa-tRNA binding by elongation factor. In addition, the amino group of the amino acid is tightly selected by the main chain atoms of elongation factor, leaving no space for aldehydes to enter and then modify the L-aa-tRNAs and Gly-tRNAs. Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce the amino acid specific binding differences. However, those changes will have no influence when the D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. We will present a sequence and structural conservation analysis to clarify this important point in our revised manuscript. Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of elongation factor in general across species.

      In our revised manuscript, we will provide a thorough point-by-point response to the above as well as all the specific reviewer comments. We also intend to include new analysis with updated data that would address the key questions raised by the reviewers.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The biogenesis of outer membrane proteins (OMPs) into the outer membranes of Gram-negative bacteria is still not fully understood, particularly substrate recognition and insertion by beta-assembly machinery (BAM). In this work, the authors present studies showing that, in addition to recognition by the last strand of an OMP, sometimes referred to as the beta-signal, an additional signal upstream of the last strand is also important for OMP biogenesis.

      Strengths:

      1. Overall the manuscript is well organized and written, and addresses an important question in the field. The idea that BAM recognizes multiple signals on OMPs has been presented previously, however, it was not fully tested.

      2. The authors here re-address this idea and propose that it is a more general mechanism used by BAM for OMP biogenesis.

      3. The notion that additional signals assist in biogenesis is an important concept that indeed needs to be fully tested in OMP biogenesis.

      4. A significant study was performed with extensive experiments reported in an attempt to address this important question in the field.

      5. The identification of important crosslinks and regions of substrates and Bam proteins that interact during biogenesis is an important contribution that gives clues to the path substrates take en route to the membrane.

      Weaknesses:

      Major critiques (in no particular order):

      1. The title indicates 'simultaneous recognition', however no experiments were presented that test the order of interactions during OMP biogenesis.

      We have replaced the word “Simultaneous” with “Dual” so as not to reflect on the timing of the recognition events for the distinct C-terminal signal and -5 signal.

      2. Aspects of the study focus on the peptides that appear to inhibit OmpC assembly, but should also include an analysis of the peptides that do not, to determine whether the motif(s) are still present or not.

      We thank the reviewer for this comment. Our study focuses on the peptides which exhibited an inhibitory effect in order to elucidate further interactions between the BAM complex and substrate proteins, especially in the early stage of the assembly process. In the case of peptide 9, which contains all of our proposed elements but did not have an inhibitory effect, there is an arginine residue at the polar position next to the hydrophobic residue in position 0 (0 Φ). As seen in Fig S5, S6, and S7, there are no positively charged amino acids in the polar residue positions in the -5 or last strands. This might be the reason why peptide 9, as well as peptide 24, which is the β-signal derived from the mitochondrial OMP Tom40 and contains a lysine at the polar position, did not display an inhibitory effect. Incorporating the reviewer's suggestions might elucidate conditions that should not be added to the elements, but this is not the focus of this paper and was not discussed to avoid complicating the paper.

      3. The β-signal is known to form a β-strand, therefore it is unclear why the authors did not choose to chop OmpC up according to its strands, rather than by a fixed peptide size. What was the rationale for how the peptide lengths were chosen since many of them partially overlap known strands, and only partially (2 residues) overlap each other? It may not be too surprising that most of the inhibitory peptides consist of full strands (#4, 10, 21, 23).

      A simple scan of known β-strands would have been an alternative approach, however this comes with the bias of limiting the experiments to predicted substrate (strand) sequences, and it presupposes that the secondary structure element would be formed by this tightly truncated peptide.

      Instead, we allowed for the possibility that OMPs meet the BAM complex in an unfolded or partially folded state, and that the secondary structure (β-strand) might only form via β-augmentation after the substrate is placed in the context of the lateral gate. We therefore used peptides that mapped right across the entirety of OmpC, with a two amino acid overlap.

      To clarify this important point regarding the unbiased nature of our screen, we have revised the text:

      (Lines 147-151) "We used peptides that mapped the entirety of OmpC, with a two amino acid overlap. This we considered preferable to peptides that were restricted by structural features, such as β-strands, in consideration that β-strand formation may or may not have occurred in early-stage interactions at the BAM complex."
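      The tiling scheme described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the peptide length is not given in this response, so `length` is a hypothetical parameter, `tile_peptides` is a name invented for the example, and the toy sequence in the usage note is not OmpC.

```python
def tile_peptides(sequence, length, overlap=2):
    """Split a protein sequence into consecutive peptides that share `overlap` residues.

    sequence : one-letter amino acid string
    length   : peptide length (hypothetical; not stated in the response)
    overlap  : residues shared by neighbouring peptides (two in the response)
    Returns a list of (first_residue, last_residue, peptide) with 1-based positions.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    if not sequence:
        return []
    step = length - overlap
    peptides = []
    start = 0
    while True:
        end = min(start + length, len(sequence))
        peptides.append((start + 1, end, sequence[start:end]))
        if end == len(sequence):
            break  # the last peptide reaches the C-terminus (it may be shorter)
        start += step
    return peptides
```

      For example, `tile_peptides("ABCDEFGHIJKLMNOPQRST", length=8)` yields three peptides covering residues 1-8, 7-14 and 13-20, so every residue is covered and each neighbouring pair shares two residues, independent of where any β-strand begins or ends.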

      4. It would be good to have an idea of the propensity of the chosen peptides to form β-strands and participate in β-augmentation. We know from previous studies with darobactin and other peptides that they can inhibit OMP assembly by competing with substrates.

      We appreciate the reviewer's suggestion. However, we have not conducted biophysical characterizations of the peptides to calculate the propensity of each peptide to form β-strands and participate in β-augmentation. The sort of detailed biophysical analysis done for darobactin (by the Maier and Hiller groups, "The antibiotic darobactin mimics a β-strand to inhibit outer membrane insertase", Nature 593:125-129) was a Nature publication based on this single peptide. A further biophysical analysis of all of the peptides presented here goes well beyond the scope of our study.

      5. The recognition motifs that the authors present span up to 9 residues which would suggest a relatively large binding surface, however, the structures of these regions are not large enough to accommodate these large peptides.

      The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an 8-residue consensus, some of the inhibitory peptides include additional residues before and after the defined motif of 8 residues, and the lateral gate of BamA has been shown to interact with a 7-residue span (e.g., Doyle et al., 2022). Cross-linking presented in our study showed BamD residues R49 and G65 cross-linked to the positions 0 and 6 of the internal signal in OmpC (Fig. 6D).

      We appreciate this point of clarification and have modified the text to acknowledge that in the final registering of the peptide with its binding protein, some parts of the peptide might sit beyond the bounds of the BamD receptor’s binding pocket and the BamA lateral gate:

      (Lines 458-471) "The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an eight-residue consensus, and the internal signal motif is composed of a nine-residue consensus. Recent structures have shown the lateral gate of BamA interacts with a 7-residue span of substrate OMPs. Interestingly, inhibitory compounds, such as darobactin, mimic only three residues of the C-terminal side of the β-signal motif. Cross-linking presented here in our study showed that BamD residues R49 and G65 cross-linked to the positions 0 and 6 of the internal signal in OmpC (Fig. 6D). Both signals are larger than the assembly machinery's signal binding pocket, implying that the signal might sit beyond the bounds of the signal binding pocket in BamD and the lateral gate in BamA. These findings are consistent with similar observations in other signal sequence recognition events, such as the mitochondrial targeting presequence signal that is longer than the receptor groove formed by Tom20, the subunit of the translocator of outer membrane (TOM) complex (Yamamoto et al., 2011). The presequence has been shown to bind to Tom20 in several different conformations within the receptor groove (Nyirenda et al., 2013)."

      Moreover, the distance between the amino acids of BamD which cross-linked to the internal signal, R49 and Y62, is approximately 25 Å (pdbID used 7TT3). The maximum length of the internal signal of OmpC, from F280 to Y288, is approximately 22 Å (pdbID used 2J1N). This would allow the signal to fit within the confines of the TPR motif of BamD.

      Author response image 1.

      6. The authors highlight that the sequence motifs are common among the inhibiting peptides, but do not test if this is a necessary motif to mediate the interactions. It would have been good to see if a library of non-OMP related peptides that match this motif could also inhibit or not.

      With respect, this additional work would not address any biological question relevant to the function of BamD. To randomize sequences and then classify those that do or don’t fit the motif would help in refining the parameters of the β-signal motif, but that was not our intent.

      We have identified the peptides from within the total sequence of an OMP, shown which peptides inhibit in an assembly assay, and then observed that the inhibitory peptides conform to a previously published (β-signal) motif.
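      The check of whether a peptide conforms to the β-signal motif ζxGxx[Ω/Φ]x[Ω/Φ] can be sketched as a simple pattern match. This is a hedged illustration only, not the procedure used in the study. It assumes the usual reading of the symbols (ζ, polar; Ω, aromatic; Φ, hydrophobic; x, any residue), and the residue sets assigned to each class below are assumptions chosen for the example.

```python
import re

# Residue classes: these memberships are assumptions made for illustration.
POLAR = "STNQDEKRHY"      # zeta: polar residues
AROMATIC = "FWY"          # omega: aromatic residues
HYDROPHOBIC = "AVLIMC"    # phi: hydrophobic residues

# zeta x G x x [omega/phi] x [omega/phi], eight residues in total
BETA_SIGNAL = re.compile(
    f"[{POLAR}].G..[{AROMATIC}{HYDROPHOBIC}].[{AROMATIC}{HYDROPHOBIC}]"
)


def find_beta_signal(sequence):
    """Return (1-based start position, matching 8-residue window) for every match."""
    hits = []
    for i in range(len(sequence) - 7):
        window = sequence[i:i + 8]
        if BETA_SIGNAL.fullmatch(window):
            hits.append((i + 1, window))
    return hits
```

      Scanning every 8-residue window, rather than calling `re.findall` once, also reports overlapping matches. A toy peptide such as `"AAASAGAAFAFAAA"` gives a single hit starting at position 4, whereas replacing the first F of that window with a lysine removes the match.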

      7. In the studies that disrupt the motifs by mutagenesis, an effect was observed and attributed to disruption of the interaction of the 'internal signal'. However, the literature is filled with point mutations in OMPs that disrupt biogenesis, particularly those within the membrane region. F280, Y286, V359, and Y365 are all residues that are in the membrane region that point into the membrane. Therefore, more work is needed to confirm that these mutations are in parts of a recognition motif rather than on the residues that are disrupting stability/assembly into the membrane.

      As the reviewer pointed out, the side chains of the amino acids constituting the signal elements we determined all face the lipid side, of which Y286 and Y365 were important for folding as well as for recognition. However, F280A and V359A had no effect on folding, but only on assembly through the BAM complex. The fact that position 0 functions as a signal has been demonstrated by peptidomimetics (Fig. 1) and point mutant analysis (Fig. 2). We appreciate this clarification and have modified the text to acknowledge that all of the signal elements face the lipid side, which ultimately contributes to their stability in the membrane, and that before this the BAM complex actively recognizes them and determines their orientation:

      (Lines 519-526) "After OMP assembly, all elements of the internal signal are positioned such that they face into the lipid-phase of the membrane. This observation may be a coincidence, or may be utilized by the BAM complex to register and orientate the lipid facing amino acids in the assembling OMP away from the formative lumen of the OMP. Amino acids at position 6, such as Y286 in OmpC, are not only a component of the internal signal for binding by the BAM complex, but also act in a structural capacity to register the aromatic girdle for optimal stability of the OMP in the membrane."

      8. The title of Figure 3 indicates that disrupting the internal signal motif disrupts OMP assembly, however, the point mutations did not seem to have any effect. Only when both 280 and 286 were mutated was an effect observed. And even then, the trimer appeared to form just fine, albeit at reduced levels, indicating assembly is just fine, rather the rate of biogenesis is being affected.

      We appreciate this point and have revised the title of Figure 3 to be:

      (Lines 1070-1071) "Modifications in the putative internal signal slow the rate of OMP assembly in vivo."

      9. In Figure 4, the authors attempt to quantify their blots. However, this seems to be a difficult task given the lack of quality of the blots and the spread of the intended signals, particularly of the 'int' bands. However, the more disturbing trend is the obvious reduction in signal from the post-urea treatment, even for the WT samples. The authors are using urea washes to indicate removal of only stalled substrates. However, a reduction of signal is also observed for the WT. The authors should quantify this blot as well, but it is clear visually that both WT and the mutant have obvious reductions in the observable signals. Further, this data seems to conflict with Fig 3D where no noticeable difference in OmpC assembly was observed between WT and Y286A, why is this the case?

      We have addressed this point by adding a statistical analysis on Fig. 4A. As the reviewer points out, BN-PAGE band quantification is a difficult task given the broad spread of the bands on these gels. Statistical analysis showed that the increase in intermediates (int) was statistically significant for Y286A at all times until 80 min, when the intermediate form signals decrease.

      (Lines 1093-1096) "Statistical significance was indicated by the following: N.S. (not significant), p<0.05; *, p<0.005; **. Exact p values of intermediate formed by Wt vs Y286A at each timepoint were as follows: 20 minutes: p = 0.03077, 40 minutes: p = 0.02402, 60 minutes: p = 0.00181, 80 minutes: p = 0.0545."
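      To make the legend's thresholds concrete, the short sketch below maps the four reported p values onto significance labels. It assumes the conventional reading of the legend, with one asterisk for p<0.05 and two for p<0.005; the function name is hypothetical and the snippet is not part of the authors' analysis.

```python
def significance_label(p):
    """Map a p value to a label using the thresholds quoted in the legend.

    Assumed convention: '**' for p < 0.005, '*' for p < 0.05, otherwise 'N.S.'.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p < 0.005:
        return "**"
    if p < 0.05:
        return "*"
    return "N.S."


# Exact p values (Wt vs Y286A intermediate) quoted in the legend, keyed by minutes.
p_values = {20: 0.03077, 40: 0.02402, 60: 0.00181, 80: 0.0545}
labels = {minutes: significance_label(p) for minutes, p in p_values.items()}
```

      Under these assumed labels, the 20, 40 and 60 minute timepoints are significant and the 80 minute timepoint is not, which matches the statement that the difference holds until 80 min.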

      Further, regarding the int. band, we have corrected the statement as follows:

      (Lines 253-254) "Consistent with this, the assembly intermediate which was prominently observed for OmpC(Y286A) can be extracted from the membranes with urea;"

      OMP assembly in vivo has additional periplasmic chaperones and factors present in order to support the assembly process. Therefore, it is likely that some proteins were assembled properly in vivo compared to their in vitro counterparts. Such a decrease has been observed not only in E. coli but also in mitochondrial OMP import (Yamano et al., 2010).

      10. The pull-down assays with BamA and BamD should include a no protein control at the least to confirm there is no non-specific binding to the resin. Also, no detergent was mentioned as part of the pull downs that contained BamA or OmpC, nor was it detailed if OmpC was urea solubilized.

      We have performed pull down experiments with a no-protein (Ni-NTA only) control as noted (Author response image 2). The results showed that the amount of OmpC carrying through on beads only was significantly lower than the amount of OmpC bound in the presence of BamD or BamA. The added OmpC was not treated with urea, but was synthesized by in vitro translation; the in vitro translated OmpC is the standard substrate in the EMM assembly assay (Supp Fig. S1) where it is recognized by the BAM complex. Thus, we used it for pull-down as well and, to make this clearer, we have revised the text as follows:

      Author response image 2.

      Pull down assay of radio-labelled OmpC with indicated protein or Ni-NTA alone (Ni-NTA). T, total; FT, flow-through; W, wash; E, elute.

      (Lines 252-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."

      11.

      • The neutron reflectometry experiments are not convincing primarily due to the lack of controls to confirm a consistent uniform bilayer is being formed and even if so, uniform orientations of the BamA molecules across the surface.

      • Further, no controls were performed with BamD alone, or with OmpC alone, and it is hard to understand how the method can discriminate between an actual BamA/BamD complex versus BamA and BamD individually being located at the membrane surface without forming an actual complex.

      • Previous studies have reported difficulty in preparing a complex with BamA and BamD from purified components.

      • Additionally, little signal differences were observed for the addition of OmpC. However, an elongated unfolded polypeptide that is nearly 400 residues long would be expected to produce a large distinct signal given that only the C-terminal portion is supposedly anchored to BAM, while the rest would be extended out above the surface.

      • The depiction in Figure 5D is quite misleading when viewing the full structures on the same scales with one another.

      We have addressed these five points individually as follows.

      i. The uniform orientation of BamA on the surface is guaranteed by the fixation through a His-tag engineered into extracellular loop 6 of BamA and has been validated in previous studies as cited in the text. Moreover, to test this, we constructed an alternative theoretical model in which BamA is not well oriented in the system, as shown below. However, we found that the solid lines (after fitting) did not align well with the experimental data. We therefore assume that BamA is well oriented in the membrane bilayer.

      Author response image 3.

      Experimental (symbols) and fitted (curves) NR profiles of BamA not oriented well in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.

      ii. There would be no means by which to do a control with OmpC alone or BamD alone, as neither protein binds to the lipid layer chip. OmpC is diluted from urea and then the unbound OmpC is washed from the chip before NR measurements. BamD does not have an acyl group to anchor it to the lipid layer; without BamA to anchor to, it too is washed from the chip before NR measurements. We have constructed another theoretical model in which both BamA and BamD are embedded in the membrane bilayer, and the fits are shown below. The fits did not align well with the experimental data, which argues against BamA and BamD being located individually at the membrane surface without forming an actual complex.

      Author response image 4.

      Experimental (symbols) and fitted (curves) NR profiles of BamA+D embedding together in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.

      iii. The previous studies that reported difficulty in preparing a complex with BamA and BamD from purified components were assays done in aqueous solution including detergent solubilized BamA, or with BamA POTRA domains only. Our assay is superior in that it reports the binding of BamD to a purified BamA that has been reconstituted in a lipid bilayer.

      iv. The relatively small signal differences observed for the addition of OmpC are expected, since OmpC is an elongated, unfolded polypeptide of nearly 400 residues long which, in the context of this assay, can occupy a huge variation in the positions at which it will sit with only the C-terminal portion anchored to BAM, and the rest moving randomly about and extended from the surface.

      v. We appreciate the point raised and have now added a note in the Figure legend that these are depictions of the results and not a scale drawing of the structures.

      12. In the crosslinking studies, the authors show 17 crosslinking sites (43% of all tested) on BamD crosslinked with OmpC. Given that the authors are presenting specific interactions between the two proteins, this is worrisome as the crosslinks were found across the entire surface of BamD. How do the authors explain this? Are all these specific or non-specific?

      The crosslinking experiment using purified BamD was an effective assay for comprehensive analysis of the interaction sites between BamD and the substrate. However, as the reviewer pointed out, cross-linking was observed even at the sites that, in the context of the BAM complex, interact with BamC as a protein-protein interaction and would not be available for substrate protein-protein interactions. To complement this analysis and to address this issue, we also performed the experiment in Fig. 6C.

      In Fig. 6C, the interaction of BamD with the substrate is examined in vivo, and the results demonstrate that if BPA is introduced into the site we designated as the substrate recognition site, it is cross-linked to the substrate. On the other hand, position 114 was found to crosslink with the substrate in in vitro crosslinking, but not in vivo. It should be noted that position 114 has also been confirmed to form cross-link products with BamC; we therefore believe that BamD-substrate interactions in the native state have been investigated. To explain the above, we have added the following description to the Results section.

      (Lines 319-321) "Structurally, these amino acids are located both on the lumen side of the funnel-like structure (e.g. 49 or 62) and outside of the funnel-like structure, such as at the BamC binding site (e.g. 114) (fig. S12C)."

      (Lines 350-357) "Positions 49, 53, 65, and 196 of BamD face the interior of the funnel-like structure of the periplasmic domain of the BAM complex, while position 114 is located outside of the funnel-like structure (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016). We note that while position 114 was cross-linked with OmpC in vitro using purified BamD, this was not seen with in vivo cross-linking. Instead, in the context of the BAM complex, position 114 of BamD binds to the BamC subunit and would not be available for substrate binding in vivo (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016)."

      13. The study in Figure 6 focuses on defined regions within the OmpC sequence, but a broader range is necessary to demonstrate specificity to these regions vs binding to other regions of the sequence as well. If the authors wish to demonstrate a specific interaction to this motif, they need to show no binding to other regions.

      The region of affinity for the BAM complex was determined by peptidomimetic analysis, and the signal region was further identified by mutational analysis of OmpC. Subsequently, the subunit that recognizes the signal region was identified as BamD. In other words, in the process leading up to Fig. 6, we were able to analyze in detail that other regions were not the target of the study. We have revised the text to make clear that we focus on the signal region including the internal signal, and have not also analyzed other parts of the signal region:

      (Lines 329-332) "As our peptidomimetic screen identified conserved features in the internal signal, and cross-linking highlighted the N-terminal and C-terminal TPR motifs of BamD as regions of interaction with OmpC, we focused on amino acids specifically within the β-signals of OmpC and regions of BamD which interact with β-signal."

      1. The levels of the crosslinks are barely detectable via western blot analysis. If the interactions between the two surfaces are required, why are the levels for most of the blots so low?

      These are western blots of cross-linked products – the efficiency of cross-linking is far less than 100% of the interacting protein species present in a binding assay and this explains why the levels for the blots are ‘so low’. We have added a sentence to the revised manuscript to make this clear for readers who are not molecular biologists:

      (Lines 345-348) "These western blots reveal cross-linked products representing the interacting protein species. Photo cross-linking of unnatural amino acid is not a 100% efficient process, so the level of cross-linked products is only a small proportion of the molecules interacting in the assays."

      15.

      • Figure 7 indicates that two regions of BamD promote OMP orientation and assembly, however, none of the experiments appears to measure OMP orientation?

      • Also, one common observation from panel F was that not only was the trimer reduced, but also the monomer. But even then, still a percentage of the trimer is formed, not a complete loss.

      (i) We appreciate this point and have revised the title of Figure 7 to be:

      (Lines 1137-1138) "Key residues in two structurally distinct regions of BamD promote β-strand formation and OMP assembly."

      (ii) In our description of Fig. 7F (Lines 356-360) we do not distinguish between the amount of monomer and trimer forms, since both are reflective of the overall assembly rate i.e. assembly efficiency. Rather, we state that:

      "The EMM assembly assay showed that the internal signal binding site was as important as the β-signal binding site to the overall assembly rates observed for OmpC (Fig. 7F), OmpF (fig. S15D), and LamB (fig. S15E). These results suggest that recognition of both the C-terminal β-signal and the internal signal by BamD is important for efficient protein assembly."

      16.

      • The experiment in Fig 7B would be more conclusive if it was repeated with both the Y62A and R197A mutants and a double mutant. These controls would also help resolve any effect from crowding that may also promote the crosslinks.

      • Further, the mutation of R197 is an odd choice given that this residue has been studied previously and was found to mediate a salt bridge with BamA. How was this resolved by the authors in choosing this site since it was not one of the original crosslinking sites?

      As stated in the text, the purpose of the experiment in Figure 7B is to measure the impact of pre-forming a β-strand in the substrate (OmpC) before providing it to the receptor (BamD). We thank the reviewer for the comment on the R197 position of BamD. The C-terminal domain of BamD has been suggested to mediate the BamA-BamD interface, specifically BamD R197 amino acid creates a salt-bridge with BamA E373 (Ricci et al., 2012). It had been postulated that the formation of this salt-bridge is not strictly structural, with R197 highlighted as a key amino acid in BamD activity and this salt-bridge acts as a “check-point” in BAM complex activity (Ricci et al., 2012, Storek et al., 2023). Our results agree with this, showing that the C-terminus of BamD acts in substrate recognition and alignment of the β-signal (Fig. 6, Fig S12). We show that amino acids in the vicinity of R197 (N196, G200, D204) cross-linked well to substrate and mutations to the β-signal prevent this interaction (Fig S12B, D). For mutational analysis of BamD, we looked then at the conservation of the C-terminus of BamD and determined R197 was the most highly conserved amino acid (Fig 6C). In order to account for this, we have adjusted the manuscript:

      (Lines 376-377) "R197 has previously been isolated as a suppressor mutation of a BamA temperature sensitive strain (Ricci et al., 2012)."

      (Lines 495-496) "This adds an additional role of the C-terminus of BamD beyond a complex stability role (Ricci et al., 2012; Storek et al., 2023)."

      1. As demonstrated by the authors in Fig 8, the mutations in BamD lead to reduction in OMP levels for more than just OmpC and issues with the membrane are clearly observable with Y62A, although not with R197A in the presence of VCN. The authors should also test with rifampicin which is smaller and would monitor even more subtle issues with the membrane. Oddly, no growth was observed for the Vec control in the lower concentration of VCN, but was near WT levels for 3 times VCN, how is this explained?

      While it would be interesting to correlate the extent of differences to the molecular size of different antibiotics such as rifampicin, such correlations are not the intended aim of our study. Vancomycin (VCN) is a standard measure of outer membrane integrity in our field, hence its use in our tests for membrane integrity.

      We apologize to the reviewer, as Figure 8D-G may have been misleading. Figures 8D and 8E use bamD shut-down cells expressing plasmid-borne BamD mutants, whereas Figures 8F and 8G use the same strain as in Figure 3. We have adjusted the figure as well as the figure legend:

      (Lines 1165-1169) "D, E, E. coli bamD depletion cells expressing mutations at residues, Y62A and R197A, in the β-signal recognition regions of BamD were grown with VCN. F, G, E. coli cells expressing mutations to OmpC internal signal, as shown in Fig 3, grown in the presence of VCN. Mutations to two key residues of the internal signal were sensitive to the presence of VCN."

      1. While Fig 8I indeed shows diminished levels for FY as stated, little difference was observed for the trimer for the other mutants compared to WT, although differences were observed for the dimer. Interestingly, the VY mutant has nearly WT levels of dimer. What do the authors postulate is going on here with the dimer to trimer transition? How do the levels of monomer compare, which is not shown?

      The BN-PAGE gel system cannot resolve protein species that migrate below ~50 kDa, and the monomer species of the OMPs is below this size. We cannot comment on effects on the monomer because it is not visualized. The non-cropped gel image is shown here. Recently, Hussain et al. showed that, in an in vitro proteoliposome system, OmpC assembly progresses through a “short-lived dimeric” form before the final process of trimerization (Hussain et al., 2021). However, their findings suggest that LPS plays the final role in stimulating the dimer-to-trimer transition, a step well past the recognition of the β-signals. Mutations to the internal signal of OmpC result in the formation of an intermediate, the substrate stalled on the BAM complex. This stalling presumably hinders the BAM complex, resulting in reduced trimer and loss of the dimeric OmpF signal in the EMM of cells expressing the OmpC double mutant, FY. We have noted this in the revised text:

      Author response image 5.

      Non-cropped gel of Fig. 8I. The asterisk indicates a band observed in the sample loading wells at the top of the gel.

      (Lines 417-418) "The dimeric form of endogenous OmpF was prominently observed in both the OmpC(WT) as well as the OmpC(VY) double mutant cells."

      1. In the discussion, the authors indicate they have '...defined an internal signal for OMP assembly', however, their study is limited and only investigates a specific region of OmpC. More is needed to definitively say this for even OmpC, and even more so to indicate this is a general feature for all OMPs.

      We acknowledge the reviewer's comment on this point and have expanded the statement to make sure that the conclusion is justified with the specific evidence that is shown in the paper and the supplementary data. We now state:

      (Lines 444-447) "This internal signal corresponds to the -5 strand in OmpC and is recognized by BamD. Sequence analysis shows that similar sequence signatures are present in other OMPs (Figs. S5, S6 and S7). These sequences were investigated in two further OMPs: OmpF and LamB (Fig. 2C and D)."

      Note, we did not state that this is a general feature for all OMPs. That would not be a reasonable proposition.

      20.

      • In the proposed model in Fig 9, it is hard to conceive how 5 strands will form along BamD given the limited surface area and tight space beneath BAM.

      • More concerning is that the two proposal interaction sites on BamD, Y62 and R197, are on opposite sides of the BamD structure, not along the same interface, which makes this model even more unlikely.

      • As evidence against this model, in Figure 9E, the two indicates sites of BamD are not even in close proximity of the modeled substrate strands.

      We can address the reviewer’s three concerns here:

      i. The first point is that the region (formed by BamD engaged with POTRA domains 1-2 and 5 of BamA) is not sufficient to accommodate five β-strands. Structural analysis reveals that the interface between the N-terminal side of BamD and POTRA1-2 changes conformation substantially upon substrate binding, and that this surface is greatly extended. This surface does have enough space to accommodate five β-strands, as now documented in Fig. 9D and 9E, using the latest structures (7TT5 and 7TT2) as illustrations. The text now reads:

      (Lines 506-515) "Spatially, this indicates the BamD can serve to organize two distinct parts of the nascent OMP substrate at the periplasmic face of the BAM complex, either prior to or in concert with, engagement to the lateral gate of BamA. Assessing this structurally showed the N-terminal region of BamD (interacting with the POTRA1-2 region of BamA) and the C-terminal region of BamD (interacting with POTRA5 proximal to the lateral gate of BamA) (Bakelar et al., 2016; Gu et al., 2016; Tomasek et al., 2020) has the N-terminal region of BamD changing conformation depending on the folding states of the last four β-strands of the substrate OMP, EspP (Doyle et al., 2022). The overall effect of this being a change in the dimensions of this cavity change, a change which is dependent on the folded state of the substrate engaged in it (Fig 9 B-E)."

      ii. The second point raised regards the orientation of the substrate recognition residues of BamD. Both Y62 and R197 are located on the lumen side of the funnel in the EspP-BAM transport intermediate structure (PDB ID: 7TTC). Y62 lies close to the edge of BamD, but given that POTRA1-2 undergoes a conformational change and opens this region, as described above, both residues are positioned where they could bind substrates. This is explained in the following text in the Results section of the revised manuscript:

      (Lines 377-379) "Each residue was located on the lumen side of the funnel-like structure in the EspP-BAM assembly intermediate structure (PDBID; 7TTC) (Doyle et al., 2022)."

      Reviewer #2 (Public Review):

      Previously, using a bioinformatics study, the authors identified potential sequence motifs that are common to a large subset of beta-barrel outer membrane proteins in Gram-negative bacteria. Interestingly, in that study, some of those motifs are located in the internal strands of barrels (not near the termini), in addition to the well-known "beta-signal" motif in the C-terminal region.

      Here, the authors carried out rigorous biochemical, biophysical, and genetic studies to prove that the newly identified internal motifs are critical to the assembly of outer membrane proteins and the interaction with the BAM complex. The authors' approaches are rigorous and comprehensive, and the results support the conclusions reasonably well. While overall enthusiastic, I have some scientific concerns with the rationale of the neutron reflectometry study, and the distinction between "the intrinsic impairment of the barrel" vs "the impairment of interaction with BAM" that the internal signal may play a role in. I hope that the authors will be able to address this.

      Strengths:

      1. It is impressive that the authors took multi-faceted approaches using the assays on reconstituted, cell-based, and population-level (growth) systems.

      2. Assessing the role of the internal motifs in the assembly of model OMPs in the absence and presence of BAM machinery was a nice approach for a precise definition of the role.

      Weaknesses:

      1. The result section employing neutron reflectometry (NR) needs to be clarified and strengthened in the main text (from line 226). In the current form, the NR result seems not so convincing.

      What is the rationale of the approach using NR?

      We have now modified the text to make clear that:

      (Lines 276-280) "The rationale to these experiments is that NR provides: (i) information on the distance of specified subunits of a protein complex away from the atomically flat gold surface to which the complex is attached, and (ii) allows the addition of samples between measurements, so that multi-step changes can be made to, for example, detect changes in domain conformation in response to the addition of a substrate."

      What is the molecular event (readout) that the method detects?

      We have now modified the text to make clear that:

      (Lines 270-274) "While the biochemical assay demonstrated that the OmpC(Y286A) mutant forms a stalled intermediate with the BAM complex, in a state in which membrane insertion was not completed, biochemical assays such as this cannot elucidate where on BamA-BamD this OmpC(Y286A) substrate is stalled."

      What are "R"-y axis and "Q"-x axis and their physical meanings (Fig. 5b)?

      The neutron reflectivity, R, is the ratio of the reflected to the incident neutron beam intensity, and it is measured as a function of the momentum transfer Q, defined as Q = 4π sin θ / λ, where θ is the angle of incidence and λ is the neutron wavelength. R(Q) is approximately given by R(Q) ≈ (16π²/Q²) |ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), the scattering length density (SLD) distribution normal to the surface. SLD is the sum of the coherent neutron scattering lengths of all atoms in the sample layer divided by the volume of the layer. Therefore, the intensity of the reflected beam is highly dependent on the thickness, densities and interface roughness of the samples. This is explained in the following text in the Methods section of the revised manuscript.

      (Lines 669-678) "Neutron reflectivity, denoted as R, is the ratio of the reflected to the incident neutron beam intensity. It’s calculated based on the Momentum transfer Q, which is defined by the formula Q = 4π sin θ / λ, where θ represents the angle of incidence and λ stands for the neutron wavelength. The approximate value of R(Q) can be expressed as R(Q) ≈ (16π²/Q²) |ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), which is the scattering length density (SLD) distribution perpendicular to the surface. SLD is calculated by dividing the sum of the coherent neutron scattering lengths of all atoms in a sample layer by the volume of that layer. Consequently, factors such as thickness, volume fraction, and interface roughness of the samples significantly influence the intensity of the reflected beams."
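
      The two relations quoted above can be illustrated with a minimal numerical sketch. Everything below is an assumption made for illustration and is not taken from the manuscript: the function names, the single uniform layer, and the thickness, SLD, angle and wavelength values are all hypothetical. The sketch evaluates Q from θ and λ, then evaluates the approximate R(Q) by numerically Fourier-transforming a sampled SLD profile ρ(z), and compares it with the closed-form result for a uniform slab.

```python
import cmath
import math


def momentum_transfer(theta_deg, wavelength):
    """Q = 4*pi*sin(theta)/lambda, with theta the angle of incidence in degrees."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


def born_reflectivity(q, z, rho):
    """Approximate R(Q) = 16*pi^2 / Q^2 * |rho_hat(Q)|^2.

    rho_hat(Q) is the one-dimensional Fourier transform of the SLD profile
    rho(z), evaluated with the trapezoidal rule on the sampled grid z.
    """
    rho_hat = 0j
    for i in range(len(z) - 1):
        f0 = rho[i] * cmath.exp(1j * q * z[i])
        f1 = rho[i + 1] * cmath.exp(1j * q * z[i + 1])
        rho_hat += 0.5 * (f0 + f1) * (z[i + 1] - z[i])
    return 16.0 * math.pi ** 2 / q ** 2 * abs(rho_hat) ** 2


# Hypothetical single layer: 50 Å thick with SLD 2e-6 Å^-2 (illustrative only).
thickness = 50.0
sld = 2.0e-6
n = 2000
z = [thickness * i / n for i in range(n + 1)]
rho = [sld] * (n + 1)

# Hypothetical geometry: 1 degree incidence, 5 Å neutrons, giving Q in Å^-1.
q = momentum_transfer(theta_deg=1.0, wavelength=5.0)
r_numeric = born_reflectivity(q, z, rho)

# Closed form of the same approximation for a uniform slab (consistency check).
r_slab = 64.0 * math.pi ** 2 * sld ** 2 * math.sin(q * thickness / 2.0) ** 2 / q ** 4
```

      This kinematic approximation only holds where R is much smaller than 1, well above the critical edge, so it shows how layer thickness and SLD shape the curve but is not a substitute for the multi-layer fitting used in the study.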

      How are the "layers" defined from the plot (Fig. 5b)?

      The “layers” in the plot (Fig. 5b) represent different regions of the sample being studied. In this study, we used a seven-layer model to fit the experimental data (chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2). This is explained in the following text in the figure legend of the revised manuscript:

      (Lines 1115-1116) "The experimental data was fitted using a seven-layer model: chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2."

      What are the meanings of "thickness" and "roughness" (Fig. 5c)?

      We used neutron reflectometry to determine the relative positions of BAM subunits in a membrane environment. The binding of certain subunits induced conformational changes in other parts of the complex. When a substrate membrane protein is added, the periplasmic POTRA domain of BamA extends further away from the membrane surface. This could result in an increase in thickness as observed in neutron reflectometry measurements.

      As for roughness, it is related to the interface properties of the sample. In neutron reflectometry, the intensity of the reflected beams is highly dependent on the thickness, densities, and interface roughness of the samples. An increase in roughness could suggest changes in these properties, possibly due to protein-membrane interactions or structural changes within the membrane.

      (Lines 1116-1120) "Table summarizes of the thickness, roughness and volume fraction data of each layer from the NR analysis. The thickness refers to the depth of layered structures being studied as measured in Å. The roughness refers to the irregularities in the surface of the layered structures being studied as measured in Å."

      What does "SLD" stand for?

      We apologize for not defining the abbreviation SLD at its first use. It is now defined in the revised manuscript (Line 298).

      1. In the result section, "The internal signal is necessary for insertion step of assembly into OM" This section presents an important result that the internal beta-signal is critical to the intrinsic propensity of barrel formation, distinct from the recognition by BAM complex. However, this point is not elaborated in this section. For example, what is the role of these critical residues in the barrel structure formation? That is, are they involved in any special tertiary contacts in the structure or in membrane anchoring of the nascent polypeptide chains?

      We appreciate the reviewer's comment on this point. Both position 0 and position 6 appear to be important amino acids for recognition by the BAM complex, since mutations introduced at these positions in peptide 18 prevent competitive inhibition activity.

      In terms of the tertiary structure of OmpC, position 6 is an amino acid that contributes to the aromatic girdle, and since Y286A and Y365A affected OMP folding as measured in folding experiments, it is perhaps their position in the aromatic girdle that contributes to the efficiency of β-barrel folding in addition to its function as a recognition signal. We have added a sentence in the revised manuscript:

      (Lines 233-236) "Position 6 is an amino acid that contributes to the aromatic girdle. Since Y286A and Y365A affected OMP folding as measured in folding experiments, their positioning into the aromatic girdle may contributes to the efficiency of β-barrel folding, in addition to contributing to the internal signal."

      The mutations made at position 0 had no effect on folding, so this residue may function solely in the signal. Given the register of each β-strand in the final barrel, the position 0 residues have side-chains that face out into the lipid environment. From examination of the OmpC crystal structure, the residue at position 0 makes no special tertiary contacts with other, neighbouring residues.  

      Reviewer #1 (Recommendations For The Authors):

      Minor critiques (in no particular order):

      1. Peptide 18 was identified based on its strong inhibition for EspP assembly but another peptide, peptide 23, also shows inhibition and has no particular consensus.

      We would like to correct this point. Peptide 23 has a strong consensus to the canonical β-signal. We explain the sequence consensus of the β-signal in the Results section, and in the third paragraph we have added a sentence indicating the relationship between peptide 18 and peptide 23.

      (Lines 152-168) "Six peptides (4, 10, 17, 18, 21, and 23) were found to inhibit EspP assembly (Fig. 1A). Of these, peptide 23 corresponds to the canonical β-signal of OMPs: it is the final β-strand of OmpC and it contains the consensus motif of the β-signal (ζxGxx[Ω/Φ]x[Ω/Φ]). The inhibition seen with peptide 23 indicated that our peptidomimetics screening system using EspP can detect signals recognized by the BAM complex. In addition to inhibiting EspP assembly, five of the most potent peptides (4, 17, 18, 21, and 23) inhibited additional model OMPs; the porins OmpC and OmpF, the peptidoglycan-binding OmpA, and the maltoporin LamB (fig. S3). Comparing the sequences of these inhibitory peptides suggested the presence of a sub-motif from within the β-signal, namely [Ω/Φ]x[Ω/Φ] (Fig. 1B). The sequence codes refer to conserved residues such that: ζ, is any polar residue; G is a glycine residue; Ω is any aromatic residue; Φ is any hydrophobic residue and x is any residue (Hagan et al., 2015; Kutik et al., 2008). The non-inhibitory peptide 9 contained some elements of the β-signal but did not show inhibition of EspP assembly (Fig. 1A).

      Peptide 18 also showed a strong sequence similarity to the consensus motif of the β-signal (Fig. 1B) and, like peptide 23, had a strong inhibitory action on EspP assembly (Fig. 1A). Variant peptides based on the peptide 18 sequence were constructed and tested in the EMM assembly assay (Fig. 1C)."

      1. It is unclear why the authors immediately focused on BamD rather than BamB, given that both were mentioned to mediate interaction with substrate. Was BamB also tested?

      We thank the reviewer for this comment. Following the reviewer's suggestion, we have now performed a pull-down experiment on BamB and added it to Fig. S9. We also modified the text of the results as follows.

      (Lines 262-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."

      1. For the in vitro folding assays of the OmpC substrates, labeled and unlabeled, no mention of adding SurA or any other chaperone which is known to be important for mediating OMP biogenesis in vitro.

      We appreciate the reviewer’s concerns on this point, however chaperones such as SurA are non-essential factors in the OMP assembly reaction mediated by the BAM complex: the surA gene is not essential and the assembly of OMPs can be measured in the absence of exogenously added SurA. It remains possible that addition of SurA to some of these assays could be useful in detailing aspects of chaperone function in the context of the BAM complex, but that was not the intent of this study.

      1. For the supplementary document, it would be much easier for the reader to have the legends groups with the figures.

      Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.

      1. Some of the figures and their captions are not grouped properly and are separated which makes it hard to interpret the figures efficiently.

      We thank the reviewer for this comment, we have revised the manuscript and figures to properly group the figures and captions together on a single page.

      1. The authors begin their 'Discussion' with a question (line 454), however, they don't appear to answer or even attempt to address it; suggest removing rhetorical questions.

      As per the reviewers’ suggestion, we removed this question.

      1. Line 464, 'unbiased' should be removed. This would imply that if not stated, experiments are 'negatively' biased.

      We removed this word and revised the sentence as follows:

      (Lines 431-433) "In our experimental approach to assess for inhibitory peptides, specific segments of the major porin substrate OmpC were shown to interact with the BAM complex as peptidomimetic inhibitors."

      1. Lines 466-467; '...go well beyond expected outcomes.' What does this statement mean?

      Our peptidomimetics led to unexpected results in elucidating the additional essential signal elements. The manuscript was revised as follows:

      (Lines 433-435) "Results for this experimental approach went beyond expected outcomes by identifying the essential elements of the signal Φxxxxxx[Ω/Φ]x[Ω/Φ] in β-strands other than the C-terminal strand."

      1. Line 478; '...rich information that must be oversimplified...'?

      We thank the reviewer for pointing this out. For clarity, the manuscript was revised as follows:

      (Lines 450-453) "The abundance of information which arises from modeling approaches and from the multitude of candidate OMPs, is generally oversimplified when written as a primary structure description typical of the β-signal for bacterial OMPs (i.e. ζxGxx[Ω/Φ]x[Ω/Φ]) (Kutik et al., 2008)."

      1. There are typos in the supplementary figures.

      We have revised and corrected the Supplemental Figure legends.  

      Reviewer #2 (Recommendations For The Authors):

      1. In Supplementary Information, I recommend adding the figure legends directly to the corresponding figures. Currently, it is very inconvenient to go back and forth between legends and figures.

      Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.

      1. Line 94 (p.3): "later"

      Lateral?

      Yes. We have corrected this.

      1. Line 113 (p.3): The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly" Rationale of the peptide inhibition assay is not clear. How can the peptide sequence that effectively inhibit the assembly interpreted as the b-assembly signal? By competitive binding to BAM or by something else? What is the authors' hypothesis in doing this assay?

      In the revision, we have added the following sentences to explain the aim and design of the peptidomimetics:

      (Lines 140-145) "The addition of peptides with BAM complex affinity, such as the OMP β-signal, are capable of exerting an inhibitory effect by competing for binding of substrate OMPs to the BAM complex (Hagan et al., 2015). Thus, the addition of peptides derived from the entirety of OMPs to the EMM assembly assay, which can evaluate assembly efficiency with high accuracy, expects to identify novel regions that have affinity for the BAM complex."

      1. Line 113- (p.3) and Fig. S1: The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly"

      Some explanation seems to be needed why b-barrel domain of EspP appears even without ProK?

      We thank the reviewer for pointing this out. We added the following text to explain:

      (Lines 128-137) "EspP, a model OMP substrate, belongs to autotransporter family of proteins. Autotransporters have two domains; (1) a β-barrel domain, assembled into the outer membrane via the BAM complex, and (2) a passenger domain, which traverses the outer membrane via the lumen of the β-barrel domain itself and is subsequently cleaved by the correctly assembled β-barrel domain (Celik et al., 2012). When EspP is correctly assembled into outer membrane, a visible decrease in the molecular mass of the protein is observed due to the self-proteolysis. Once the barrel domain is assembled into the membrane it becomes protease-resistant, with residual unassembled and passenger domains degraded (Leyton et al., 2014; Roman-Hernandez et al., 2014)."

      1. Line 186 (p.6): "Y285"

      Y285A?

      We have corrected the error; it now reads Y285A.

      1. Lines 245- (p. 7)/ Lines 330- (p. 10)

      It needs to be clarified that the results described in these paragraphs were obtained from the assays with EMM.

      We appreciate the reviewer’s concerns on these points. For the first passage, the following text was added at the beginning of the relevant paragraph to indicate that all of Fig. 4 is the result of the EMM assembly assay:

      (Line 241) "We further analyzed the role of internal β-signal by the EMM assembly assay."

      For the second passage, we used purified BamD rather than EMM. We have made this clear with the following sentence:

      (Lines 316-318) "We purified 40 different BPA variants of BamD, and then irradiated UV after incubating with 35S-labelled OmpC."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding the fundamental basis for how human monoamine transporters, such as the dopamine and serotonin transporters, couple ions with neurotransmitter uptake. Here the authors provide convincing data to show that K+ catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The paper is an important contribution, but it's still unclear exactly where K+ binds in LeuT, and how to incorporate K+ binding into a transport cycle mechanism.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript tackles an important question, namely how K+ affects substrate transport in the SLC6 family. K+ effects have previously been reported for DAT and SERT, but the prototypical SLC6-fold transporter LeuT was not known to be sensitive to the K+ concentration. In this manuscript, the authors demonstrate convincingly that K+ inhibits Na+ binding, and Na+-dependent amino acid binding at high concentrations, and that K+ inside of vesicles containing LeuT increases the transport rate. However, outside K+ apparently had very little effect. Uptake data are supplemented with binding data, using the scintillation proximity assay, and transition metal FRET, allowing the observation of the distribution of distinct conformational states of the transporter.

      Overall, the data are of high quality. I was initially concerned about the use of solutions of very high ionic strength (the Km for K+ is in the 200 mM range), however, the authors performed good controls with lower ionic strength solutions, suggesting that the K+ effect is specific and not caused by artifacts from the high salt concentrations.

      The major issue I have with this manuscript is with the interpretation of the experimental data. Granted that the K+ effect seems to be complex. However, it seems counterintuitive that K+ competes with Na+ for the same binding site, while at the same time accelerating the transport rate. Even if K+ prevents rebinding of Na+ on the inside of vesicles, it would be expected that K+ then stabilizes this Na+-free conformation, resulting in a slowing of the transport rate. However, the opposite is found. I feel that it would be useful to perform some kinetic modeling of the transport cycle to identify a mechanism that would allow K+ to act as a competitive inhibitor of Na+ binding and rate-accelerator at the same time.

      This ties into the second point: It is not mentioned in the manuscript what the configuration of the vesicles is after LeuT reconstitution. Are they right-side out? Is LeuT distributed evenly in inside-out and right-side out orientation? Is the distribution known? If yes, how does it affect the interpretation of the uptake data with and without K+ gradient?

      Finally, mutations were only made to the Na1 cation binding site. These mutations have an effect mostly to be expected, if K+ would bind to this site. However, indirect effects of mutations can never be excluded, and the authors acknowledge this in the discussion section. It would be interesting to see the effect of K+ on a couple of mutants that are far away from Na+/substrate binding sites. This could be another piece of evidence to exclude indirect effects, if the K+ affinity is less affected.

      Reviewer #2(Public Review):

      To characterize the relationship between Na+ and K+ binding to LeuT, the effect of K+ on Na+-dependent [3H]leucine binding was studied using a scintillation proximity assay. In the presence of K+ the apparent affinity for sodium was reduced but the maximal binding capacity for this ion was unchanged, consistent with a competitive mechanism of inhibition between Na+ and K+.
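
      The competitive signature described here (reduced apparent Na+ affinity, unchanged maximal binding) can be illustrated with a minimal sketch. It assumes a simple one-site competitive model without cooperativity, which is a simplification and not the authors' fitting procedure. The function names and all constants are hypothetical values chosen only for illustration.

```python
def apparent_kd(kd_na, k_conc, ki_k):
    """Apparent Na+ dissociation constant in the presence of competing K+.

    One-site competitive model (an assumption): Kd_app = Kd * (1 + [K+]/Ki).
    """
    return kd_na * (1.0 + k_conc / ki_k)


def fraction_bound(na_conc, kd_na, k_conc=0.0, ki_k=1.0):
    """Fraction of maximal binding at a given Na+ concentration."""
    return na_conc / (na_conc + apparent_kd(kd_na, k_conc, ki_k))


if __name__ == "__main__":
    KD_NA = 10.0   # mM, hypothetical Na+ dissociation constant
    KI_K = 200.0   # mM, hypothetical K+ inhibition constant
    for k_conc in (0.0, 200.0, 800.0):
        kd_app = apparent_kd(KD_NA, k_conc, KI_K)
        near_saturation = fraction_bound(1e6, KD_NA, k_conc, KI_K)
        print(f"[K+]={k_conc:6.1f} mM  Kd_app={kd_app:6.1f} mM  "
              f"binding at saturating Na+={near_saturation:.4f}")
```

      In this model, raising [K+] shifts the apparent Kd to higher Na+ concentrations, while binding at saturating Na+ still approaches the same maximum, which is the pattern the reviewer describes.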

      To obtain a more direct readout of K+ binding to LeuT, tmFRET was used. This method relies on the distance-dependent quenching of a cysteine-conjugated fluorophore (FRET donor) by a transition metal (FRET acceptor). This method is a conformational readout for both ion- and ligand-binding. Along with the effect of K+ on Na+-dependent [3H]leucine binding, the findings support the existence of a specific K+ binding site in LeuT and that K+ binding to this site induces an outward closed conformation.

      It was previously shown that in liposomes inlaid with LeuT by reconstitution, intra-vesicular K+ increases the concentrative capacity of [3H]alanine. To obtain insights into the mechanistic basis of this phenomenon, purified LeuT was reconstituted into liposomes containing a variety of cations, including Na+ and K+, followed by measurements of [3H]alanine uptake driven by a Na+ gradient.

      The ionic composition of the external medium was manipulated to determine if the stimulation of [3H]alanine uptake by K+ was due to an outward-directed potassium gradient serving as a driving force for sodium-dependent substrate transport by moving in the direction opposite to that of sodium and the substrate. Remarkably it was found that it is the intra-liposomal K+ per se that increases the transport rate of alanine and not a K+ gradient, suggesting that binding of K+ to the intra-cellular face of the transporter could prevent the rebinding of sodium and the substrate thereby reducing their efflux from the cell. These conclusions assume that the measured radioactive transport is via right-side-out liposomes rather than from their inverted counterparts (in case of a random orientation of the transporters in the proteoliposomes). Even though this assumption is likely to be correct, it should be tested.

      Since K+- and Na+-binding are competitive and K+ excludes substrate binding, the Authors chose to focus on the Na1 site where the carboxyl group of the substrate serves as one of the groups which coordinate the sodium ion. This was done by the introduction of conservative mutations of the amino acid residues forming the Na1 site. The potassium interaction in these mutants was monitored by sodium-dependent radioactive leucine binding. Moreover, the effect of Na+ with and without substrate, as well as that of potassium, on the conformational equilibria was measured by tmFRET on the mutants introduced in the construct enabling the measurements. The results suggest that K+-binding to LeuT modulates substrate transport and that the K+ affinity and selectivity for LeuT is sensitive to mutations in the Na1 site, pointing toward the Na1 site as a candidate site for facilitating the interaction with K+ in some NSS members.

      The data presented in this manuscript are of very high quality. They are a detailed extension of results by the same group (Billesbolle et. al, Ref. 16 from the list) providing more detailed information on the importance of the Na1 site for potassium interaction. Clearly this begs for the identification of the binding site in a potassium bound LeuT structure in the future. Presumably LeuT was studied here because it appears that it is relatively easy to determine structures of many conformational states. Furthermore, convincing evidence showed that the stimulatory effect of K+ on transport is not because of energization of substrate accumulation but is rather due to the binding of this cation to a specific site.

      Reviewer #1 (Recommendations For The Authors):

      • Include a transport mechanism that can account for the K+ effects.

      We appreciate the opportunity to elaborate further regarding how we envision this complex mechanism. It is generally known that, within the LeuT-fold transporters, the return step is rate-limiting for the transport process. Our data suggest that K+ binds to the inward-facing apo form.

      Accordingly, we propose that the role of K+ binding is to help LeuT overcome the rate-limiting step. We propose the following mechanistic model: When Na+ and substrate are released to the intracellular environment, the transporter must return to the outward-facing conformation. This can happen in (at least) two ways: 1) The transporter in its apo-form closes the inner gate and opens to the extracellular side, now ready to perform a new transport cycle. 2) The transporter rebinds Na+, which allows for the rebinding of substrate. It can now go in reverse (efflux) or once again release its content. Naturally, the transporter can also rebind only Na+ and release it again to the cytosol.

      The purpose of K+ binding is to prevent Na+ rebinding and to promote a conformational state of the transporter, which does not allow Na+ binding. Even though Na+ has a higher affinity for the site, K+ is much more abundant.
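
      The argument that a lower-affinity but more abundant ion can still dominate a shared site can be made concrete with a small sketch. It assumes a single site with mutually exclusive Na+ and K+ binding at equilibrium, and every concentration and dissociation constant below is a hypothetical placeholder rather than a value measured in this study.

```python
def site_occupancy(na_conc, k_conc, kd_na, kd_k):
    """Equilibrium occupancy of one site shared by two mutually exclusive ions.

    Returns the fraction of transporters that are empty (apo), Na+-bound,
    or K+-bound. Concentrations and Kd values must share the same unit.
    """
    na_term = na_conc / kd_na
    k_term = k_conc / kd_k
    denom = 1.0 + na_term + k_term
    return {"apo": 1.0 / denom, "Na": na_term / denom, "K": k_term / denom}


if __name__ == "__main__":
    # Hypothetical numbers: Na+ binds 5-fold tighter, K+ is 15-fold more abundant.
    occupancy = site_occupancy(na_conc=10.0, k_conc=150.0, kd_na=10.0, kd_k=50.0)
    for state, fraction in occupancy.items():
        print(f"{state}: {fraction:.2f}")
```

      With these placeholder values the K+-bound state is the most populated despite the weaker K+ affinity, which is the qualitative point of the paragraph above.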

      This model is supported by our previous experiment, showing that intravesicular K+ prevents [3H]alanine efflux while LeuT performs Na+-dependent alanine transport. Thus, the increase in Vmax could be due to a decreased efflux (exchange mode), or a facilitation of the rate-limiting step, or a combination of the two.

      Note that the model does not require that K+ is counter-transported. It just has to prevent Na+ rebinding. However, even though we failed to show K+ counter-transport, it does not mean that it does not happen. Further experiments must clarify this issue.

      To be more explicit about our proposed mechanistic model, we have expanded the last paragraph in the Discussion section. It now reads:

      “We propose that K+ binding either facilitates LeuT transition from inward- to outward-facing (the rate limiting step of the transport cycle), or solely prevents the rebinding and possible efflux of Na+ and substrate. It could also be a combination of both. Either way, intracellular K+ will lead to an increase in Vmax and concentrative capacity. Note that our previous experiment showed an increased [3H]alanine efflux when LeuT transports alanine in the absence of intra-vesicular K+ (ref. 16). Specifically, the mechanistic impact of K+ could be to catalyze LeuT away from the state that allows the rebinding of Na+ and substrate. This way, K+ binding would decrease the possible rebinding of intracellularly released Na+ and substrate, thereby rectifying the transport process and increasing the concentrative capacity and Vmax (Figure 6). Our results suggest that K+ is not counter-transported but rather promotes LeuT to overcome an internal rate limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.”

      • Describe the orientation of the transporter in the vesicles.

      When working with reconstituted NSS, the transport activity is determined by the Na+ gradient. This is also evident in the experiments where we dissipate the Na+ gradient. Here we find transport activity comparable to background. We can also see in the literature that directionality is rarely determined for transport proteins in reconstituted systems. That said, it is difficult to know how the inside-out LeuT contributes to the transport process. Will they work in reverse and contribute to the accumulation of intravesicular [3H]alanine? If so, to what extent? They will likely not be affected by the intravesicular K+. Therefore, their possible contribution will ‘work against’ our results and decrease the apparent K+ effects reported herein. Taken together, unless the vast majority of LeuT molecules are inside-out, knowing the actual proportion will not, in our perspective, affect our interpretations and conclusions of the data.

      That said, we have also been curious about this issue and, prompted by the reviewer's question, we performed the suggested experiment. We have inserted the results in Figure 3 – Figure supplement 1D. The figure shows that a fraction of the reconstituted LeuT is susceptible to thrombin cleavage of the accessible C-terminal. We have quantified the cleaved fraction to around 40% of the total (see Author response image 1 below). It is, however, a crude estimate since it is difficult to perform reliable densitometry with bands that close together. Thus, we are reluctant to add a quantitative measure in the article text.

      Author response image 1.

      We have inserted the following in the main text:

      “It is difficult to control the directionality of proteins when they are reconstituted into lipid vesicles. They will be inserted in both orientations: outside-out and inside-out. In the case of LeuT, it is the imposed Na+ gradient which determines the directionality of transport. Uptake through the inside-out transporters will probably also happen. Note that the inside-out LeuT will not have the K+ binding site exposed to the intra-vesicular environment. Accordingly, a proportion of transporters will likely not be influenced by the added K+ and will tend to mask the contribution of K+ to the transport mode from the right-side out LeuT. To investigate LeuT directionality in our reconstituted samples, we performed thrombin cleavage of accessible C-terminals on intact and perforated vesicles, respectively. The result suggests that the proportion of LeuT inserted as outside-out is larger than the proportion with an inside-out directionality (Figure 3 – Figure supplement 1D).”

      For the inserted Figure 3 – Figure supplement 1D, we have added the following legend:

      “(D) SDS-PAGE analysis of LeuT proteoliposomes following time-dependent thrombin digestion of accessible C-terminals (reducing the mass of LeuT by ~1.3 kDa). The reaction was terminated by the addition of PMSF at the specified time points. The lanes corresponding to the time-dependent proteolysis are flanked by lanes containing proteoliposomes without thrombin (left, 0 min) or digested in the presence of DDM (right, 180 min+DDM). Arrows indicate bands of full-length (top) and cleaved (bottom) LeuT.”

      • Check the effects of mutations away from the Na1 cation binding site.

      We have included the LeuT K398C in the study as a negative control for unspecific effects on Na+ and K+ binding. The mutant exhibits Na+-dependent [3H]leucine binding and K+-dependency similar to LeuT WT – see Table 2 and Table 2 - Figure Supplement 1G.

      As a minor point, the authors use the term "affinity" liberally. However, unless these are direct binding experiments, the term "apparent affinity" may be more appropriate, since Km values are affected by the transport cycle (in uptake), as well as binding of cations/substrate.

      We thank the reviewer for emphasizing this important point. We have revised the manuscript accordingly. We use ‘affinity’ when it has been determined under equilibrium conditions, either as a SPA binding experiment or based on tmFRET. We use the term ‘Km’ when the apparent affinity has been determined during non-equilibrium conditions such as during substrate transport.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in part 2, it is important to show the effect of internal potassium on transport in-sided liposomes. This could be done using the methodology developed by Tsai et. al. Biochemistry 51 (2012) 1557-1585.

      We appreciate this important point and have performed the suggested experiment. See reviewer 1 comment #2

      In the Abstract and throughout it is mentioned that K+ is not counter transported, yet on the bottom of p. 16 it is mentioned that this is possible.

      We have tried to be very cautious with any interpretation about whether K+ is only binding or whether it is also counter-transported. Either way, it must facilitate a transition towards a non-Na+ binding state. We tried to differentiate between the two possibilities by investigating if an outward-directed K+ gradient alone could drive transport (Figure 3E). We do not observe any significant difference from background (no gradient). However, the gained information is rather weak: It is still possible that K+ is counter-transported, but the K+ gradient does not impose any driving force. Instead, it ensures a rectification of the Na+-dependent substrate transport. If so, this experiment would come up negative even if K+ is counter-transported.

      To be more explicit, we have changed the wording on page 16.

      Our results suggest that K+ is not counter-transported, but rather promotes LeuT to overcome an internal rate limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.

      Fig.2-Fig. Supplement 1: it is important to show that the effect of leucine is sodium-dependent by adding the control K+ and leucine.

      We thank the reviewer for suggesting this important control. We have added the experiment to Figure 2 – Figure supplement 1 as suggested. The effect is not different from K+ alone supporting the SPA-binding data that K+-binding does not promote substrate binding.

      Point for discussion: Whereas potassium is counter transported in SERT, there are conflicting interpretations on this in DAT (Ref. 15 from the list and Bhat et. al eLife (2021) 10:e67996). The situation in LeuT seems like the scenario described by Bhat et. al.

      We appreciate the suggestion of a link between LeuT and hDAT. However, as mentioned above, we find it too early to be certain about this possibility. We have now mentioned the mechanistic similarity in the Discussion following our description of the proposed mechanistic model (see first request from reviewer #1):

      “If K+ is not counter-transported, LeuT might comply with the mechanism previously suggested for the human DAT (ref. 31).”

      Fig. 5-Fig. Supplement 1: Why are no data on N27Q and N286Q given? If these mutants have no transport activity this should be stated. Moreover, alanine uptake by A22V is almost sodium independent and is also very fast, suggesting binding, not transport. Are the counts sensitive to ionophores like nigericin?

      We appreciate this important point. Indeed, the LeuT N27Q and N286Q are transport inactive. This information is now inserted in the main text when describing the conformational dynamics of N27QtmFRET and N286QtmFRET.

      We agree with the reviewer that the [3H]alanine uptake for A22V is not very conclusive. The vesicles with Na+ on both sides (open diamonds) do allow [3H]alanine binding. Vesicles with added gramicidin are similar in activity. The fast rate could indeed suggest a binding event. This we also do not rule out in the main text. However, the contribution in activity from LeuT A22V in vesicles with a Na+ gradient cannot be explained by a binding event alone. Then it should bind more [3H]alanine in the presence of a Na+ gradient, which is possible, but hard to imagine. Also, the alanine affinity for LeuT A22V is ~1 µM (Table 1). At this affinity it should be literally impossible to detect any binding because the off-rate is so fast that it would all dissociate during the washing procedure.

      We have described the data and left out any interpretation (e.g. changed ‘[3H]alanine transport’ to ‘[3H]alanine activity’). In addition, we have replaced: “This correlates with the lack of changes in conformational equilibrium observed in the tmFRET data between the NMDG+, Na+ and K+ states.” with: “Further investigations must clarify whether the changes in observed [3H]alanine activity constitutes a transport- or a binding event.”

      Lower part of p. 16. The Authors speculate "that the mechanistic impact of K+ binding could be to accelerate a transition away from the conformation where Na+ and substrate are released, to a state where they can no longer rebind and thus revert the transport process (efflux)". This could be easily tested by measuring exchange, which should not be influenced by potassium.

      We performed this experiment in Billesbolle et al. 2016. Nat Commun (Fig. 1f). We show that the exchange is decreased in the presence of K+. We hypothesize that this is because K+ binding forces LeuT away from the exchange mode.

    1. Author Response

      Response to the Reviews

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers suggest that a lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that i) influence TM insertion (V276T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T background could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remain contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/or mechanical stability should necessarily correspond to their resistance to cotranslational and/or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests that introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To confirm this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble. In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least, under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data merits further discussion and documentation in our forthcoming revision, in which we will outline the following specific lines of reasoning.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; by comparison, the WT immunostaining intensity was 9.5-fold over background. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We will include these and other signal-to-noise metrics for each experiment in a new table in the revised version of this manuscript.

      Beyond considerations related to intensity, we also previously noticed that the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If the signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form an uncorrelated scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R = 0.97, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR.

      Author response image 2.
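
      The replicate-to-replicate correlation quoted above can be illustrated with a minimal sketch. The function name and the intensity values below are hypothetical and are not data from the study; the code only shows how a Pearson coefficient is obtained from paired per-variant intensities.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-variant intensities from two replicates (NOT study data).
replicate_1 = [120.0, 340.0, 95.0, 410.0, 220.0]
replicate_2 = [131.0, 322.0, 101.0, 398.0, 240.0]
print(f"Pearson R = {pearson_r(replicate_1, replicate_2):.2f}")
```

      A coefficient near 1 indicates that variants rank similarly in both replicates, which is the property the authors rely on when arguing that the assay resolves small differences around the W107A baseline.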

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results, we believe it is virtually certain that the secondary mutations characterized herein would form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation as W107A as a point of reference.

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree and will be sure to emphasize this point in the revised manuscript.

      Some statistical aspects of the study could be improved:

      1. It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in the supplement of the revised version of the manuscript.

      2. The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We will include additional statistical tests to reinforce these conclusions in the revised version of the manuscript.

      3. The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants" needs to be backed by relevant data and statistics, or at least a reference.

      We will include additional statistical tests for this in the revised manuscript and will ensure the language we use is consistent with the strength of the indicated statistical enrichment.
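
      One way such an enrichment could be tested is a one-sided hypergeometric test, sketched below. This is an illustration only, not the test the authors will report: the function name is hypothetical, and the counts are reconstructed from the rounded percentages quoted by the reviewer (45% of 251 is taken as 113 TMD variants, 65% of 98 as 64), so the resulting p-value is approximate.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from
    `population` items, of which `successes` carry the label of interest."""
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# ASSUMPTION: counts back-calculated from rounded percentages in the review text.
# 45% of 251 variants ~ 113 TMD variants; 65% of 98 variants ~ 64 TMD variants.
p_value = hypergeom_upper_tail(k=64, population=251, successes=113, draws=98)
print(f"one-sided enrichment p = {p_value:.2e}")
```

      The hypergeometric model treats the 98 variants as a subset drawn from the 251, which matches the wording "relative to the overall set"; a Fisher exact test on the corresponding 2×2 table would be an equivalent formulation.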

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for organizing the reviews for our manuscript, “Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties,” and for the positive eLife assessment. We also thank the reviewers for their constructive comments. We have addressed every comment, which has helped to improve the transparency and readability of the manuscript. The main changes to the manuscript are summarized as follows:

      1. Surrogate distributions were created for each participant and session to estimate the effect of tACS-phase lag on behavioral entrainment to the sound that could have occurred by chance or because of our analysis method (R1). The actual tACS-amplitude effects were normalized relative to the surrogate distribution, and statistical analysis was performed on the normalized (z-score) values. This analysis did not change our main outcome: that tACS modulates behavioral entrainment to the sound depending on the phase lag between the auditory and the electrical signals. This analysis has now been incorporated into the Results section and in Fig. 3c-d.

      2. Two additional supplemental figures were created to include the single-participant data related to Fig. 3b and 3e (R2).

      3. Additional editing of the manuscript has been performed to improve the readability.

      Below, you will find a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      We are grateful for the reviewer’s positive assessment of the potential impact of our study. The reviewer’s primary concerns were 1) the tACS lag effects reported in the manuscript might be noise because of the realignment procedure, and 2) no multiple comparisons correction was conducted in the model comparison procedure.

      In response to point 1), we have reanalyzed the data in exactly the manner prescribed by the reviewer. Our effects remain, and the new control analysis strengthens the manuscript. Regarding point 2), the model selection procedure was not based on evaluating the statistical significance of any model or predictor. Instead, the single model that best fit the data was selected as the model with the lowest Akaike’s information criterion (AIC), and its superiority relative to the second-best model was corroborated using the likelihood ratio test. Only the best model was evaluated for significance and analyzed in terms of its predictors and interactions. This model is an omnibus test and does not require multiple comparison correction unless there are post hoc decompositions. For similar approaches, see (Kasten et al., 2019).

      Below, we have responded to each comment specifically or referred to this general comment.

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      The reviewer is right that we did not control for the sensation effects in our paradigm. We asked the participants to rate the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. Nevertheless, we did not consider controlling for skin sensations necessary given the within-participant nature of our design (all participants experienced all six tACS–audio phase lag conditions, which were identical in their potential to cause physical sensations; the only difference between conditions was related to the timing of the auditory stimulus). That is, while the reviewer is right that 2-Hz tACS can indeed induce skin sensation under the electrodes, in this study, we report the effects that depend on the tACS-phase lag relative to the FM-stimulus. Note that the starting phase of the FM-stimulus was randomized across trials within each block (all six tACS audio lags were presented in each block of stimulation). We have no reason to expect the skin sensation to change with the tACS-audio lag from trial to trial, and therefore do not consider this to be a confound in our design. We have added some sentences with this information to the Discussion section:

      Pages 16-17, lines 497-504: “Note that we did not control for the skin sensation induced by 2-Hz tACS in this experiment. Participants rated the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. It is in principle possible that skin sensation would depend on tACS phase itself. However, in this study, we report effects that depend on the relationship between tACS-phase and FM-stimulus phase, which changed from trial to trial as the starting phase of the FM-stimulus was randomized across trials. We have no reason to expect the skin sensation to change with the tACS-audio lag and therefore do not consider this to be a confound in our data.”

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      Again, we would like to thank the reviewer for the positive assessment of the potential impact of our work and for the thoughtful comments regarding the methodology. For readability in our responses, we have numbered the comments below.

      1. There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to best-fitting cosine will trivially generate a FM-amplitude fluctuation with cosine shape as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than a sham, respectively, as shown in Fig 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the Null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.

      We agree with the reviewer’s concerns regarding the possible bias induced by the realignment procedure used to estimate tACS effects. Certainly, when adjusting phase 0 to each participant/session’s best tACS phase (peak in the fitting cosine), selecting the positive phase of the realigned data will be trivially larger than sham (Fig. 4a). This is why the realigned zero-phase and opposite phase (trough) bins were excluded from the analysis in Fig. 4b. Therefore, tACS(+) vs. tACS(-) do not represent behavioral entrainment at the peak positive and negative tACS lags, as both bins were already removed from the analysis. tACS(+) and tACS(-) are the averages of two adjacent bins from the positive and negative tACS lags, respectively (Zoefel et al., 2019). Such an analysis relies on the idea that if the effect of tACS is sinusoidal, presenting the auditory stimulus at the positive half cycle should be different than when the auditory stimulus lags the electrical signal by the other half. If the effect of tACS was just random noise fluctuations, there is no reason to assume that such fluctuations would be sinusoidal; therefore, any bias in estimating the effect of tACS should be removed when excluding the peak to which the individual data were realigned. Similar analytical procedures have been used previously in the literature (Riecke et al., 2015; Riecke et al., 2018). We have modified the colors in Fig. 4a and 4c (former 4b) and added a new panel to the figure (new 4b) to make the realignment procedure, including the exclusion of the realigned peak and trough data, more visually obvious.

      Moreover, we very much like the reviewer’s suggestion to normalize the magnitude of the tACS effect using a permutation strategy. We performed additional analyses to normalize our tACS effect in Fig. 4c by the probability of obtaining the effect by chance. For each subject and session, tACS-phase lags were randomized across trials for a total of 1000 iterations. For each iteration, the gaps were binned by the FM-stimulus phase and tACS-lag. For each tACS-lag, the amplitude of behavioral entrainment to the FM-stimulus was estimated (FM-amplitude), as shown in Fig. 3. Similar to the original data, a second cosine fit was estimated for the FM-amplitude by tACS-lag. Optimal tACS-phase was estimated from the cosine fit and FM-amplitude values were realigned. Again, the realigned phase 0 and trough were removed from the analysis, and their adjacent bins were averaged to obtain the FM-amplitude at tACS(+) and tACS(-), as shown in Fig. 4c. We then computed the difference between 1) tACS(+) and sham, 2) tACS(-) and sham, and 3) tACS(+) and tACS(-), for the original data and the permuted datasets. This procedure was performed for each participant and session to estimate the size of the tACS effect for the original and surrogate data. The original tACS effects were transformed to z-scores using surrogate distributions, providing us with an estimate of the size of the real effect relative to chance. We then computed one-sample t-tests to test whether the normalized tACS effects differed significantly from zero. This analysis showed that the tACS effects were still statistically significant. This analysis has been added to the Results and Methods sections and is included in Figure 4d.

      Page 10, lines 282-297: “In order to further investigate whether the observed tACS effect was significantly larger than chance and not an artifact of our analysis procedure (33), we created 1000 surrogate datasets per participant and session by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data. This yielded a surrogate distribution of tACS(+) and tACS(-) values for each participant and session. These values were averaged across sessions since the original analysis did not show a main effect of session. We then computed the difference between tACS(+) and sham, tACS(-) and sham, and tACS(+) and tACS(-), separately for the original and surrogate datasets. The obtained differences for the original data were then z-scored using the mean and standard deviation of the surrogate distribution. Note that in this case we used data of all 42 participants who had at least one valid session (37 participants with both sessions). Three one-sample t-tests were conducted to investigate whether the size of the tACS effect obtained in the original data was significantly larger than that obtained by chance (Fig. 4d). This analysis showed that all z-scores were significantly higher than zero (all t(41) > 2.36, p < 0.05, all p-values corrected for multiple comparisons using the Holm-Bonferroni method).”
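
      For readers unfamiliar with the Holm-Bonferroni correction mentioned in the quoted passage, a minimal sketch of the step-down adjustment follows. The function name and the three uncorrected p-values are hypothetical; they are not the values obtained in the study.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Visit the tests from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical uncorrected p-values for the three contrasts (NOT study results).
raw = {"tACS(+) vs sham": 0.010, "tACS(-) vs sham": 0.040, "tACS(+) vs tACS(-)": 0.030}
for name, p_adj in zip(raw, holm_bonferroni(list(raw.values()))):
    print(f"{name}: adjusted p = {p_adj:.3f}")
```

      A contrast is declared significant at level 0.05 when its adjusted p-value is below 0.05, which is how a statement such as "all p < 0.05 after correction" would be read.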

      Page 31, lines 962-972: “To further control that the observed tACS effects were not an artifact of the analysis procedure, the differences between the tACS conditions (sham, tACS(+), and tACS(-)) were normalized using a permutation approach. For each participant and session, 1000 surrogate datasets were created by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data (see above). FM-amplitude at sham, tACS(+) and tACS(-) were averaged across sessions since the original analysis did not show a main effect of session. Differences between tACS conditions were estimated for the original and surrogate datasets and the resulting values from the original data were z-scored using the mean and standard deviation from the surrogate distributions. One-sample t-tests were conducted to test the statistical significance of the z-scores. P-values were corrected for multiple comparisons using the Holm-Bonferroni method.”
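
      The surrogate normalization described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it shuffles six already-computed FM-amplitude values across lag bins (closer to the reviewer's suggestion) rather than permuting lag labels across trials and re-binning, it realigns to the bin nearest the fitted cosine peak, and it takes tACS(+) and tACS(-) as the means of the two bins adjacent to the realigned peak and trough, respectively. All function and variable names are hypothetical.

```python
import math
import random
import statistics

N_BINS = 6  # six tACS-audio lag bins, as in the study

def realigned_contrast(fm_amplitude):
    """tACS(+) minus tACS(-) after realigning the six bins to the fitted peak."""
    step = 2 * math.pi / N_BINS
    # For equally spaced bins, a least-squares cosine fit reduces to the
    # first Fourier component; atan2 gives the phase of its peak.
    c = sum(a * math.cos(k * step) for k, a in enumerate(fm_amplitude))
    s = sum(a * math.sin(k * step) for k, a in enumerate(fm_amplitude))
    peak_bin = round(math.atan2(s, c) / step) % N_BINS
    r = [fm_amplitude[(peak_bin + j) % N_BINS] for j in range(N_BINS)]
    # Drop the realigned peak (r[0]) and trough (r[3]); average their neighbours.
    tacs_plus = (r[1] + r[5]) / 2
    tacs_minus = (r[2] + r[4]) / 2
    return tacs_plus - tacs_minus

def surrogate_z(fm_amplitude, n_surrogates=1000, seed=0):
    """z-score the observed contrast against shuffled-bin surrogates."""
    rng = random.Random(seed)
    observed = realigned_contrast(fm_amplitude)
    null = []
    for _ in range(n_surrogates):
        shuffled = list(fm_amplitude)
        rng.shuffle(shuffled)  # destroys any true lag structure
        null.append(realigned_contrast(shuffled))
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    return z, null

# Hypothetical FM-amplitudes for one participant (NOT study data).
example = [0.21, 0.25, 0.30, 0.27, 0.20, 0.18]
z_score, null_dist = surrogate_z(example)
print(f"z = {z_score:.2f} from {len(null_dist)} surrogates")
```

      Because every surrogate passes through the same realignment, any bias that realignment introduces is present in the null distribution as well, which is the reason the z-scored effect addresses the reviewer's concern.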

      2. Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      Please see our response to the comment above.

      3. "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons of these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      We suspect that the reviewer did not have access to the supplemental materials where all tables (relevant here is Table S3) are provided. This post hoc analysis was performed as an exploratory analysis to better understand the factors that could influence the inter-session variability of tACS effects. In Table S3, we provide the formula for each of the seven models tested, including their Akaike information criteria corrected for small samples (AICc), R2, F, and p-values. As described in the methods section, the winning model was selected as the model with the smallest AICc. A similar procedure has been previously used in the literature (Kasten et al., 2019). Moreover, to ensure that our winning model was better at explaining the data than the second-best unrestricted model, we used the likelihood ratio test. After choosing the winning model and before reporting the significance of the predictors, we examined the significance of the model in and of itself, taking into account its R2 as well as F- and p-values relative to a constant model. Thus, only one model is being evaluated in terms of statistical significance. Therefore, to our understanding, there are no multiple comparisons to correct for. We added the information regarding the selection procedure, hoping this will make the analysis clearer.

      See page 12, lines 354-360: “This model was selected because it had the smallest Akaike’s information criterion (corrected for small samples), AICc. Moreover, the likelihood ratio test showed no evidence for choosing the more complex unrestricted model (stat = 2.411, p = 0.121). Following the same selection criteria, the winning model predicting inter-session variability in tACS-phase included only the factor gender (Table S4). However, this model was not significant in and of itself when compared to a constant model (F-statistic vs. constant model: 3.05, p = 0.09, R2 = 0.082).”
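
      The selection rule described in this passage (smallest AICc, then a likelihood ratio test against the more complex model) can be sketched as below. The log-likelihoods, parameter counts, and sample size are hypothetical, and the p-value helper assumes the two models differ by a single parameter (one degree of freedom), which the source does not state explicitly. Under that assumption, a statistic of 2.411 gives a p-value of roughly 0.12, in line with the reported value.

```python
import math

def aicc(log_likelihood, n_params, n_obs):
    """Akaike information criterion with the small-sample correction."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

def lrt_p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    ASSUMPTION: the nested models differ by exactly one parameter.
    """
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical fits: (name, maximized log-likelihood, number of parameters).
candidates = [("restricted", -40.2, 3), ("unrestricted", -39.0, 4)]
n_obs = 30  # hypothetical sample size

scores = {name: aicc(ll, k, n_obs) for name, ll, k in candidates}
best = min(scores, key=scores.get)

# Likelihood ratio statistic: twice the gain in log-likelihood of the larger model.
lr_stat = 2 * (candidates[1][1] - candidates[0][1])
print(f"best by AICc: {best}; LRT p = {lrt_p_value_df1(lr_stat):.3f}")
```

      In this toy case the simpler model has the lower AICc and the likelihood ratio test gives no reason to prefer the larger one, mirroring the logic of the quoted passage; only the selected model would then be examined for significance.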

      4. "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      Please see our response to the first comment regarding the validity of our analysis for proving the significant effect of tACS lag on modulating behavioral entrainment to the FM-stimulus (FM-amplitude), and the new control analysis. After performing the permutation tests, to make sure the reported effects are not noise, our statistical analysis still shows that tACS-lag does significantly modulate behavioral entrainment to the sound (FM-amplitude). Thus, the reviewer is right to say “these modulations are strong and statistically significant, just not reproducible across subjects and sessions”. In this regard, we consider our evaluation of session-to-session reliability of tACS effects is of high relevance for the field, as this is often overlooked in the literature.

      5. "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?

      We thank the reviewer very much for this comment. While the reviewer is right that in these models, we are trying to predict an individual property (tACS-amplitude) that was not test–retest reliable across sessions, we still consider this to be a valid analysis. Here, we take the tACS-amplitude averaged across sessions, trying to predict the probability of a participant to be significantly modulated by tACS, in general, regardless of day-to-day variability. Regarding the number of multiple regression models, how we chose the winning model and the appropriateness/need of multiple-comparisons correction in this case, please see our explanation under “Reviewer 1 (Public review)” and our response to comment 3.

      6. "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      We agree with the reviewer that this section is a bit speculative. We are trying to plant some seeds for future research that can help move the field forward in the quest for better stimulation protocols. We have added a sentence at the end of the section to explicitly say that more evidence is needed in this regard.

      Page 14, lines 428-429: “At this stage, more evidence is needed to prove the superiority of individually optimized tACS montages for reducing inter-individual variability in tACS effects.”

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      7. "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"

      This seems to be based on Figure 4, which leaves one with significant concerns.

      Please see our response to comment 1. We hope the reviewer is satisfied with our additional analysis, which confirms that the tACS effect reported here is not noise.

      8. "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."

      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      We respectfully disagree with the reviewer’s opinion that this is a complex statement. We think that these interaction effects are very intuitive as we explain in the results and discussion sections. These significant interactions show that for tACS to be effective, it matters that current gets to the right place and not to irrelevant brain regions. We believe this finding is of great importance for the field, since most studies on the topic still focus mostly on predicting tACS effects from the absolute field strength and neglect other properties of the electric field.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:

      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."

      Please see our responses to all previous comments.

      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made clearer.

      We changed this sentence in the abstract to:

      Page 2, lines 41-43: “Although additional evidence is necessary, our results also provided suggestive insights that spatially optimizing the electrode montage could be a promising tool to reduce inter-individual variability of tACS effects.”

      Reviewer #3 (Public Review):

      In "Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties" Cabral-Calderin and collaborators aimed to document 1) the possible advantages of personalized tACS montage over standard montage on modulating behavior; 2) the inter-individual and inter-session reliability of tACS effects on behavioral entrainment and, 3) the importance of the induced electric field properties on the inter-individual variability of tACS.

      To do so, in two different sessions, they investigated how the detection of silent gaps occurring at random phases of a 2Hz- amplitude modulated sound could be enhanced with 2Hz tACS, delivered at different phase lags. In addition, they evaluated the advantage of using spatially optimized tACS montages (information-based procedure - using anatomy and functional MRI to define the target ROI and simulation to compare to a standard montage applied to all participants) on behavioral entrainment. They first show that the optimized and the standard montages have similar spatial overlap to the target ROI. While the optimized montage induced a more focal field compared to the standard montage, the latter induced the strongest electric field. Second, they show that tACS does not modify the optimal phase for gap detection (phase of the frequency-modulated sound) but modulates the strength of behavioral entrainment to the frequency-modulated sound in a phase-lag specific manner. However, and surprisingly, they report that the optimal tACS lag, and the magnitude of the phasic tACS effect were highly variable across sessions. Finally, they report that the inter-individual variability of tACS effects can be explained by the strength of the inward electric field as a function of the field focality and on how well it reached the target ROI.

      The article is interesting and well-written, and the methods and approaches are state-of-the-art.

      Strengths:

      • The information-based approach used by the authors is very strong, notably with the definition of subject-specific targets using a fMRI localizer and the simulation of electric field strength using 3 different tACS montages (only 2 montages used for the behavioral experiment).

      • The inter-session and inter-individual variability are well documented and discussed. This article will probably guide future studies in the field.

      Weaknesses:

      • The addition of simultaneous EEG recording would have been beneficial to understand the relationship between tACS entrainment and the entrainment to rhythmic auditory stimulation.

      We are grateful for the Reviewer’s positive assessment of our work and for the reviewer’s recommendations. We agree with the reviewer that adding simultaneous EEG or MEG to our design would have been beneficial to understand tACS effects. However, as the reviewer may be aware, such a combination also poses additional challenges due to the strong artifacts induced by tACS in the EEG signals, which are at the frequency of interest and several orders of magnitude larger than the signal of interest. Unfortunately, an adequate setup for simultaneous tACS-EEG was not available at the time of the study. Nevertheless, since we are using a paradigm that we have repeatedly studied in the past and have shown to entrain neural activity and modulate behavior rhythmically, we are confident our results are of interest on their own. For readability of our answers, we have numbered the comments below.

      1. It would have been interesting to develop the fact that tACS did not "overwrite" neural entrainment to the auditory stimulus. The authors try to explain this effect by mentioning that "tACS is most effective at modulating oscillatory activity at the intended frequency when its power is not too high" or "tACS imposes its own rhythm on spiking activity when tACS strength is stronger than the endogenous oscillations but it decreases rhythmic spiking when tACS strength is weaker than the endogenous oscillations". However, it is relevant to note that the oscillations in their study are by definition "not endogenous" and one can interpret their results as a clear superiority of sensory entrainment over tACS entrainment. This potential superiority should be discussed, documented, and developed.

      We thank the reviewer very much for this remark. We completely agree that our results could be interpreted as a clear superiority of sensory entrainment over tACS entrainment. We have now incorporated this possibility in the discussion.

      Page 16, line 472-478: “Alternatively, our results could simply be interpreted as a clear superiority of the auditory stimulus for entrainment. In other words, sensory entrainment might just be stronger than tACS entrainment in this case where the stimulus rhythm was strong and salient. It would be interesting to further test whether this superiority of sensory entrainment applies to all sensory modalities or if there is a particular advantage for auditory stimuli when they compete with electrical stimulation. However, answering this question was beyond the scope of our study and needs further investigations with more appropriate paradigms.”

      2. The authors propose that "by applying tACS at the right lag relative to auditory rhythms, we can aid how the brain synchronizes to the sounds and in turn modulate behavior." This should be developed as the authors showed that the tACS lags are highly variable across sessions. According to their results, the optimal lag will vary for each tACS session and subtle changes in the montage could affect the effects.

      We thank the reviewer for this remark. We believe that the right procedure in this case would be to use closed-loop protocols in which the optimal tACS lag is estimated online, as we discuss in the summary and future directions sub-section. We tried to make this clearer in the same sentence that the reviewer mentioned.

      Page 17, line 506-508: “Since optimal tACS phase was variable across participants and sessions, this approach would require closed-loop protocols where the optimal tACS lag is estimated online (see next section).”

      3. In a related vein, it would be very useful to show the data presented in Figure 3 (panels b,d,e) for all participants to allow the reader to evaluate the quality of the data (this can be added as a supplementary figure).

      Thank you very much for the suggestion. We have added two new supplemental figures (Fig S1 and S2) to show individual data for Fig. 3b and 3e. Note that Fig. 3d already shows the individual data as each circle represents optimal FM-phase for a single participant.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      "was optimized in SimNIBS to focus the electric field as precisely as possible at the target ROI" It appears that some form of constrained optimization was used. It would be good to clarify which method was used, including a reference.

      Indeed, SimNIBS implements a constrained optimization approach based on pre-calculated lead fields. We have added the corresponding reference. All parameters used for the optimization are reported in the methods (see sub-section Electric field simulations and montage optimization). Regarding further specifics, the readers are invited to check the MATLAB code that was used for the optimization which is made available at: https://osf.io/3yutb

      "Thus, each montage has its pros and cons, and the choice of montage will depend on which of these dependent measures is prioritized." Well put. It would be interesting to know if authors considered optimizing for intensity on target. That would give the strongest predicted intensity on target, which seems like an important desideratum. Individualizing for something focal, as expected, did not give the strongest intensity. In fact, the method struggled to achieve the desired intensity of 0.1V/m in some subjects. It would be interesting to have a discussion about why this particular optimization method was selected.

      The specific optimization method used in this study was somewhat arbitrary, as there is no standard in the field. It was validated in prior studies, where it was also demonstrated that it performs favorably compared to alternative methods (Saturnino et al., 2019; Saturnino et al., 2021). The underlying physics of the head volume conductor generally limits the maximally achievable focality, and requires a tradeoff between focality and the desired intensity in the target. This tradeoff depends on the maximal amount of current that can be injected into the electrodes due to safety limits (4 mA in total in our case). Further constraints of the optimization in our application were the simultaneous targeting of two areas, and achieving field directions in the targets roughly parallel to those of auditory dipoles. Given the combination of these constraints, as the reviewer noticed, we could not even achieve the desired intensity of .1V/m in some subjects. As we wanted to stimulate both auditory cortices equally, our priority was to have the E-fields as similar as possible between hemispheres. Future studies optimizing for only one target would be easier to optimize for target intensity (assuming the same maximal total current injection). Alternatively, relaxing the constraint on direction and optimizing only for field intensity would help to increase the field intensities in the targets, but would lead to differing field directions in the two targets. As an example, see Rev. Fig.1 below. We extensively discuss some of these points in the discussion section: “Are individually optimized tACS montage better?” (Pages 21-22).

      Additionally, we added a few sentences in the Results and Methods giving more details about the optimization approach.

      Page 5, lines 115-116: “Using individual finite element method (FEM) head models (see Methods) and the lead field-based constrained optimization approach implemented in SimNIBS (31)”

      Page 27, lines 819-822: “The optimization pipeline employed the approach described in (31) and was performed in two steps. First, a lead field matrix was created per individual using the 10-10 EEG virtual cap provided in SimNIBS and performing electric field simulations based on the default tissue conductivities listed below.”

      Author response image 1.

      E-field distributions for one example participant. Brain maps show the results from the same optimization procedure described in the main manuscript but with no constraint for the current direction (top) or constraining the current direction (bottom). Note that the desired intensity of .1 V/m can be achieved when the current direction is not constrained.

      The terminology of "high-definition HD" used here is unconventional and may confuse some readers. The paper cited for ring electrodes (18) does not refer to it as HD. A quick search for high-definition HD yields mostly papers using many small electrodes, not ring electrodes. They look more like what was called "individualized". More conventional would be to call the first configuration a "ring-electrode", and the "individualized" configuration might be called "individualized HD".

      We thank the reviewer for this remark. We changed the label of the high-definition montage to ring-electrode. Regarding the individualized configuration, we prefer not to use individualized HD as it has the same number of electrodes as the standard montage.

      "So far, we have evaluated whether tACS at different phase lags interferes with stimulus-brain synchrony and modulates behavioral signatures of entrainment" The paper does not present any data on stimulus-brain synchrony. There is only an analysis of behavior and stimulus/tACS phase.

      We agree with the reviewer. To be more careful with this statement, we have modified the sentence to say:

      Page 10, lines 303-304: “So far, we have evaluated whether tACS at different phase lags modulates behavioral signatures of entrainment: FM-amplitude and FM-phase.”

      "However, the strength of the tACS effect was variable across participants." and across sessions, and the phase also was variable across subjects and sessions.

      "tACS-amplitude estimates were averaged across sessions since the session did not significantly affect FM-amplitude (Fig. 5a)." More importantly, the authors show that "tACS-amplitude" was not reproducible across sessions.

      Unfortunately, we did not understand what the reviewer is suggesting here, and would have to ask the reviewer in this case to provide us with more information.

      References

      Kasten FH, Duecker K, Maack MC, Meiser A, Herrmann CS (2019) Integrating electric field modeling and neuroimaging to explain inter-individual variability of tACS effects. Nat Commun 10:5427.

      Riecke L, Sack AT, Schroeder CE (2015) Endogenous Delta/Theta Sound-Brain Phase Entrainment Accelerates the Buildup of Auditory Streaming. Curr Biol 25:3196-3201.

      Riecke L, Formisano E, Sorger B, Baskent D, Gaudrain E (2018) Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 28:161-169 e165.

      Saturnino GB, Madsen KH, Thielscher A (2021) Optimizing the electric field strength in multiple targets for multichannel transcranial electric stimulation. J Neural Eng 18.

      Saturnino GB, Siebner HR, Thielscher A, Madsen KH (2019) Accessibility of cortical regions to focal TES: Dependence on spatial position, safety, and practical constraints. Neuroimage 203:116183.

      Zoefel B, Davis MH, Valente G, Riecke L (2019) How to test for phasic modulation of neural and behavioural responses. Neuroimage 202:116175.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public review):

      Weaknesses: The interpretation is somewhat model-dependent, and it is unclear if the interpretation is unique. For example, it is unclear if the heterogeneous release probability among sites, silent sites, can explain the results. N estimates out of variance-mean analysis for example may be limited by the availability of postsynaptic receptors.

      To address this criticism, we have added a paragraph in the Discussion outlining the main assumptions underlying our work and how possible deviations from these assumptions may have affected our conclusions. This new paragraph is titled ' Assumptions behind our analysis, and possible limitations of our conclusions'.

      Reviewer 1, Recommendations to Authors:

      Without molecular evidence or anatomical evidence, the model and conclusions may remain as a postulate at this stage. This can be discussed carefully. Also, the study looks a bit narrow regarding the scope, only dealing with RS-DS model vs TS-LS model. Maybe, the authors pick up a bit more qualitative findings that directly support RS-DS model.

      To address these issues, another paragraph has been added to the Discussion titled 'Functional evidence in favor of the RS/DS model at PF-MLI synapses, and remaining uncertainties on the molecular composition and morphological arrangement of docking sites'.

      Minor: Fukaya et al. did not study cerebellar mossy fiber synapses.

      We apologize for this error, which has now been rectified.

      Reviewer 2 (Public review):

      It remains unclear how generalizable the findings are to other types of synapses.

      We agree with the Reviewer: this is a limitation of our study. In the Discussion we have a paragraph titled 'Maximum RRP size for other synaptic types' where we discuss this point. As we say in this paragraph, central synapses are clearly diverse, and the level of applicability of our results across preparations will depend on our ability to extend SV counting to various types of brain synapses. For the moment SV counting has been applied to only two types of synapses: PF-MLI synapses and hMF-IN synapses. We are encouraged by the fact that the simple synapse study by Tanaka et al. (2021), carried out at hMF-IN synapses, offers another example where the ratio between RRP size and N is larger than 1.

      Recommendations to Authors,

      Minor comments:

      The manuscript is at times difficult to read or reads like a review. The introduction could be shortened to concisely outline the motivation and premises for the study. The results and methods sections should not contain excessive interpretation and discussion. Although very informative, it distracts from the simple principal message.

      To address these criticisms, we have shortened the Introduction and parts of the Results section. These changes have resulted in a presentation of Results that is shorter and more focused on data and simulations than in the previous version. Nevertheless, readers need to be informed of ongoing research on docking sites and the principles of sequential models to understand the usefulness of our work. For this reason, we have maintained a theoretical section at the beginning of Results.

      The rationale for the choice of synapse and experimental conditions remains unclear until the discussion. This needs to be clearly addressed at the beginning, in the introduction, or in the results. In particular, the extracellular calcium concentration and the addition of 4-AP to the recording solution should be addressed in the results.

      The reason for choosing the PF-MLI synapse is now indicated at the end of the Introduction. The rationale underlying our choice of experimental conditions, including the extracellular calcium concentration and the addition of 4-AP, is now briefly explained at the beginning of the second section of Results (titled 'Maximizing RRP size and its release during AP trains'), and more extensively in the Methods section (as in the previous version of the manuscript).

      Potential confounds of the approach should be discussed (e.g. could a broadened AP in 4-AP alter synchronicity of release, i.e. desynchronization of release, especially during trains. That could be complemented with information on the EPSC kinetics (rise, decay) under different experimental conditions, as well as during train stimulation. How could presynaptic calcium concentration and time course in 4-AP impact the conclusions?

      To study the effects of 4-AP on AP broadening we have performed a new analysis of EPSC latencies in control and in 4-AP. In both cases the first latencies were independent of i. In 4-AP, first latencies displayed a small right shift of 0.2 ms (see additional figure below). This indicates that 4-AP does broaden the AP waveform, but that the extent of this broadening is limited. This new information has been added in the Methods of the revised manuscript.

      As suspected by the Reviewer, the latency distribution changes as a function of i and in the presence of 4-AP. Consistent with earlier findings (Miki et al., 2018), the proportion of 2-step release (with longer latencies) increases as a function of i both in control and in 4-AP. We also find that the value of the fast time constant of the latency distribution, τf, is larger in 4-AP than in control. This last result probably indicates a longer presynaptic calcium entry in 4-AP.

      In the revised version, we describe these results in the Methods section, in a new paragraph titled 'Changes in latency distributions as a function of i and of experimental conditions'.

      While the latency distributions change as a function of i and as a function of experimental conditions, this does not affect our conclusions, because these conclusions are based on the summed number of release events after each AP (or in other words, on the integral of the latency distributions).
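The point that summed release counts are insensitive to the shape of the latency distribution can be illustrated with a minimal sketch. It assumes a single delayed exponential latency density, which is a simplification (the 2-step release component is ignored). The time constants (0.47 ms and 0.70 ms) and the 0.2 ms delay are the values quoted in the figure legend of this response; the event count, bin width, and counting window are hypothetical values chosen for illustration only.

```python
import math

def expected_latency_histogram(n_events, tau_ms, delay_ms, bin_ms=0.1, window_ms=10.0):
    """Expected counts per bin for a delayed single-exponential latency density.

    This is an illustrative simplification, not the analysis used in the study.
    """
    def cdf(t):
        # Cumulative fraction of events released by time t after the AP.
        if t <= delay_ms:
            return 0.0
        return 1.0 - math.exp(-(t - delay_ms) / tau_ms)

    n_bins = round(window_ms / bin_ms)
    return [n_events * (cdf((k + 1) * bin_ms) - cdf(k * bin_ms)) for k in range(n_bins)]

# Hypothetical number of release events; tau and delay taken from the legend.
control = expected_latency_histogram(1000, tau_ms=0.47, delay_ms=0.0)
with_4ap = expected_latency_histogram(1000, tau_ms=0.70, delay_ms=0.2)

# The histogram shapes differ (peak height and position) ...
print("peak bin counts:", max(control), max(with_4ap))
# ... but the summed counts, i.e. the integrals, are essentially identical.
print("summed counts:", sum(control), sum(with_4ap))
```

Under these assumptions the two histograms differ in peak height and onset, while their sums agree to within a fraction of an event, provided the counting window is long compared with the time constant.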

      The kinetics of mEPSCs (risetime and decay time) are unchanged by 4-AP or by PTP. Consequently, in a given experiment, we used the same template to perform our deconvolution analysis for all conditions that were examined (starting with 3 mM Cao up to 200 Hz). This information has now been added in Methods.

      Following an AP stimulation, the amount of calcium entry in the presence of 4-AP is presumably much larger than in control. TEA, a weaker K channel blocker than 4-AP at PF-MLI synapses, elicits a marked increase in calcium entry (Malagon et al., 2020). This suggests an even larger increase with 4-AP, even though this has not been directly confirmed in the present work. The enhanced calcium entry translates into an increase in the parameters pr, r and s of our model. The important thing for our study is to increase pr and r as much as possible to promote the emptying of the RRP during trains. Knowing the exact amount of calcium entry and its relation to the pr/r increase is not essential for this purpose. Likewise, whether r (and/or s) increases as a function of i is of little practical importance since much of the RRP is emptied already after the second stimulation, at least in the most extreme case (200 Hz stimulation).

      The applicability of this model to other synapses needs to be addressed more thoroughly. This synapse, under physiological conditions, has a very low Pr, and the experimental conditions have to be adjusted dramatically to achieve a high-Pr. How applicable are the conclusions to high-Pr synapses and/or synapses that operate in a multivesicular release regime? Although that might be difficult to test experimentally it should be addressed in the discussion.

      The applicability issue to other synapses has been addressed above, in response to the public comments of the same Reviewer.

      As the Reviewer points out, the PF-MLI synapse has a small P value under physiological conditions. One can speculate that synapses that exhibit a higher P value may have a higher docking site occupancy than PF-MLI synapses. This feature would increase their chance of having a ratio of RRP size over N larger than 1, as it occurs in PF-MLI synapses in high docking occupancy conditions. A sentence making this point has been added to the paragraph titled 'Maximum RRP size for other synaptic types' in the revised manuscript.

      Author response image 1.

      Latency histograms for s1 in control and in the presence of 4-AP. After normalization, the averaged latency histogram in 4-AP displays an additional delay of 0.2 ms, and a slowing of the time constant τf from 0.47 ms to 0.70 ms.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      “The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore, the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.”

      Response: We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns, using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique will serve to construct new hypotheses as well as to test them.

      • Common input structure lines 115

      I agree with the following concepts, but I would specify that there is not only one dominant common input. It has been shown that there are multiple common inputs to the same motor nuclei (e.g., the two inputs are orthogonal and are shared with a subset of the active motoneurons) particularly for agonist motoneuron pools of synergistic muscles. On the hand muscles the authors are correct that there is only one dominant common input. Moreover, there is also some animal work suggesting that common inputs is just an epiphenomenon. This is completely in contradiction to what we observe in-vivo in the firing patterns of motor units, but perhaps worth mentioning and discussing.

      Response: Thanks for emphasizing this point. We have cited a recent reference discussing the important issue of common drive and the possibility of more than one source. Our simulations assume the net form of the excitatory input to all motoneurons in the pool is the same, except for noise. This net form (which produces the linear CST output in each case) essentially represents the sum of all inputs, both descending and sensory. Our results show the same overall pattern as human data, i.e., that all motor unit firing patterns have similar trajectories (again allowing for the impact of noise). Future studies will consider separating excitatory inputs into different sources.

      It is interesting that the authors mention suprathreshold rate modulation. Could the authors just discuss more on how the model would respond to a simulated suprathreshold current for all simulated motoneurons (i.e., like the ones generated during a suprathreshold-injected current or voluntary maximal feedforward movement?)

      Response: Thank you for this point. Our use of the term “suprathreshold” was not correct. We meant “suprathreshold” to refer to the amount of input above the recruitment threshold. We have decided to remove this term, so the sentence now reads “…so less is available for rate modulation…”.

      194 a full point is missing.

      Response: We addressed the error.

      204-231 and 232-259, these two paragraphs have been copied twice.

      Response: We addressed the error.

      Line 475 typo

      Response: We addressed the error.

      591 It would be interesting to add the time it takes a standard computer with known specs and a supercomputer to run one batch of simulations (i.e., how long one of the 6,300,000 simulations takes).

      Response: Each simulation took about 20 minutes of real time. Assuming a standard computer with 16 processor cores using a similar microarchitecture as Bebop (Intel Broadwell architecture), the standard computer could run 16 simulations at a time (one simulation assigned per core). It would take the standard computer about 15 years to complete all 6.3M simulations.
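The 15-year figure follows from the numbers given in the response; a short sketch of the arithmetic is shown here. The function and variable names are illustrative, and the calculation assumes perfect parallel scaling across cores with no scheduling or I/O overhead.

```python
MINUTES_PER_SIMULATION = 20      # from the response above
TOTAL_SIMULATIONS = 6_300_000    # from the response above
CORES = 16                       # one simulation per core, as stated above

def wall_clock_years(total_simulations, minutes_each, parallel_jobs):
    """Estimate wall-clock time in years, assuming ideal parallel scaling."""
    # Ceiling division: number of sequential batches of `parallel_jobs` runs.
    batches = -(-total_simulations // parallel_jobs)
    total_minutes = batches * minutes_each
    minutes_per_year = 60 * 24 * 365.25
    return total_minutes / minutes_per_year

years = wall_clock_years(TOTAL_SIMULATIONS, MINUTES_PER_SIMULATION, CORES)
print(f"Estimated wall-clock time: {years:.1f} years")
```

With these inputs the estimate comes out just under 15 years, consistent with the figure quoted in the response.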

      594 I don't understand why there are 6M simulations, could the authors provide more info on the combinations and why there are 6M simulations.

      Response: The 6M simulations are the total number of simulations that were performed for this work. A detailed explanation can be found in the section “Machine learning inference of motor pool characteristics” at line 591. Briefly, there were 315,000 simulations of a pool of 20 motoneurons (20 x 315,000 = 6.3 million). The 315,000 simulations were required to run all possible combinations of 15 patterns of inhibition, 5 of neuromodulation, 7 of distribution of excitatory inputs and 30 different repeats of synaptic noise with different seeds. In addition, there were 20 iterations for each of these combinations to generate a linear CST output (as illustrated in Fig. 3). 15 x 5 x 7 x 30 x 20 = 315,000.
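The combinatorics described above can be checked by enumerating the parameter grid explicitly. The sketch below is only a counting exercise: the constant names are hypothetical, and the actual simulation code is not reproduced here.

```python
from itertools import product

# Grid sizes as stated in the response; the names are illustrative only.
N_INHIBITION_PATTERNS = 15
N_NEUROMODULATION_LEVELS = 5
N_EXCITATORY_DISTRIBUTIONS = 7
N_NOISE_SEEDS = 30
N_ITERATIONS = 20        # iterations used to reach a linear CST output
N_MOTONEURONS = 20       # motoneurons per simulated pool

def count_pool_simulations():
    """Count every combination in the full parameter grid."""
    grid = product(
        range(N_INHIBITION_PATTERNS),
        range(N_NEUROMODULATION_LEVELS),
        range(N_EXCITATORY_DISTRIBUTIONS),
        range(N_NOISE_SEEDS),
        range(N_ITERATIONS),
    )
    return sum(1 for _ in grid)

pool_simulations = count_pool_simulations()
motoneuron_simulations = pool_simulations * N_MOTONEURONS
print(pool_simulations, motoneuron_simulations)
```

Enumerating the grid rather than multiplying the sizes directly mirrors how an exhaustive grid search visits every combination once.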

      In several simulations it seems that there was a lot of fine-tuning of inputs to match the measured motor unit firing pattern. Have the authors ever considered a fully black-box AI approach? If they think is interesting maybe it could spice up the discussion.

      Response: We agree that AI has potential for reverse engineering the whole system and we are looking into adding it to a future version of this algorithm as an alternative. We started with a simple but powerful grid search to enhance our understanding of the interaction between inputs, neuron properties and outputs.

      Reviewer 2

      Comment 1:

      “First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but figures 8 and 9 as well as Table 1 shows primarily the goodness of fit rather than the actual fit.”

      Response: We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1.

      Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits) which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).
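For readers less familiar with the goodness-of-fit measures named in this legend, the following sketch shows how residuals and the coefficient of determination (R²) are conventionally computed. The data values are made-up toy numbers, not results from the study, and the function names are illustrative.

```python
def residuals(observed, predicted):
    """Residual for each sample: observed value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum(r * r for r in residuals(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (hypothetical, not from the manuscript).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(residuals(observed, predicted))
print(r_squared(observed, predicted))
```

A residual plot displays the first quantity against the predicted or observed value, while R² summarizes the same information as a single number per predicted output.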

      Comment 2:

      “Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?”

      Response: We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task.

      We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      Comment 3:

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      Response: We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper investigates host and viral factors influencing transmission of alpha and delta SARS-CoV-2 variants in the Syrian hamster model and fundamentally increases knowledge regarding transmission of the virus via the aerosol route. The strength of evidence is solid and could be improved with a clearer presentation of the data.

      We thank the editors for their assessment. We are excited to present a revised version of the manuscript with improved data presentation and an improved discussion addressing the reviewer’s concerns.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the submitted manuscript, Port et al. investigated the host and viral factors influencing the airborne transmission of SARS-CoV-2 Alpha and Delta variants of concern (VOC) using a Syrian hamster model. The authors analyzed the viral load profiles of the animal respiratory tracts and air samples from cages by quantifying gRNA, sgRNA, and infectious virus titers. They also assessed the breathing patterns, exhaled aerosol aerodynamic profile, and size distribution of airborne particles after SARS-CoV-2 Alpha and Delta infections. The data showed that male sex was associated with increased viral replication and virus shedding in the air. The relationship between co-infection with VOCs and the exposure pattern/timeframe was also tested. This study appears to be an expansion of a previous report (Port et al., 2022, Nature Microbiology). The experimental designs were rigorous, and the data were solid. These results will contribute to the understanding of the roles of host and virus factors in the airborne transmission of SARS-CoV-2 VOCs.

      Reviewer #2 (Public Review):

      This manuscript by Port and colleagues describes rigorous experiments that provide a wealth of virologic, respiratory physiology, and particle aerodynamic data pertaining to aerosol transmission of SARS-CoV-2 between infected Syrian hamsters. The data is particularly significant because infection is compared between alpha and delta variants, and because viral load is assessed via numerous assays (gRNA, sgRNA, TCID) and in tissues as well as the ambient environment of the cage. The paper will be of interest to a broad range of scientists including infectious diseases physicians, virologists, immunologists and potentially epidemiologists. The strength of evidence is relatively high but limited by unclear presentation in certain parts of the paper.

      Important conclusions are that infectious virus is only detectable in air samples during a narrow window of time relative to tissue samples, that airway constriction increases dynamically over time during infection limiting production of fine aerosol droplets, that variants do not appear to exclude one another during simultaneous exposures and that exposures to virus via the aerosol route lead to lower viral loads relative to direct inoculation suggesting an exposure dose response relationship.

      While the paper is valuable, I found certain elements of the data presentation to be unclear and overly complex.

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for their comments and their attention to detail. We have taken the following steps to address their suggestions and concerns.

      However, the following concerns need to be issued.

      1. Summary seems to be too simple, and some results are not clearly described in the summary.

      We have edited the summary and hope to have addressed the concerns raised by providing more information. We think that the summary includes all relevant findings.

      “It remains poorly understood how SARS-CoV-2 infection influences the physiological host factors important for aerosol transmission. We assessed breathing pattern, exhaled droplets, and infectious virus after infection with Alpha and Delta variants of concern (VOC) in the Syrian hamster. Both VOCs displayed a confined window of detectable airborne virus (24-48 h), shorter than that observed for oropharyngeal swabs. The loss of airborne shedding was linked to airway constriction resulting in a decrease of fine aerosols (1-10 µm) produced, which are suspected to be the major driver of airborne transmission. Male sex was associated with increased viral replication and virus shedding in the air. Next, we compared the transmission efficiency of both variants and found no significant differences. Transmission efficiency varied mostly among donors, 0-100% (including a superspreading event), and aerosol transmission over multiple chain links was representative of natural heterogeneity of exposure dose and downstream viral kinetics. Co-infection with VOCs only occurred when both viruses were shed by the same donor during an increased exposure timeframe (24-48 h). This highlights that assessment of host and virus factors resulting in a differential exhaled particle profile is critical for understanding airborne transmission.”

      2. Aerosol transmission experiment should be described in Materials and Methods although it is cited as Reference 21#;

      We have modified Line 433:

      “Aerosol caging

      Aerosol cages as described by Port et al. [2] were used for transmission experiments and air sampling as indicated. The aerosol transmission system consisted of plastic hamster boxes (Lab Products) connected by a plastic tube. The boxes were modified to accept a 7.62 cm (3") plastic sanitary fitting (McMaster-Carr), which enabled the length between the boxes to be changed. Airflow was generated with a vacuum pump (Vacuubrand) attached to the box housing the naïve animals and was controlled with a float-type meter/valve (McMaster-Carr).”

      And Line 458: “During the first 5 days, hamsters were housed in modified aerosol cages (only one hamster box) hooked up to an air pump.”.

      Especially, one superspreading event of Alpha VOC (donor animal) was observed in iteration A (Figure 4). What causes that event, experiment system?

      Based on the observed variation in airborne shedding (of the cages from which this was directly measured), we believe that one plausible explanation for the super-spreading event was that the Alpha-infected donor shed considerably more virus during the exposure than other donors, and thus more readily infected the sentinels. That said, it is also conceivable that other factors such as hamster behavior (e.g., closeness to the cage outlet, sleeping) or variable sentinel susceptibility could affect the distribution of transmissions.

      3. Same reference is repeatedly listed as Refs 2 and 21#.

      Addressed. We thank the reviewer for their attention to detail. We have also removed reference 53, which was the same as 54.

      4. Two forms of described time (hour and h) are used in the manuscript. Single form should be chosen.

      This has been addressed.

      5. Virus designation located in line 371 and line 583 is inconsistent, and it needs to be revised.

      For consistency we have chosen this nomenclature for the viruses used: SARS-CoV-2 variant Alpha (B.1.1.7) (hCoV-19/England/204820464/2020, EPI_ISL_683466) and variant Delta (B.1.617.2) (hCoV-19/USA/KY-CDC-2-4242084/2021, EPI_ISL_1823618).

      6. In Figure 5F, what time were lung and nasal turbinate tissues collected after virus infection?

      This has been added to the legend. Day 5. Line 904.

      7. Line 562-563, what is the coating antigen (spike protein, generated in-house)? purified or recombinant protein?

      It is in-house purified recombinant protein. This has been added to the methods.

      8. Line 575 and line 578: 10,000x is not standard description, and it should be revised.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      We thank the reviewer for their comments and suggestions to improve the manuscript, and hope we have addressed all concerns adequately.

      • Direct interpretation of the linear regression slope in Figure 3 is challenging. Is the most relevant parameter for transmission known? Intuitively, it would be the absolute number of small droplets at a given timepoint rather than the slope and it would be easier to interpret if the data were reported in this fashion.

      We decided to show a percentage of counts to normalize the data among animals, as we observed large inter-individual variation in counts. The reviewer is correct that it is most likely the number of particles that would be most relevant to transmission, though much (including the role of particle size) remains to be determined. We have added a sentence to the results which explains this in L157.

      Therefore, we decided in this first analysis to utilize the slope measurement and not raw counts. The focus was on the slopes and how particle profiles were changing post inoculation. Because we have focused on percentages, it does not seem appropriate to present particle counts within each diameter range, as the analysis, model, and results are based on these percentages of particles.

      Use of regression to compute slope is a useful measure because it uses data from all timepoints to estimate the regression line and, therefore, the % of particles on each day. We decided on these methods because efficiency is especially important in a study with a relatively small number of animals and slopes are also a good surrogate for how animal particle profiles are changing post-inoculation.

      To assist with the interpretation: 1) We removed Figure 3C and D and replaced Figure 3B with individual line plots for all conditions to visualize the slopes. The figure legend was corrected to reflect these changes.

      2) We replaced L169 onwards to read: (Figure 3B). Females had a steeper decline at an average rate of 2.2 per day after inoculation in the percent of 1-10 μm particles (and a steeper incline for <0.53 μm) when compared to males, while holding variant group constant. When we compared variant group while holding sex constant, we found that the Delta group had a steeper decline at an average rate of 5.6 per day in the percent of 1-10 μm particles (and a steeper incline for <0.53 μm); a similar trend, but not as steep, was observed for the Alpha group.

      The estimated difference in slopes for Delta vs. controls and Alpha vs. controls in the percent of <0.53 μm particles was 5.4 (two-sided adjusted p= 0.0001) and 2.4 (two-sided adjusted p = 0.0874), respectively. The estimated difference in slopes for percent of 1-10 μm particles was not as pronounced, but similar trends were observed for Delta and Alpha. Additionally, a linear mixed model was considered and produced virtually the same results as the simpler analysis described above; the corresponding linear mixed model estimates were the same and standard errors were similar.
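
      The slope comparison described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of each group's daily percentage of 1-10 μm particles against day; the function name `ols_slope` and all numbers are hypothetical and are not the study's data or analysis code.

```python
def ols_slope(days, percents):
    """Least-squares slope of percent-of-particles against day."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(percents) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, percents))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Hypothetical percentages of 1-10 um particles on days 0-3 (illustrative only).
days = [0, 1, 2, 3]
infected = [50.0, 48.0, 46.0, 44.0]
control = [50.0, 50.0, 49.0, 49.0]

slope_infected = ols_slope(days, infected)   # percentage points per day
slope_control = ols_slope(days, control)
slope_difference = slope_infected - slope_control
```

      Because every timepoint contributes to the fitted line, a single slope summarizes how the particle profile changes after inoculation, which is the efficiency argument made in the response.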

      • Fig 4: what is "limit of quality" mentioned in the legend? Are these samples undetectable?

      We have clarified this in the legend: “3.3 = limit of detection for RNA (<10 copies/rxn)”. The limit of detection is 10 copies/rxn. Samples with fewer than 10 copies/rxn are considered below the limit of detection; they are taken to be negative and set to 10 copies/rxn, which equals 3.3 log10 copies/mL of oral swab.
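
      The censoring rule in this response can be sketched as follows. This is an illustration only: it assumes a fixed scaling between copies/rxn and copies/mL, inferred from the two numbers quoted in the legend (10 copies/rxn corresponding to 3.3 log10 copies/mL), and the names used are hypothetical.

```python
import math

LOD_COPIES_PER_RXN = 10.0   # limit of detection quoted in the legend
LOD_LOG10_PER_ML = 3.3      # value the legend assigns to that limit

# Assumption: a fixed scaling links copies/rxn to copies/mL, so the offset on
# the log10 scale follows from the two numbers above (3.3 - log10(10) = 2.3).
LOG10_OFFSET = LOD_LOG10_PER_ML - math.log10(LOD_COPIES_PER_RXN)


def log10_copies_per_ml(copies_per_rxn):
    """Return log10 copies/mL, setting sub-threshold samples to the limit."""
    if copies_per_rxn < LOD_COPIES_PER_RXN:
        return LOD_LOG10_PER_ML
    return math.log10(copies_per_rxn) + LOG10_OFFSET
```

      Under this rule a sample with no detectable RNA and a sample exactly at the limit are plotted at the same value, which is why 3.3 marks the floor of the panel.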

      • Fig 4C would be easier to process in graphical rather than tabular form. The meaning of the colors is unclear.

      We agree with the reviewer that this is difficult to interpret, but we are uncertain if the same data in a tabular format would be easier to digest. We realized that the legend was misplaced and have added this back into the figure, which we hope clarifies the colors and the limit of detection.

      • Figure 4D & E are uninterpretable. What do the pie charts represent?

      We have remodeled this part of the figure to a schematic representation of the majority variant which transmitted for each individual sentinel, and have added a table (Table S1) which summarizes the exact sequencing results for the oral swabs. The reviewer is correct that it was difficult to interpret the pie charts, considering most values are either 0 or close to 100%. We hope this addresses the question. The legend states:

      Author response image 1.

      Airborne attack rate of Alpha and Delta SARS-CoV-2 variants. Donor animals (N = 7) were inoculated with either the Alpha or Delta variant with 10³ TCID50 via the intranasal route and paired together randomly (1:1 ratio) in 7 attack rate scenarios (A-G). To each pair of donors, one day after inoculation, 4-5 sentinels were exposed for a duration of 4 h (i.e., h 24-28 post inoculation) in an aerosol transmission set-up at 200 cm distance. A. Schematic figure of the transmission set-up. B. Day 1 sgRNA detected in oral swabs taken from each donor after exposure ended. Individuals are depicted. Wilcoxon test, N = 7. Grey = Alpha, teal = Delta inoculated donors. C. Respiratory shedding measured by viral load in oropharyngeal swabs; measured by sgRNA on day 2, 3, and 5 for each sentinel. Animals are grouped by scenario. Colors refer to legend below. 3.3 = limit of detection of RNA (<10 copies/rxn). D. Schematic representation of majority variant for each sentinel as assessed by percentage of Alpha and Delta detected in oropharyngeal swabs taken at day 2 and day 5 post exposure by deep sequencing. Grey = Alpha, teal = Delta, white = no transmission.

      • Fig S2G is uninterpretable. Please label and explain.

      We have now included an explanation of Figure S2G. The figure is a graphic representation of the neutralization data depicted in Figure S2F. The spacing between grid lines is 1 unit of antigenic distance, corresponding to a twofold dilution of serum in the neutralization assay. The resulting antigenic distance depicted between Alpha and Delta is roughly a 4-fold difference in neutralization between homologous (e.g., Alpha sera with the Alpha virus) and heterologous (e.g., Alpha sera with the Delta virus) combinations.
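
      The relation between fold difference and grid units stated above can be written as a short sketch. It assumes only what the response says, namely that one unit equals a twofold serum dilution; the function name is hypothetical.

```python
import math


def antigenic_units(fold_difference):
    """Antigenic distance in grid units, where 1 unit = a twofold dilution."""
    return math.log2(fold_difference)


# The roughly fourfold Alpha-Delta difference quoted above, in grid units.
alpha_delta_distance = antigenic_units(4.0)
```

      On this scale the roughly fourfold difference corresponds to about two grid squares between the Alpha and Delta positions.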

      • I would consider emphasizing lines 220-225 in the summary and abstract. The important implication is that aerosol transmission is more representative of natural heterogeneity of exposure dose and downstream viral kinetics. This is an often-overlooked point.

      We agree with the reviewer and have added this in Line 43.

      • Fig 5: A cartoon similar to Fig 4A showing timing of sentinel exposure with number of animals would be helpful.

      We have added this as a new panel A for Figure 5. See the redrafted Figure 5 below.

      • For Fig 5E & F It would be helpful to use a statistical test to more formally assess whether proportion at exposure predicts proportion of variants in downstream sentinel infection.

      This has been added as a new Figure 5 panel H and I, which we hope addresses the reviewer’s comment.

      Author response image 2.

      Airborne competitiveness of Alpha and Delta SARS-CoV-2 variants. A. Schematic. Donor animals (N = 8) were inoculated with Alpha and Delta variant with 5 × 10² TCID50, respectively, via the intranasal route (1:1 ratio), and three groups of sentinels (Sentinels 1, 2, and 3) were exposed subsequently at a 16.5 cm distance. Animals were exposed at a 1:1 ratio; exposure occurred on day 1 (Donors → Sentinels 1) and day 2 (Sentinels → Sentinels). B. Respiratory shedding measured by viral load in oropharyngeal swabs; measured by gRNA, sgRNA, and infectious titers on day 2 and day 5 post exposure. Bar-chart depicting median, 96% CI and individuals, N = 8, ordinary two-way ANOVA followed by Šídák's multiple comparisons test. C/D/E. Corresponding gRNA, sgRNA, and infectious virus in lungs and nasal turbinates sampled five days post exposure. Bar-chart depicting median, 96% CI and individuals, N = 8, ordinary two-way ANOVA, followed by Šídák's multiple comparisons test. Dark orange = Donors, light orange = Sentinels 1, grey = Sentinels 2, dark grey = Sentinels 3, p-values indicated where significant. Dotted line = limit of quality. F. Percentage of Alpha and Delta detected in oropharyngeal swabs taken at day 2 and day 5 post exposure for each individual donor and sentinel, determined by deep sequencing. Pie-charts depict individual animals. Grey = Alpha, teal = Delta. G. Lung and nasal turbinate samples collected on day 5 post inoculation/exposure. H. Summary of data of variant composition, violin plots depicting median and quantiles for each chain link (left) and for each set of samples collected (right). Shading indicates majority of variant (grey = Alpha, teal = Delta). I. Correlation plot depicting Spearman r for each chain link (right, day 2 swab) and for each set of samples collected across all animals (left). Colors refer to legend on right. Abbreviations: TCID, Tissue Culture Infectious Dose.

      We have additionally added to the results section: L284: “Combined, a trend, while not significant, was observed for increased replication of Delta after the first transmission event, but not after the second, and in the oropharyngeal cavity (swabs) as opposed to lungs (Figure 5H) (Donors compared to Sentinels 1: p = 0.0559; Donors compared to Sentinels 2: p > 0.9999; Kruskal-Wallis test, followed by Dunn’s test). Swabs taken at 2 DPI/DPE did significantly predict variant patterns in swabs on 5 DPI/DPE (Spearman’s r = 0.623, p = 0.00436) and virus competition in the lower respiratory tract (Spearman’s r = 0.60, p = 0.00848). Oral swab samples taken on day 5 strongly correlate with both upper (Spearman’s r = 0.816, p = 0.00001) and lower respiratory tract tissue samples (Spearman’s r = 0.832, p = 0.00002) taken on the same day (Figure 5I).”
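
      For readers unfamiliar with the statistic reported above, a minimal sketch of Spearman's rank correlation follows. It is a generic textbook implementation (Pearson correlation of average ranks), not the authors' analysis code, and the variant percentages are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman's r: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical percent Delta in day 2 and day 5 swabs of four animals.
day2_delta = [10.0, 40.0, 55.0, 90.0]
day5_delta = [5.0, 30.0, 80.0, 95.0]
r = spearman_r(day2_delta, day5_delta)
```

      Because the statistic depends only on rank order, it suits percentages that cluster near 0 or 100%, as the variant proportions here do.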

      • Fig 1A: how are pfu/hour inferred? This is somewhat explained in the supplement, but I found the inclusion of model output as the first panel confusing and am still not 100% clear how this was done. Consider, explaining this in the body of the paper.

      We have added a more detailed explanation of the PFU/h inference to the main text: The motivation for the model was to link more readily measurable quantities such as RNA measured in oral swabs to the quantity of greatest interest for transmission (infectious virus per unit time in the air). To do this, we jointly infer the kinetics of shed airborne virus and parameters relating observable quantities (infected sentinels, plaques from purified air sample filters) to the actual longitudinal shedding. The inferential model uses mechanistic descriptions of deposition of infectious virus into the air, uptake from the air, and loss of infectious virus in the environment to extract estimates of the key kinetic parameters, as well as the resultant airborne shedding, for each animal.

      We have added this information to L106 in the results and hope this clarifies the rationale and execution of the model.

      More minor points:

      • Line 292: "poor proxy" seems too strong as peak levels of viral RNA correlate with positive airway cultures. It might be more accurate to say that high levels of viral RNA during early infection only somewhat correlate with positive airway cultures.

      We have rephrased this to clarify that while peak RNA viral loads are predictive of positive cultures, measuring RNA, especially early during infection and only once, may not be sufficient to infer the magnitude or time-dependence of infectious virus shedding into the air. See Line 308: “We found that swab viral load measurements are a valuable but imperfect proxy for the magnitude and timing of airborne shedding. Crucially, there is a period early in infection (around 24 h post-infection in inoculated hamsters) when oral swabs show high infectious virus titers, but air samples show low or undetectable levels of virus. Viral shedding should not be treated as a single quantity that rises and falls synchronously throughout the host; spatial models of infection may be required to identify the best correlates of airborne infectiousness [32]. Attempts to quantify an individual’s airborne infectiousness from swab measurements should thus be interpreted with caution, and these spatiotemporal factors should be considered carefully.”

      • Line 352: Re is dependent on time of an outbreak (population immunity) and cannot be specified for a given variant as it depends on multiple other variables

      We agree that the current phrasing here could be interpreted to suggest, incorrectly, that Re is an intrinsic property of a variant. We have deleted that language and reworded the section to emphasize that the critical question is heterogeneity in transmission, not mean reproduction number. Line 348: “Moreover, at the time of emergence of Delta, a large part of the human population was either previously exposed to and/or vaccinated against SARS-CoV-2; that underlying host immune landscape also affects the relative fitness of variants. Our naïve animal model does not capture the high prevalence of pre-existing immunity present in the human population and may therefore be less relevant for studying overall variant fitness in the current epidemiological context. Analyses of the cross-neutralization between Alpha and Delta suggest subtly different antigenic profiles [35], and Delta’s faster kinetics in humans may have also helped it cause more reinfections and “breakthrough” infections [36].

      Our two transmission experiments yielded different outcomes. When sentinel hamsters were sequentially exposed, first to Alpha and then to Delta, generally no dual infections—both variants detectable—were observed. In contrast, when we exposed hamsters simultaneously to one donor infected with Alpha and another infected with Delta, we were able to detect mixed-variant virus populations in sentinels in one of the cages (Cage F, see Appendix figures S1, S2). The fact that we saw both single-lineage and multi-lineage transmission events suggests that virus population bottlenecks at the point of transmission do indeed depend on exposure mode and duration, as well as donor host shedding. Notably, our analysis suggests that the Alpha-Delta co-infections observed in the Cage F sentinels could be due to that being the one cage in which both the Alpha and the Delta donor shed substantially over the course of the exposure (Appendix figures S2, S3). Mixed variant infections were not retained equally, and the relative variant frequencies differed between investigated compartments of the respiratory tract, suggesting roles for randomness or host-and-tissue specific differences in virus fitness.

      A combination of host, environmental and virus parameters, many of which vary through time, play a role in virus transmission. These include virus phenotype, shedding in air, individual variability and sex differences, changes in breathing patterns, and droplet size distributions. Alongside recognized social and environmental factors, these host and viral parameters might help explain why the epidemiology of SARS-CoV-2 exhibits classic features of over-dispersed transmission [37]. Namely, SARS-CoV-2 circulates continuously in the human population, but many transmission chains are self-limiting, while rarer superspreading events account for a substantial fraction of the virus’s total transmission. Heterogeneity in the respiratory viral loads is high and some infected humans release tens to thousands of SARS-CoV-2 virions/min [38, 39]. Our findings recapitulate this in an animal model and provide further insights into mechanisms underlying successful transmission events. Quantitative assessment of virus and host parameters responsible for the size, duration and infectivity of exhaled aerosols may be critical to advance our understanding of factors governing the efficiency and heterogeneity of transmission for SARS-CoV-2, and potentially other respiratory viruses. In turn, these insights may lay the foundation for interventions targeting individuals and settings with high risk of superspreading, to achieve efficient control of virus transmission [40].”

      • The limitation section should mention that this animal model does not capture the large prevalence of pre-existing immunity at present in the population and may therefore be less relevant in the current epidemiologic context.

      We agree and have added this more clearly, see response above.

      • Limitation: it is unclear if airway and droplet dynamics in the hamster model are representative of humans.

      We have added the following sentence: Line 331: “It remains to be determined how well airway and particle size distribution dynamics in Syrian hamsters model those in humans.”

      • The mathematical model is termed semi-mechanistic but I think this is not accurate as the model appears to have no mechanistic assumptions.

      We describe the model as semi-mechanistic because it uses mechanistic descriptions of the shedding and uptake process (as described above), incorporating factors including respiration rate and environmental loss, and makes the mechanistic assumption that measurable swab and airborne shedding all stem from a shared within-host infection process that produces exponential growth of virus up to a peak, followed by exponential decay. The model is only semi-mechanistic, however, as we do not attempt a full model of within-host viral replication and shedding (e.g. a target-cell limited virus kinetics model).
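
      The shared within-host assumption described above (exponential growth up to a peak, followed by exponential decay) can be sketched on a log10 scale, where it becomes piecewise linear. This is an illustration under stated assumptions, not the authors' inferential model; the function name and all parameter values are hypothetical.

```python
def log10_viral_load(t, t_peak, log10_peak, growth_per_day, decay_per_day):
    """log10 viral load for exponential growth to a peak, then exponential decay.

    Rates are in log10 units per day, so the curve is piecewise linear on a
    log scale and reaches log10_peak at t_peak.
    """
    if t <= t_peak:
        return log10_peak - growth_per_day * (t_peak - t)
    return log10_peak - decay_per_day * (t - t_peak)


# Hypothetical parameters for illustration only (days 0-5 post inoculation).
curve = [
    log10_viral_load(t, t_peak=2.0, log10_peak=6.0,
                     growth_per_day=3.0, decay_per_day=1.0)
    for t in range(6)
]
```

      In a model of this kind, swab and airborne measurements would be treated as different observations of one such underlying curve, each with its own scaling and timing parameters.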

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Comment 1: It is worth mentioning that the authors show that there are Arid1a transcripts that escape the Cre system. This might mask the phenotype of the Arid1a knockout, given that many sequencing techniques used here are done on a heterogeneous population of knockout and wild-type spermatocytes.

      Response: By immunostaining of testes cryosections obtained from 1-month-old Arid1afl/fl (control) and Arid1acKO (CKO) males, the proportions of undifferentiated spermatogonia (PLZF+) with detectable (ARID1A+) and non-detectable (ARID1A−) levels of ARID1A protein were 74% ARID1A negative and 26% ARID1A positive in CKO, as compared to 95% ARID1A positive and 5% ARID1A negative in WT controls. The manuscript includes these data (page 5, lines 114-116). Furthermore, Western blot analysis of STA-Put purified pachytene WT and mutant spermatocytes showed significantly reduced levels of ARID1A protein in mutant cells (95% reduction). These data have been added to the manuscript (page 5, line 116 and Fig. S2).

      Comment 2: In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed (these mice produce sperm).

      Response: Based on the profiling of prophase-I spermatocytes by co-staining for SYCP3 and ARID1A, we observed a marked reduction in mid-late pachytene spermatocytes that lacked ARID1A, indicating a failure to progress beyond pachynema in the absence of ARID1A (Table 1 in manuscript). Furthermore, we were unable to detect diplotene spermatocytes lacking ARID1A protein. Haploid spermatid populations isolated from Arid1acKO males appeared normal, expressing the wild-type allele, suggesting that they originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Arid1acKO also produces viable sperm at a level equal to their wild-type controls (see page 5, lines 123-126). It is reasonable to conclude that the absence of ARID1A results in a pachynema arrest and that the viable sperm are from escapers. We cannot make any conclusions regarding the requirement of ARID1A for progression beyond pachynema.

      Comment 3: ARID1A is present throughout prophase I, and it might have pre-MSCI roles that impact earlier stages of Meiosis I, and cell death might be happening in these earlier stages too.

      Response: We did not observe an effect on the frequency of leptotene and zygotene spermatocytes lacking ARID1A. There appeared to be an accumulation of these prophase-I populations in response to the loss of ARID1A, consistent with a failure in progression beyond pachynema in the mutants (Table 1 in the manuscript).

      Additionally, we did not detect any significant difference in the numbers of undifferentiated spermatogonia expressing PLZF (also known as ZBTB16) in 1-month-old Arid1acKO relative to Arid1afl/fl males (see Table below, now included in the manuscript as supplemental Table 1). Therefore, the Arid1a conditional knockouts generated with a Stra8-Cre did not appear to impact earlier stages of spermatogenesis. However, potential roles of ARID1A early in spermatogenesis might be revealed using a more efficient and earlier-acting germline Cre transgene. In this case, an inducible Cre transgene would be needed, given the haploinsufficiency associated with Arid1a. Such haploinsufficiency was why we used the Stra8-Cre. The lack of Cre expression in the female germline allowed the transmission of the floxed allele maternally.

      Author response table 1.

      Comment 4: Overall, the research presented here is solid, adds new knowledge on how sex chromatin is silenced during meiosis, and has generated relevant databases for the field.

      Response: We thank the reviewer for this comment.

      Reviewer 2

      Comment 1: The conditional deletion mouse model of ARID1A using Stra8-cre showed inefficient deletion; spermatogenesis did not appear to be severely compromised in the mutants. Using this data, the authors claimed that meiotic arrest occurs in the mutants. This is obviously a misinterpretation.

      Response: As stated in response to Reviewer 1, testes cryosections obtained from 1-month-old control and mutant males showed that 74% of undifferentiated spermatogonia are ARID1A negative and 26% ARID1A positive in the CKO, as compared to 95% ARID1A positive and 5% ARID1A negative in WT controls (page 5, lines 114-116). This difference is dramatic. Western blot analysis of STA-Put purified pachytene WT and mutant spermatocytes also showed a significant reduction of ARID1A protein in mutant cells (Fig. S2). We observed a marked decrease in mid-late pachytene spermatocytes that lacked ARID1A, indicating a failure to progress beyond pachynema without ARID1A (Table 1 from the manuscript). Furthermore, we were unable to detect any diplotene spermatocytes lacking ARID1A protein. These data suggest that the haploid spermatids originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Comparison of cKO and wild-type littermates yielded nearly identical results (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml), indicating that the cKOs produce viable sperm at a level equal to their wild-type controls. Taken together, the conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Therefore, we disagree with the reviewer’s comments that “spermatogenesis did not appear to be severely compromised in the mutants”.

      Comment 2: In the later parts, the authors performed next-gen analyses, including ATAC-seq and H3.3 CUT&RUN, using the isolated cells from the mutant mice. However, with this inefficient deletion, most cells isolated from the mutant mice appeared not to undergo Cre-mediated recombination. Therefore, these experiments do not tell any conclusion pertinent to the Arid1a mutation.

      Response: We agree that the ATAC-seq and CUT&RUN data were derived from a mixed population of pachytene spermatocytes consisting of mutants and, to a much lesser extent, escapers. As stated, based on our previous study (Menon et al., 2021, Nat. Commun., PMID: 34772938) and additional analyses in this current work, the proportion of undifferentiated spermatogonia lacking ARID1A indicates that Stra8-Cre is ~70% efficient. With this efficiency, we can detect striking changes in H3.3 occupancy and chromatin accessibility in the mutants relative to wild-type spermatocytes.

      Comment 3: Furthermore, many of the later parts of this study focus on the analysis of H3.3 CUT&RUN. However, Fig. S7 clearly suggests that the H3.3 CUT&RUN experiment in the wild-type simply failed. Thus, none of the analyses using the H3.3 CUT&RUN data can be interpreted.

      Response: We would like to draw the attention of the reviewer to a recent study (Fontaine et al., 2022, NAR, PMID: 35766398) where the authors observed an identical X chromosome-wide spreading of H3.3 in mouse meiotic cells by ChIP-seq. The genomic distribution matches the microscopic observation of H3.3 coating of the sex chromosomes. Therefore, in normal spermatocytes, H3.3 distribution is pervasive across the X chromosome, with very few peaks observed in intergenic regions. Additionally, we detected H3.3 enrichment at TSSs of ARID1A-regulated autosomal genes in wild-type pachytene spermatocytes, albeit reduced relative to the mutants, indicating that the H3.3 CUT&RUN worked. For these reasons, we do not agree with the reviewer’s assessment that the H3.3 CUT&RUN experiment failed in the wild type.

      Comment 4: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knockout Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Reviewer 3

      Comment 1: A challenge with the author's CKO model is the incomplete efficiency of ARID1A loss, due to incomplete CRE-mediated deletion. The authors effectively work around this issue, but they don't state specifically what percentage of CKO cells lack ARID1A staining. This information should be added.

      Response: Our data indicate that Stra8-Cre is ~ 70% efficient. This information has been added.

      Comment 2: They refer to cells that retain ARID1A staining in CKO testes as 'internal controls' but this reviewer finds that label inappropriate.

      Response: We have dropped ‘internal controls’ and used ‘escapers’ instead.

      Comment 3: Although some cells that retain ARID1A won't have undergone CRE-mediated excision, others may have excised but possibly have delayed kinetics of deletion or ARID1A RNA/protein turnover and loss. Such cells likely have partial ARID1A depletion to different extents and, therefore, in some cases, are no longer wild-type. In subsequent figures in which co-staining for ARID1A is done, it would be appropriate for the authors to specify if they are quantifying all cells from CKO testes, or only those that lack ARID1A staining.

      Response: We were unable to detect any diplotene spermatocytes lacking ARID1A protein. The data suggest that the haploid spermatids originated from spermatocytes that failed to undergo efficient Cre recombination (Fig. S3). Thus, we conclude that the absence of ARID1A results in a pachynema arrest and that the escapers produce haploid spermatids. In figures displaying quantification data, we indicate whether the quantification was performed on spermatocytes lacking or containing ARID1A from cKO testes. By IF, we see that ~70% of the spermatocytes have deleted ARID1A.

      Comment 4: The authors don't see defects in a few DDR markers in ARID1A CKO cells and conclude that the role of ARID1A in silencing is 'mutually exclusive to DDR pathways' (p 12) and 'occurs independently of DDR signaling' (p30). The data suggest that ARID1A may not be required for DDR signaling, but do not rule out the possibility that ARID1A is downstream of DDR signaling (and the authors even hypothesize this on p30). The data provided do not justify the conclusion that ARID1A acts independently of DDR signaling.

      associated DDR factors such as: H2Ax; ATR; and MDC1. We observed an abnormal persistence of elongating RNA polymerase II on the mutant XY body in response to the loss of ARID1A, emphasizing its role in the transcriptional repression of the XY during pachynema. The loss of ARID1A results in a failure to silence sex-linked genes and does so in the presence of DDR signaling factors in the XY body. As the reviewer notes, we highlighted the possibility that DDR pathways might influence ARID1A recruitment to the XY, evidenced by the hyperaccumulation of ARID1A on the sex body late in diplonema. Therefore, whether ARID1A is dependent on DDR signaling remains an open question.

      Comment 5: After observing no changes in levels or localization of H3.3 chaperones, the authors conclude that 'ARID1A impacts H3.3 accumulation on the sex chromosomes without affecting its expression or incorporation during pachynema.' It's not clear to this reviewer what the authors mean by this. Aside from the issue of not having tested DAXX or HIRA activity, are they suggesting that some other process besides altered incorporation leads to H3.3 accumulation, and if so, what process would that be?

      Response: The loss of ARID1A might result in an abnormal redistribution of DAXX or HIRA on the XY, potentially contributing to the defects in H3.3 accumulation and canonical H3.1/3.2 eviction on the XY. While speculative at this point, it is also possible that the persistence of elongating RNAPII in response to the loss of ARID1A might prevent the sex chromosome-wide coating of H3.3. Addressing the mechanism underlying ARID1A-governed H3.3 accumulation on the XY body remains a topic for future investigation.

      Comment 6: The authors find an interesting connection between certain regions that gained chromatin accessibility after ARID1A loss (clusters G1 and G3) and the presence of the PRDM9 sequence motif. The G1 and G3 clusters also show DMC1 occupancy and H3K4me3 enrichment. However, an additional cluster with gained accessibility (G4) also shows DMC1 occupancy and H3K4me3 enrichment but has modest H3.3 accumulation. The paper would benefit from additional discussion about the G4 cluster (which encompasses 960 peak calls). Is there any enrichment of PRDM9 sites in G4? If H3.3 exclusion governs meiotic DSBs, how does cluster G4 fit into the model?

      Response: We agree that, compared to G1+G3, cluster G4 shows an insignificant increase in H3.3 occupancy in the absence of ARID1A (Figure 6B). The plot profile associated with the heatmap confirms this result (Figure 6B). Therefore, cluster G4 is very distinct in its chromatin composition from G1+G3 upon the loss of ARID1A and, as such, is not inconsistent with our model of H3.3 antagonism with DSB sites. Additionally, we did not observe an enrichment of PRDM9 sites in G4. Since G4 does not display similar dynamics in H3.3 occupancy to G1+G3, DMC1 association might not be perturbed at G4 in response to the loss of ARID1A. Future studies will be required to determine the genomic associations of DMC1 and H3K4me3 in response to the loss of ARID1A.

      Comment 7: The impacts of ARID1A loss on DMC1 focus formation (reduced sex chromosome association) are very interesting and also raise additional questions. Are DMC1 foci on autosomes also affected during pachynema? The corresponding lack of apparent effect on RAD51 implies that breaks are still made and resected, enabling RAD51 filament formation. A more thorough quantitative assessment of RAD51 focus formation will be interesting in the long run, enabling determination of the number of break sites and the kinetics of repair, which the authors suggest is perturbed by ARID1A loss but doesn't directly test. It isn't clear how a nucleosomal factor (H3.3) would influence loading of recombinases onto ssDNA, especially if the alteration is not at the level of resection and ssDNA formation. Additional discussion of this point is warranted. Lastly, there currently are various notions for the interplay between RAD51 and DMC1 in filament formation and break repair, and brief discussion of this area and the implications of the new findings from the ARID1A CKO would strengthen the paper further.

      Response: The impact of H3.3 on the loading of recombinases might be an indirect consequence of ARID1A-governed sex-linked transcriptional repression. In a recent study, Alexander et al. (Nat. Commun., 2023, PMID: 36990976) showed that transcriptional activity and meiotic recombination are spatially compartmentalized during meiosis. Therefore, the persistence of elongating RNA polymerase II on a sex body depleted for H3.3 in the absence of ARID1A might contribute to the defect in DMC1 association. RAD51 and DMC1 are known to bind ssDNA at PRDM9/SPO11 designated DSB hotspots. However, these recombinases occupy unique domains. DMC1 localizes nearest the DSB breakpoint, promoting strand exchange, whereas RAD51 is further away (Hinch et al., PMID: 32610038). We show that loss of Arid1a decreases DMC1 foci on the XY chromosomes without affecting RAD51. These findings indicate that BAF-A plays a role in the loading and/or retention of DMC1 to the XY chromosomes. This information has been added to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both reviewers for taking the time to review our manuscript and data in great detail. We thank you for the fair assessment of our work, the helpful feedback, and for recognizing the value of our work. We have done our best to address your concerns below:

      eLife assessment

      This work reports a valuable finding on glucocorticoid signaling in male and female germ cells in mice, pointing out sexual dimorphism in transcriptomic responsiveness. While the evidence supporting the claims is generally solid, additional assessments would be required to fully confirm an inert GR signaling despite the presence of GR in the female germline and GR-mediated alternative splicing in response to dexamethasone treatment in the male germline. The work may interest basic researchers and physician-scientists working on reproduction and

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cincotta et al set out to investigate the presence of glucocorticoid receptors in the male and female embryonic germline. They further investigate the impact of tissue-specific genetically induced receptor absence and/or systemic receptor activation on fertility and RNA regulation. They are motivated by several lines of research that report inter and transgenerational effects of stress and or glucocorticoid receptor activation and suggest that their findings provide an explanatory mechanism to mechanistically back parental stress hormone exposure-induced phenotypes in the offspring.

      Strengths:

      A chronological immunofluorescent assessment of GR in fetal and early life oocyte and sperm development.

      RNA seq data that reveal novel cell type specific isoforms validated by q-RT PCR E15.5 in the oocyte.

      2 alternative approaches to knock out GR to study transcriptional outcomes. Oocytes: systemic GR KO (E17.5) with low input 3-tag seq and germline-specific GR KO (E15.5) on fetal oocyte expression via 10X single cell seq and 3-cap sequencing on sorted KO versus WT oocytes both indicating little impact on polyadenylated RNAs

      2 alternative approaches to assess the effect of GR activation in vivo (systemic) and ex vivo (ovary culture): here the RNA seq did show again some changes in germ cells and many in the soma.

      They exclude oocyte-specific GR signaling inhibition via beta isoforms.

      Perinatal male germline shows differential splicing regulation in response to systemic Dex administration, results were backed up with q-PCR analysis of splicing factors.

      Weaknesses:

      COMMENT #1: The presence of a protein cannot be entirely excluded based on IF data

      We agree that very low levels of GR could escape the detection by IF and confocal imaging. We feel that our IF data do match transcript data in our validation studies of the GR KO using (1) qRT-PCR on fetal ovary in Fig 2E and (2) scRNA-seq in germ cells and ovarian soma in Fig S2B.

      COMMENT #2: (staining of spermatids is referred to but not shown).

      You are correct that this statement was based on a morphological identification of spermatids using DAPI morphology. We have performed a co-stain for GR with the spermatocyte marker SYCP3, and the spermatid/spermatozoa marker PNA (Peanut Agglutinin; from Arachis hypogaea) in adult testis tissue. We have updated Figure 4D to reflect this change, as well as the corresponding text in the Results section.

      COMMENT #3: The authors do not consider post-transcriptional level a) modifications also triggered by GR activation b) non-coding RNAs (not assessed by seq).

      We thank the reviewer for raising this very important point about potential post-transcriptional (non-genomic) effects of GR in the fetal oocyte. We agree that while our RNA-seq results show only a minimal transcriptional response, we cannot rule out a non-canonical signaling function of GR, such as the regulation of cellular kinases (as reviewed elsewhere; ref. 1), or the regulation of non-coding RNAs at the post-transcriptional level, and we have amended the discussion to include a sentence on this point. However, while we fully acknowledge the possibility of GR regulating non-genomic level cellular signaling, we chose not to explore this option further based on the lack of any overall functional effect on meiotic progression when GR signaling was perturbed, either by KO (Figure 2D) or dex-mediated activation (Figure S3C).

      COMMENT #4: Sequencing techniques used are not total RNA but either are focused on all polyA transcripts (10x) or only assess the 3' prime end and hence are not ideal to study splicing

      We thank the reviewer for raising this concern; however, this statement is not correct, and we have clarified this point in the Results section to explain how the sequencing libraries of the male germ cell RNA-seq were prepared. We agree that certain sequencing techniques (such as 3’ Tag-Seq) that generate sequencing libraries from a limited portion of an entire transcript molecule are not appropriate for analysis of differential splicing. This was not the case, however, for the RNA-seq libraries prepared on our male germ cells treated with dexamethasone. These libraries were constructed using full-length transcripts that were reverse transcribed using random hexamer priming, thus accounting for sequencing coverage across the full transcript length. As a result, this type of library prep technique should be sufficient for capturing differential splicing events along the length of the transcript. We do, however, point out that these libraries were constructed on polyA-enriched transcripts. Thus, while we obtained full-length transcript coverage for these polyA transcripts, any differential splicing taking place in non-polyadenylated RNA species was not captured. While we are excited about the possibility of exploring GR-mediated splicing regulation of other RNA species in the future, we chose to focus the scope of our current study on polyA mRNA molecules specifically.

      COMMENT #5: The number of replicates in the low input seq is very low and hence this might be underpowered

      While the number of replicates (n=3-4 per condition) is sufficient for performing statistical analysis of a standard RNA-seq experiment, we do acknowledge and agree with the reviewer that low numbers of FACS-sorted germ cells from individual embryos combined with the low input 3’ Tag-Seq technique could have led to higher sample variability than desired. Given that we validated our bulk RNA-seq analysis of GR knockout ovaries using an orthogonal single-cell RNA-seq approach, we feel that our conclusions regarding a lack of transcriptional changes upon GR deletion remain valid.

      COMMENT #6: Since Dex treatment showed some (modest) changes in oocyte RNA - effects of GR depletion might only become apparent upon Dex treatment as an interaction.

      We may be missing the nuance of this point, but our interpretation of an effect that is seen only when the KO is treated with Dex would be that the mechanism would not be autonomous in germ cells but indirect or off-target.

      COMMENT #7: Effects in oocytes following systemic Dex might be indirect due to GR activation in the soma.

      As both the oocytes and ovarian soma express GR during the window of dex administration, we agree that it is possible that the few modest changes seen in the oocyte transcriptome are the result of indirect effects following robust GR signaling in the somatic compartment. However, given that these modest oocyte transcript changes in response to dex treatment did not significantly alter the ability of oocytes to progress through meiosis, we chose not to explore this mechanism further.

      COMMENT #8: Even though ex vivo culture of ovaries shows GR translocation to the nucleus it is not sure whether the in vivo systemic administration does the same.

      AND

      The conclusion that fetal oocytes are resistant to GR manipulation is very strong, given that "only" poly A sequencing and few replicates of 3-prime sequencing have been analyzed and information is lacking on whether GR is activated in germ cells in the systemically dex-injected animals.

      If we understand correctly, the first part refers to a technical limitation and the second part takes issue with our interpretation of the data. For the former, we appreciate this astute insight on the conundrum of detecting a response to systemic dex in fetal oocytes, which is generally monitored by nuclear translocation of GR. As shown in Figure 1A and 1B, GR localization is overwhelmingly nuclear in fetal oocytes of WT animals at E13.5 without addition of any dex. We could not, therefore, use GR translocation as a proxy for activation in response to dex treatment. We instead used ex vivo organ culture to monitor localization changes, as we were able to maintain fetal ovaries ex vivo in hormone-depleted and ligand negative conditions. As shown in Fig. 3, these defined culture conditions elicited a shift of GR to the cytoplasm of fetal oocytes. This led us to conclude that GR is capable of translocating between nucleus and cytoplasm in fetal oocytes, and we were able to counteract this loss in nuclear localization by providing dex ligand in the media.

      We feel that our conclusion that oocytes are resistant to manipulation of glucocorticoid signaling despite their possession of the receptor and capacity for nuclear translocation is substantiated by multiple results: meiotic phenotyping, bulk RNA-seq and scRNA-seq analysis of both GR KO and dex-dosed mice. Our basis for testing the timing and fidelity of meiotic prophase I was the coincident onset of GR expression in female germ cells at E13, and the disappearance of GR in neonatal oocytes as they enter meiotic arrest. The lack of transcriptional changes observed in oocytes in response to dex has made it even more challenging to demonstrate a bona fide “activation” of GR. Observation of a dose-dependent induction of the canonical GR response gene Fkbp5 in the somatic cells of the fetal ovary (Figure S3A and 3A) affirmed that dex traverses the placenta. We agree with the reviewer that it remains possible that dex or GR KO could lead to changes in epigenetic marks or small RNAs in oocytes, and have mentioned these possibilities in the discussion, but we note that even epigenetic perturbations during oocyte development such as the loss of Tet1 or Dnmt1 result in measurable changes in the transcriptome and the timing of meiotic prophase (refs. 2–4).

      COMMENT #9: This work is a good reference point for researchers interested in glucocorticoid hormone signaling fertility and RNA splicing. It might spark further studies on germline-specific GR functions and the impact of GR activation on alternative splicing. While the study provides a characterization of GR and some aspects of GR perturbation, and the negative findings in this study do help to rule out a range of specific roles of GR in the germline, there is still a range of other potential unexplored options. The introduction of the study alludes to implications for intergenerational effects via epigenetic modifications in the germline, however, it does not mention that the indirect effects of reproductive tissue GR signaling on the germline have indeed already been described in the context of intergenerational effects of stress.

      The reviewer raises an excellent point that we have not made sufficient distinction in our manuscript between prior studies of gestational stress and preconception stress and the light that our work may shed on those findings. We have revised the introduction to clarify this difference, and added reference to an outstanding study that identifies glucocorticoid-induced changes to microRNA cargo of extracellular vesicles shed by epididymal epithelial cells that, when transferred to mature sperm, can induce changes in the HPA axis and brain of offspring (ref. 5). Interestingly, this GR-mediated effect in the epididymal epithelial cells concurs with our observation in the adult testis that GR can be detected only in cKit+ spermatogonia but not in subsequent stages of spermatids.

      COMMENT #10: Also, the study does not assess epigenetic modifications.

      We agree with the reviewer that exploring the role of GR in regulating epigenetic modifications within the germline is an area of extreme interest given the potential links between stress and transgenerational epigenetic inheritance. As this is a broader topic that requires a more thorough and comprehensive set of experiments, we have intentionally chosen to keep this work separate from the current study, and hope to expand upon this topic in the future.

      COMMENT #11: The conclusion that the persistence of a phenotype for up to three generations suggests that stress can induce lasting epigenetic changes in the germline is misleading. For the reader who is unfamiliar with the field, it is important to define much more precisely what is referred to as "a phenotype". Furthermore, this statement evokes the impression that the very same epigenetic changes in the germline have been observed across multiple generations.

      We see how this may be misleading, and we have amended the text of the introduction and discussion accordingly to avoid the use of the term “phenotype”.

      COMMENT #12: The evidence of the presence of GR in the germline is also somewhat limited - since other studies using sequencing have detected GR in the mature oocyte and sperm.

      As described above in response to Comment #2, we have included immunostaining of adult testis in a revised Figure 4D and shown that we detect GR in PLZF+ and cKIT+ spermatogonia. We also show low/minimal expression in some (SYCP3+) early meiotic spermatocytes, but not in (Lectin+) spermatids. We are not aware of any studies that have shown expression of GR protein in the mature oocyte.

      COMMENT #13: The discussion ends again on the implications of sex-specific differences of GR signaling in the context of stress-induced epigenetic inheritance. It states that the observed differences might relate to the fact that there is more evidence for paternal lineage findings, without considering that maternal lineage studies in epigenetic inheritance are generally less prevalent due to some practical factors - such as more laborious study design making use of cross-fostering or embryo transfer.

      We thank the reviewer for this valid point, and we have amended the discussion section.

      Reviewer #2 (Public Review):

      Summary:

      There is increasing evidence in the literature that rodent models of stress can produce phenotypes that persist through multiple generations. Nevertheless, the mechanism(s) by which stress exposure produces phenotypes are unknown in the directly affected individual as well as in subsequent offspring that did not directly experience stress. Moreover, it has also been shown that glucocorticoid stress hormones can recapitulate the effects of programmed stress. In this manuscript, the authors test the compelling hypothesis that glucocorticoid receptor (GR)-signaling is responsible for the transmission of phenotypes across generations. As a first step, the investigators test for a role of GR in the male and female germline. Using knockouts and GR agonists, they show that although germ cells in male and female mice have GR that appears to localize to the nucleus when stimulated, oocytes are resistant to changes in GR levels. In contrast, the male germline exhibits changes in splicing but no overt changes in fertility.

      Strengths:

      Although many of the results in this manuscript are negative, this is a careful and timely study that informs additional work to address mechanisms of transmission of stress phenotypes across generations and suggests a sexually dimorphic response to glucocorticoids in the germline. The work presented here is well-done and rigorous and the discussion of the data is thoughtful. Overall, this is an important contribution to the literature.

      Reviewer #1 (Recommendations For The Authors):

      RECOMMENDATION #1: To assess whether in females the systemic Dex administration directly activates GR in oocytes it would be great to assess GR activation following Dex administration, and ideally to see the effects abolished when Dex is administered to germline-specific KO animals.

      In regard to the recommendation to assess GR activation in response to systemic dex administration, we refer the reviewer back to our response to Comment #8 highlighting the difficulties defining and measuring GR activation in the germline.

      This therefore has made it difficult to assess whether any of the modest effects seen in response to dex are abolished in our germline-specific KO animals. While repeating our RNA-seq experiment in dex-dosed germline KO animals would address whether the ~60 genes induced in oocytes are the result of oocyte-intrinsic GR activity, we have decided not to explore this mechanism further due to the overall lack of a functional effect on meiotic progression in response to dex (Figure S3C).

      RECOMMENDATION #2: To further strengthen the link between GR and alternative splicing it would be great to see the dex administration experiment repeated in germline specific GR KO's.

      While we understand the reviewer’s suggestion to explore whether deletion of GR in the spermatogonia is sufficient to abrogate the dex-mediated decreases in splice factor expression, we chose not to explore the details of this mechanism given that deletion of GR in the male germline does not impair fertility (Figure 6).

      RECOMMENDATION #3: I am wondering how much a given reduction in one of the splicing factors indeed affects splicing events. Can the authors relate this to literature, or maybe an in vitro experiment can be done to see whether the level of differential splicing events detected is in a range that can be expected in the case of the magnitude of splicing factor reduction?

      It has been shown in many instances in the literature that a full genetic deletion of a single splice factor leads to impairments in spermatogenesis, and ultimately infertility (refs. 6–16). We suspect that dex treatment leads to fewer differential splicing events than a full splice factor deletion, given that dex treatment causes a broader decrease in splice factor expression without entirely abolishing any single splice factor. We have amended the discussion section to include this point. While we share the reviewer’s curiosity to compare the effects of dex vs genetic deletion of splicing machinery on the overall magnitude of differential splicing events, we unfortunately do not have access to mice with a floxed splice factor at this time. While we have considered knocking out one or more splice factors in an ex vivo cultured testis to compare alongside dex treatment, our efforts to date have proven unsuccessful due to high cell death upon culture of the postnatal testis for more than 24 hours.

      RECOMMENDATION #4: It is unclear from the methods whether in germline-specific KO's also the controls received tamoxifen.

      We thank the reviewer for catching this missing piece of information. All control embryos that were assessed received an equivalent dose of tamoxifen to the germline-specific KO embryos. The only difference between cKOs and controls was the presence of the Cre transgene. We have updated the Materials and Methods 3’ Tag-Seq sample preparation section to include the sentence: “Both GRcKO/cKO and control GRflox/flox embryos were collected from tamoxifen-injected dams, and thus were equally exposed to tamoxifen in utero”.

      Reviewer #2 (Recommendations For The Authors):

      I just have only a few comments/questions.

      RECOMMENDATION #5: It is somewhat surprising that GR is expressed in female germ cells, yet there doesn't seem to be a requirement. Is there any indication of what it does? Is the long-term stability of the germline compromised?

      We thank the reviewer for these questions, and we agree that it was quite surprising to find a lack of GR function in the female germline despite its robust expression. The question of whether loss of GR affects the long-term stability of the female germline is interesting, given that similar work in GR KO zebrafish has shown impairments to female reproductive capacity, yet only upon aging (refs. 17–19).

      While we have shared interest in this question, technical limitations thus far have prevented us from properly assessing the effect of GR loss in aged females. Homozygous deletion of GR results in embryonic lethality at approximately E17.5. Conditional deletion of GR using Oct4-CreERT2 with a single dose of tamoxifen (2.5 mg / 20g mouse) at E9.5 results in complete deletion of GR by E10.5, although dams consistently suffer from dystocia and are no longer able to deliver viable pups. While using the more active tamoxifen metabolite (4OHT) at 0.1 mg / 20g has allowed for successful delivery, the resulting deletion rate is very poor (see qPCR results in panel below, left). While using half the dose of standard tamoxifen (1.25 mg / 20g mouse) at E9.5 has on rare occasions led to a successful delivery, the resulting recombination efficiency is insufficient (Author response image 1 right panel).

      Author response image 1.

      While a Blimp1-Cre conditional KO model was used to assess male fertility on GR deletion, we believe this model may not be ideal for studying fertility in the context of aging. While Blimp1-Cre is highly specific to the germ cells within the gonad, there are many cell types outside of the gonad that express Blimp1, including the skin and certain cells of the immune system. It is unclear, particularly over the course of aging, whether any effects on fertility seen would be due to an oocyte-intrinsic effect, or the result of GR loss elsewhere in the body. While we hope to explore the role of GR in the aging oocyte further using alternative Cre models in the future, this is currently outside the scope of this work.

      RECOMMENDATION #6: Figure 5b: what is the left part of that panel? Is it the same volcano plot for germ cells as shown in part a but with splicing factors?

      We apologize if this panel was unclear. Yes, the left panel of Figure 5B is in fact the same volcano plot as in Figure 5A, labeled with splicing factors instead of top genes. We have edited Figure 5B and the corresponding figure legend to clarify this.

      References:

      1. Oakley, R.H., and Cidlowski, J.A. (2013). The biology of the glucocorticoid receptor: New signaling mechanisms in health and disease. J. Allergy Clin. Immunol. 132, 1033–1044. 10.1016/j.jaci.2013.09.007.

      2. Hargan-Calvopina, J., Taylor, S., Cook, H., Hu, Z., Lee, S.A., Yen, M.-R., Chiang, Y.-S., Chen, P.-Y., and Clark, A.T. (2016). Stage-Specific Demethylation in Primordial Germ Cells Safeguards against Precocious Differentiation. Dev. Cell 39, 75–86. 10.1016/j.devcel.2016.07.019.

      3. Hill, P.W.S., Leitch, H.G., Requena, C.E., Sun, Z., Amouroux, R., Roman-Trufero, M., Borkowska, M., Terragni, J., Vaisvila, R., Linnett, S., et al. (2018). Epigenetic reprogramming enables the transition from primordial germ cell to gonocyte. Nature 555, 392–396. 10.1038/nature25964.

      4. Eymery, A., Liu, Z., Ozonov, E.A., Stadler, M.B., and Peters, A.H.F.M. (2016). The methyltransferase Setdb1 is essential for meiosis and mitosis in mouse oocytes and early embryos. Development 143, 2767–2779. 10.1242/dev.132746.

      5. Chan, J.C., Morgan, C.P., Leu, N.A., Shetty, A., Cisse, Y.M., Nugent, B.M., Morrison, K.E., Jašarević, E., Huang, W., Kanyuch, N., et al. (2020). Reproductive tract extracellular vesicles are sufficient to transmit intergenerational stress and program neurodevelopment. Nat Commun 11, 1499. 10.1038/s41467-020-15305-w.

      6. Kuroda, M., Sok, J., Webb, L., Baechtold, H., Urano, F., Yin, Y., Chung, P., Rooij, D.G. de, Akhmedov, A., Ashley, T., et al. (2000). Male sterility and enhanced radiation sensitivity in TLS−/− mice. EMBO J 19, 453–462. 10.1093/emboj/19.3.453.

      7. Liu, W., Wang, F., Xu, Q., Shi, J., Zhang, X., Lu, X., Zhao, Z.-A., Gao, Z., Ma, H., Duan, E., et al. (2017). BCAS2 is involved in alternative mRNA splicing in spermatogonia and the transition to meiosis. Nat Commun 8, 14182. 10.1038/ncomms14182.

      8. Li, H., Watford, W., Li, C., Parmelee, A., Bryant, M.A., Deng, C., O’Shea, J., and Lee, S.B. (2007). Ewing sarcoma gene EWS is essential for meiosis and B lymphocyte development. J Clin Invest 117, 1314–1323. 10.1172/jci31222.

      9. O’Bryan, M.K., Clark, B.J., McLaughlin, E.A., D’Sylva, R.J., O’Donnell, L., Wilce, J.A., Sutherland, J., O’Connor, A.E., Whittle, B., Goodnow, C.C., et al. (2013). RBM5 Is a Male Germ Cell Splicing Factor and Is Required for Spermatid Differentiation and Male Fertility. PLoS Genet 9, e1003628. 10.1371/journal.pgen.1003628.

      10. Zagore, L.L., Grabinski, S.E., Sweet, T.J., Hannigan, M.M., Sramkoski, R.M., Li, Q., and Licatalosi, D.D. (2015). RNA Binding Protein Ptbp2 Is Essential for Male Germ Cell Development. Mol Cell Biol 35, 4030–4042. 10.1128/mcb.00676-15.

      11. Xu, K., Yang, Y., Feng, G.-H., Sun, B.-F., Chen, J.-Q., Li, Y.-F., Chen, Y.-S., Zhang, X.-X., Wang, C.-X., Jiang, L.-Y., et al. (2017). Mettl3-mediated m6A regulates spermatogonial differentiation and meiosis initiation. Cell Res 27, 1100–1114. 10.1038/cr.2017.100.

      12. Horiuchi, K., Perez-Cerezales, S., Papasaikas, P., Ramos-Ibeas, P., López-Cardona, A.P., Laguna-Barraza, R., Balvís, N.F., Pericuesta, E., Fernández-González, R., Planells, B., et al. (2018). Impaired Spermatogenesis, Muscle, and Erythrocyte Function in U12 Intron Splicing-Defective Zrsr1 Mutant Mice. Cell Reports 23, 143–155. 10.1016/j.celrep.2018.03.028.

      13. Ehrmann, I., Crichton, J.H., Gazzara, M.R., James, K., Liu, Y., Grellscheid, S.N., Curk, T., Rooij, D. de, Steyn, J.S., Cockell, S., et al. (2019). An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning. eLife 8, e39304. 10.7554/elife.39304.

      14. Legrand, J.M.D., Chan, A.-L., La, H.M., Rossello, F.J., Änkö, M.-L., Fuller-Pace, F.V., and Hobbs, R.M. (2019). DDX5 plays essential transcriptional and post-transcriptional roles in the maintenance and function of spermatogonia. Nat Commun 10, 2278. 10.1038/s41467-019-09972-7.

      15. Yuan, S., Feng, S., Li, J., Wen, H., Liu, K., Gui, Y., Wen, Y., and Wang, X. (2021). hnRNPH1 recruits PTBP2 and SRSF3 to cooperatively modulate alternative pre-mRNA splicing in germ cells and is essential for spermatogenesis and oogenesis. 10.21203/rs.3.rs-1060705/v1.

      16. Wu, R., Zhan, J., Zheng, B., Chen, Z., Li, J., Li, C., Liu, R., Zhang, X., Huang, X., and Luo, M. (2021). SYMPK Is Required for Meiosis and Involved in Alternative Splicing in Male Germ Cells. Frontiers Cell Dev Biology 9, 715733. 10.3389/fcell.2021.715733.

      17. Maradonna, F., Gioacchini, G., Notarstefano, V., Fontana, C.M., Citton, F., Valle, L.D., Giorgini, E., and Carnevali, O. (2020). Knockout of the Glucocorticoid Receptor Impairs Reproduction in Female Zebrafish. Int J Mol Sci 21, 9073. 10.3390/ijms21239073.

      18. Facchinello, N., Skobo, T., Meneghetti, G., Colletti, E., Dinarello, A., Tiso, N., Costa, R., Gioacchini, G., Carnevali, O., Argenton, F., et al. (2017). nr3c1 null mutant zebrafish are viable and reveal DNA-binding-independent activities of the glucocorticoid receptor. Sci Rep 7, 4371. 10.1038/s41598-017-04535-6.

      19. Faught, E., Santos, H.B., and Vijayan, M.M. (2020). Loss of the glucocorticoid receptor causes accelerated ovarian ageing in zebrafish. Proc Royal Soc B 287, 20202190. 10.1098/rspb.2020.2190.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors develop a method to fluorescently tag peptides loaded onto dendritic cells using a two-step approach: a peptide modified with a tetracysteine motif is loaded first, and a labelling step is then performed on the surface of live DCs using a dye with high affinity for the added motif. The results are convincing in demonstrating in vitro and in vivo T cell activation and efficient label transfer to specific T cells in vivo. The label transfer technique will be useful for identifying T cells that have recognised a DC presenting a specific peptide antigen, allowing, for example, isolation of the T cell and cloning of its TCR subunits. It may also be useful as a general assay for in vitro or in vivo T-DC communication that allows the detection of genetic or chemical modulators.

      Strengths:

      The study includes both in vitro and in vivo analysis including flow cytometry and two-photon laser scanning microscopy. The results are convincing and the level of T cell labelling with the fluorescent pMHC is surprisingly robust and suggests that the approach is potentially revealing something about fundamental mechanisms beyond the state of the art.

      Weaknesses:

      The method is demonstrated only at high pMHC density and it is not clear if it can operate at the lower peptide doses where T cells normally operate. However, this doesn't limit the utility of the method for applications where the peptide of interest is known. It's not clear to me how it could be used to de-orphan known TCRs, and this should be explained if the authors want to claim this as an application. Previous methods based on biotin-streptavidin and phycoerythrin had single pMHC sensitivity, but there were limitations to the PE-based probe, so the use of organic dyes could offer advantages.

      We thank the reviewer for the valuable comments and suggestions. Indeed, we have demonstrated and optimized this labeling technique for a commonly used peptide at rather high doses, to provide a proof of principle for the use of tetracysteine-tagged peptides in in vitro and in vivo studies. However, we completely agree that studies requiring different peptides and/or lower pMHC concentrations may need preliminary experiments if the use of biarsenical probes is attempted. We think the method can help investigate the functional and biological properties of peptides for TCRs that have been deorphaned by other techniques. Tetracysteine tagging of such peptides would provide a readily available antigen-specific reagent for downstream assays and validation. Other possible uses for modified immunogenic peptides include visualizing the dynamics of neoantigen vaccines or peptide delivery methods in vivo. For these additional uses, we recommend further optimization based on the needs of the prospective assay.

      Reviewer #2 (Public Review):

      Summary:

      The authors here develop a novel Ovalbumin model peptide that can be labeled with a site-specific FlAsH dye to track agonist peptides both in vitro and in vivo. The utility of this tool could allow better tracking of activated polyclonal T cells particularly in novel systems. The authors have provided solid evidence that peptides are functional, capable of activating OTII T cells, and that these peptides can undergo trogocytosis by cognate T cells only.

      Strengths:

      -An array of in vitro and in vivo studies are used to assess peptide functionality.

      -Nice use of cutting-edge intravital imaging.

      -Internal controls such as non-cognate T cells to improve the robustness of the results (such as Fig 5A-D).

      -One of the strengths is the direct labeling of the peptide and the potential utility in other systems.

      Weaknesses:

      1. What is the background signal from FlAsH? The baselines for the Figure 1 flow plots are all quite different, which makes them hard to follow. What does the background signal look like without FlAsH (how large is the fluorescence shift from unlabeled cells to No antigen+FlAsH)? How much of the FlAsH in cells is actually conjugated to the peptide? In Figure 2E, it doesn't look like it's very specific to pMHC complexes. Maybe you could double-stain with Ab for MHCII. Figure 4e suggests there is no background without MHCII but I'm not fully convinced. Potentially some MassSpec for FlAsH-containing peptides.

      We thank the reviewer for pointing out a possible area of confusion. We have in fact characterized the background extensively and found that it varies with the batch of FlAsH, the TCEP, and the cytometer, and also because of the oxidation-prone nature of the reagents. Because the Figure 1 subfigures were derived from different experiments, a combination of the factors above has likely contributed to the inconsistent background. To display the background more objectively, we have now added the No antigen+FlAsH background to the revised Fig 1.

      It is also worth noting that nonspecific FlAsH incorporation can be toxic at increasing doses, and live cells that display high backgrounds may undergo early apoptotic changes in vitro. However, when these cells are adoptively transferred and tracked in vivo, the compromised cells with high background possibly undergo apoptosis and are cleared by macrophages in the lymph node. The lack of clearance in vitro further contributes to the difference in background between in vitro and in vivo experiments, which we think is another possible cause of the inconsistent backgrounds throughout the manuscript. Altogether, comparing absolute signal intensities across different experiments would be misleading, and the relative differences within each experiment should be relied upon instead. We have added further discussion of this issue.

      2. On the flip side, how much of the variant peptides are getting conjugated in cells? I'd like to see some quantification (HPLC or MassSpec). If it's ~10% of peptides that get labeled, this could explain the low shifts in fluorescence and the similar T cell activation to native peptides if FlAsH has any deleterious effects on TCR recognition. But if it's a high rate of labeling, then it adds confidence to this system.

      We agree that mass spectrometry or, more specifically, tandem MS/MS would be an excellent addition to support our claim that peptide labeling by FlAsH is reliable and non-disruptive. We have therefore recently undertaken a tandem MS/MS quantitation project with our collaborators. However, it will require significant time to determine the internal-standard-based calibration curves and to run both analytical and biological replicates. Hence, we have decided to pursue this as a follow-up study and have added further discussion on quantification of the FlAsH-peptide conjugates by tandem MS/MS.

      3. Conceptually, what is the value of labeling peptides after loading with DCs? Why not preconjugate peptides with dye, before loading, so you have a cleaner, potentially higher fluorescence signal? If there is a potential utility, I do not see it being well exploited in this paper. There are some hints in the discussion of additional use cases, but it was not clear exactly how they would work. One mention was that the dye could be added in real-time in vivo to label complexes, but I believe this was not done here. Is that feasible to show?

      We have already tested preconjugation as a possible avenue for labeling peptides. In our hands, preconjugation resulted in low overall FlAsH intensity for both the control and tetracysteine-labeled peptides (Author response image 1). While we don’t have a satisfactory answer as to why preconjugation blunted the signal, it could be that the tetracysteine-tagged peptides attract biarsenical compounds better intracellularly. This may be due to the redox potential of the intracellular environment, which limits disulfide bond formation (PMID: 18159092).

      Author response image 1.

      Preconjugation yields poor FlAsH signal. Splenic DCs were pulsed with peptide then treated with FlAsH or incubated with peptide-FlAsH preconjugates. Overlaid histograms show the FlAsH intensities on DCs following the two-step labeling (left) and preconjugation (right). Data are representative of two independent experiments, each performed with three biological replicates.

      4. Figure 5D-F: the imaging data isn't fully convincing. For example, in 5F and 2G, the speeds for T cells with no Ag should be much higher (10-15 micron/min or 0.16-0.25 micron/sec). The fact that your speeds are much lower suggests technical or biological issues that might need to be acknowledged, or you could use other readouts like flow cytometry.

      We thank the reviewer for drawing attention to this technical point. We would like to point out that the imaging data in Figure 5d-f were obtained from agarose-embedded live lymph node sections. Briefly, the lymph nodes were removed, suspended in 2% low-melting-temperature agarose in DMEM and cut into 200 µm sections with a vibrating microtome. Prior to imaging, tissue sections were incubated in complete RPMI medium at 37 °C for 2 h to allow the cells to resume mobility. Thus, we think that cells still recovering their typical speeds ex vivo may account for the slightly reduced T cell speeds overall, for both control and antigen-specific T cells (PMID: 32427565, PMID: 25083865). We have added text to remove the ambiguity about the technique used for dynamic imaging. The speeds in Figure 2g come from live imaging of DC-T cell cocultures, in which basal cell movement could be hampered by the cell density. Additionally, the glass-bottom dishes were coated with fibronectin to facilitate DC adhesion, which may be responsible for the lower average speeds of the T cells in vitro.

      Reviewer #1 (Recommendations For The Authors):

      Does the reaction of ReAsH with reactive sites on the surface of DC alter them functionally? Functions have been attributed to redox chemistry at the cell surface- could this alter this chemistry?

      We thank the reviewer for the insight. It is possible that the nonspecific binding of biarsenical compounds to cysteine residues, which we refer to as background throughout the manuscript, contributes to some alterations. One possible way in which biarsenicals could affect redox events in DCs is by reducing glutathione levels (PMID: 32802886). Glutathione depletion is known to impair DC maturation and antigen presentation (PMID: 20733204). To avoid toxicity, we carried out a stringent titration to optimize ReAsH and FlAsH concentrations for labeling and conducted experiments using doses that did not cause overt toxicity or alter DC function.

      Have the authors compared this to a straightforward approach where the peptide is just labelled with a similar dye and incubated with the cell to load pMHC, using the MHC knockout to assess specificity? Why is this method, which involves exposing the DC to a high concentration of TCEP, better than just labelling the peptide? The Davis lab also arrived at a two-step method with biotinylated peptide and streptavidin-PE, but I still wonder if this was really necessary, as the sensitivity will always come down to the ability to wash out the reagents that are not associated with the MHC.

      We agree with the reviewer that small, non-disruptive fluorochrome-labeled peptide alternatives would greatly improve the workflow and the signal-to-noise ratio. In fact, we have been actively searching for such alternatives since we started working on the tetracysteine-containing peptides. So far, we have tried commercially available FITC- and TAMRA-conjugated OVA323-339 for loading the DCs, but these failed to yield any discernible signal. We also have an ongoing study in which we are producing and testing various in-house modified OVA323-339 peptides with fluorogenic properties. Unfortunately, at this moment, the ones that gave a crisp, bright signal upon loading also incorporated into the DC membrane in a nonspecific fashion and were taken up by non-cognate T cells from double antigen-loaded DCs. We are actively pursuing this area of investigation and developing better optimized peptides with low or negligible membrane incorporation.

      Lastly, we would like to point out that tetracysteine tags are visible by transmission electron microscopy without FlAsH treatment. Thus, this application could add a new dimension for addressing questions about the antigen/pMHCII loading compartments in future studies. We have now added more in-depth discussion about the setbacks and advantages of using tetracysteine labeled peptides in immune system studies.

      The peptide dosing at 5 µM is high compared to the likely sensitivity of the T cells. It would be helpful to titrate the system down to the EC50 for the peptide, which may be nM, and determine if the specific fluorescence signal can still be detected in the optimal conditions. This will not likely be useful in vivo, but it will be helpful to see if the labelling procedure would impact T cell responses when antigen is limited, which will be more of a test. At 5 µM it's likely the system is at a plateau and even a 10-fold reduction in potency might not impact the T cell response, but it would shift the EC50.

      We thank the reviewer for the comment and suggestion. We agree that it is possible to miss minimally disruptive effects at 5 µM, and that titrating the native peptide vs. the modified peptide down to nM doses would give us a clearer view. This can certainly be addressed in future studies, and also with other peptides with different affinity profiles. We chose a relatively high dose for this study because lowering the peptide dose cost us the specific FlAsH signal; we therefore proceeded with the lowest peptide concentration at which the signal was retained.

      In Fig 3b the level of background in the dsRed channel is very high after DC transfer. What cells is this associated with, and does this appear to be debris? Also, I wonder where the ReAsH signal is in the experiments in general. I believe this is a red dye and it would likely be quite bright given the reduction of the FlAsH signal. Will this signal overlap with signals like dsRed and PKH-26 if the DC is also treated with this to reduce the FlAsH background?

      We have already shown that the ReAsH signal combined with DsRed can be used for cell-tracking purposes, as neither is transferred to other cells during antigen-specific interactions (Author response image 2). In fact, combining their exceptionally bright fluorescence gave us a robust signal for tracking the adoptively transferred DCs in the recipient mice. On the other hand, the lipophilic membrane dye PKH-26 is transferred by trogocytosis, while the remaining signal contributes to the red fluorescence used for tracking DCs. Therefore, the signal that we show to be transferred from DCs to T cells comes only from the lipophilic dye. We have added a sentence to the results section to elaborate on this. Regarding the reviewer’s comment on the DsRed background in Figure 3b, we agree that the signal from cells outside the gate in recipient mice seems slightly higher than that of the control mice. This may suggest that macrophages clearing debris from apoptotic/dying DCs contribute to the background in the recipient lymph node. Nevertheless, it does not contribute to any DsRed/ReAsH signal in the antigen-specific T cells.

      Author response image 2.

      ReAsH and DsRed are not picked up by T cells at the immune synapse. DsRed+ DCs were labeled with ReAsH, pulsed with 5 μM OVACACA, labeled with FlAsH and adoptively transferred into CD45.1 congenic mice (1-2 × 10^6 cells) via footpad. Naïve e450-labeled OTII and e670-labeled polyclonal CD4+ T cells were mixed 1:1 (0.25-0.5 × 10^6 per T cell type) and injected i.v. Popliteal lymph nodes were removed at 42 h post-transfer and analyzed by flow cytometry. Overlaid histograms show the ReAsH/DsRed, MHCII and FlAsH intensities of the T cells. Data are representative of two independent experiments with n=2 mice per group.

      In Fig 5b there is a missing condition. If they look at Ea-specific T cells for DC with or without the Ova peptide, do they see no transfer of PKH-26 to the OTII T cells? Also, the MFI of the FlAsH signal transferred to the T cells seems very high compared to other experiments. Can the authors estimate the number of peptides transferred (this should be possible), and would each T cell need to be collecting antigens from multiple DC? Could the debris from dead DC also contribute to this if picked up by other DC or even directly by the T cells? Maybe this could be tested by transferring DC that are killed (perhaps by sonication) prior to inoculation?

      To address the reviewer’s question on PKH-26 acquisition by T cells: Ea T cells pick up PKH-26 from Ea+OVA double-pulsed DCs, but not from unpulsed or single OVA-pulsed DCs. OTII T cells acquire PKH-26 from OVA-pulsed DCs, whereas Ea T cells do not (as expected) and serve as an internal negative control for that condition. Regarding the reviewer’s comment on the high FlAsH signal intensity of T cells in Figure 5b, a plausible explanation is that the T cells accumulate pMHCII through serial engagements with APCs. In fact, a comparison of the T cell FlAsH intensities at 18 h and 36-48 h post-transfer demonstrates an increase (Author response image 3) and thus hints at a cumulative signal. As DCs are known to be short-lived after adoptive transfer, the debris of dying DCs along with its peptide content may indeed be passed on to macrophages, neighboring DCs and eventually back to T cells again (or for the first time, depending on the T:DC ratio, which may not allow all T cells to contact the transferred DCs within the limited time frame). We agree that the number and the quality of such contacts can be gauged using fluorescent peptides. However, we think peptides chemically conjugated to fluorochromes with optimized signal-to-noise profiles and a less oxidation-prone nature would be more suitable for quantification purposes.

      Author response image 3.

      FlAsH signal acquisition by antigen-specific T cells becomes more prominent at 36-48 h post-transfer. DsRed+ splenic DCs were double-pulsed with 5 μM OVACACA and 5 μM OVA-biotin and adoptively transferred into CD45.1 recipients (2 × 10^6 cells) via footpad. Naïve e450-labeled OTII (1 × 10^6 cells) and e670-labeled polyclonal T cells (1 × 10^6 cells) were injected i.v. Popliteal lymph nodes were analyzed by flow cytometry at 18 h or 48 h post-transfer. Overlaid histograms show the T cell levels of OVACACA (FlAsH). Data are representative of three independent experiments with n=3 mice per time point.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in weaknesses 1 & 2, more validation of how much of the FlAsH fluorescence is on agonist peptides and how much is non-specific would improve the interpretation of the data. Another option would be to preconjugate peptides but that might be a significant effort to repeat the work.

      We agree that mass spectrometry would be the gold-standard technique for measuring the percentage of tetracysteine-tagged peptide that is conjugated to FlAsH in DCs. However, due to the scope of such an endeavour, this can only be addressed as a separate follow-up study. As for preconjugation, we have tried it and unfortunately failed to get it to work (Author response image 1). Therefore, we have shifted our focus to generating in-house peptide probes that are chemically conjugated to stable and bright fluorophore derivatives. With these, we aim to circumvent the problems that the two-step FlAsH labeling poses.

      Along those lines, do you have any way to quantify how many peptides you are detecting based on fluorescence? Being able to quantify the actual number of peptides would push the significance up.

      We think the two-step procedure and the background would pose challenges to such quantification in this study. Although it would provide tremendous insight into antigen-specific T cell-APC interactions in vivo, we think it should be performed using peptides chemically conjugated to fluorochromes with optimized signal-to-noise profiles.

      In Figure 3D or 4 does the SA signal correlate with Flash signal on OT2 cells? Can you correlate Flash uptake with T cell activation, downstream of TCR, to validate peptide transfers?

      To answer the reviewer’s question about the correlation between FlAsH and SA, we have revised Figure 3d to show the correlation between OTII uptake of FlAsH, Streptavidin and MHCII. We also thank the reviewer for the suggestion to correlate FlAsH uptake with T cell activation and/or signaling downstream of the TCR. We have used proliferation and CD44 expression as proxies of activation (Fig 2, 6). Nevertheless, we agree that examining the early events that correspond to the initiation of the T-DC synapse and FlAsH uptake would be valuable for demonstrating the temporal relationship between peptide transfer and activation. We have therefore addressed this in the revised discussion.

      Author response image 4.

      FlAsH signal acquisition by antigen-specific T cells correlates with OVA-biotin (SA) and MHCII uptake. DsRed+ splenic DCs were double-pulsed with 5 μM OVACACA and 5 μM OVA-biotin and adoptively transferred into CD45.1 recipients (2 × 10^6 cells) via footpad. Naïve e450-labeled OTII (1 × 10^6 cells) and e670-labeled polyclonal T cells (1 × 10^6 cells) were injected i.v. Popliteal lymph nodes were analyzed by flow cytometry. Overlaid histograms show the T cell levels of OVACACA (FlAsH) at 48 h post-transfer. Data are representative of three independent experiments with n=3 mice.

      Minor:

      Figure 3F, 5D, and videos: Can you color-code polyclonal T cells a different color than magenta (possibly white or yellow), as they have the same look as the overlay regions of OT2-DC interactions (Blue+red = magenta).

      We apologize for the inconvenience caused by the color selection. We have had difficulty assigning colors that are both bright and distinct. Unfortunately, yellow and white were also easily confused with the FlAsH signal inside red and blue cells, respectively. We have now added yellow and white arrows to better distinguish the polyclonal from the antigen-specific cells in Figures 3f and 5d.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This nice study by Miyano combines slice electrophysiology and superresolution microscopy to address the role of RBP2 in Ca2+ channel clustering and neurotransmitter release at hippocampal mossy fiber terminals. While a number of studies demonstrated a critical role for RBPs in clustering Ca2+ channels at other synapses and some provided evidence for a role of the protein in molecular coupling of Ca2+ channels and release sites, the present study targets another key synapse that is an important model for presynaptic studies and offers access to a microdomain controlled synaptic vesicle (SV) release mechanism with low initial release probability.

      Summarizing a large body of high-quality work, the authors demonstrate reduced Ca2+ currents and a reduced release probability. They attribute the latter to the reduced Ca2+ influx and can restore release by increasing Ca2+ influx. Moreover, they propose an altered fusion competence of the SVs, which is not so strongly supported by the data in my view.

      The effects are relatively small, but I think the careful analysis of the RBP role at the mossy fiber synapse is an important contribution.

      We thank the reviewer for the careful assessment of the paper. We agree that, while the reduced Ca influx in the KO is relatively straightforward, the evidence for impaired priming is somewhat indirect and remains a suggestion. We also note that Moser and colleagues have analyzed the function of RIM-BP2 at hair cell synapses and likewise showed reduced Ca influx. At cortical synapses, there has been no study using direct presynaptic recording. In the revision, we have carefully cited previous studies and tried to be fair. We hope that the current revision is much improved.

      Reviewer #2 (Public Review):

      The proper expression and organization of CaV channels at the presynaptic release sites are subject to coordinative and redundant control of many active zone-specific molecules including RIM-BPs. Previous studies have demonstrated that ablation of RIM-BPs in various mammalian synapses causes significant impairment of synaptic transmission, either by reducing CaV expression or decoupling CaV from synaptic vesicles. The mechanisms remain unknown.

      In the manuscript, Sakaba and colleagues aimed to examine the specific role of RIM-BP2 at the hippocampal mossy fiber-CA3 pyramidal cell synapse, which is well-characterized by low initial release probability and strong facilitation during repetitive stimulation. By directly recording Ca2+ currents and capacitance jumps from the MF boutons, which is very challenging but feasible, they showed that depolarization-evoked Ca2+ influx was reduced significantly (~39%) by KO of RIM-BP2, but no impacts on Ca-induced exocytosis and RRP (measured by capacitance change). They used STED microscopy to image the spatial distribution of the CaV2.1 cluster but found no change in the cluster number with a slight decrease in cluster intensity (~20%). They concluded that RIM-BP2 functions in tonic synapses by reducing CaV expression and thus differentially from phasic synapses by decoupling CaV-SV.

      In general, they provide solid data showing that RIM-BP2 KO reduces Ca influx at MF-CA3 synapse, but the phenotype is not new as Moser and colleagues have also used presynaptic recording and capacitance measurement and shown that RIM-BP2 KO reduces Ca2+ influx at hair cell active zone (Krinner et al., 2017), although at different synapse model expressing CaV1.3 instead of CaV2.1. Further, the concept that RIM-BP2 plays diverse functions in transmitter release at different central synapses has also been proposed with solid evidence (Brockmann et al., 2019).

      We thank the reviewer for the careful reading of the manuscript. We agree that previous studies have shown reduced Ca influx at hair cells, and that diverse functions of RIM-BP2 at different central synapses have been proposed by Brockmann et al. The new point of this study is that we firmly and quantitatively show the reduced Ca currents using direct presynaptic recording, which has not been done at mossy fiber synapses or at cortical synapses in general. So far, quantitative and time-resolved measurements of presynaptic currents cannot be made by other methods. In this revision, we point this out carefully.

      Reviewer #1 (Recommendations For The Authors):

      The MS is overall carefully prepared and I have only a few minor comments to help with further improving the manuscript.

      Abstract:

      I think the notion of different RBP function at tonic and phasic synapses is not so well founded. The reduced number of Ca2+ channels and their altered topography have been shown in multiple synapses that also include those with phasic release. Quantitative structural and functional analysis of presynaptic Ca2+ channels of RBP-2 and RBP1-2 DKO deficient AZs closely related to the present study has e.g. been provided for auditory synapses (e.g. hair cells, endbulb/calyx of Held synapses) that provide both phasic and sustained release.

      In the abstract, we have omitted the description of phasic vs tonic synapses because, as the reviewer pointed out, it is not well founded. Specifically, the abstract now reads (Line 13~):

      “Synaptic vesicles dock and fuse at the presynaptic active zone (AZ), the specialized site for transmitter release. AZ proteins play multiple roles such as recruitment of Ca2+ channels as well as synaptic vesicle docking, priming and fusion. However, the precise role of each AZ protein type remains unknown. In order to dissect the role of RIM-BP2 at mammalian cortical synapses having low release probability, we applied direct electrophysiological recording and super-resolution imaging to hippocampal mossy fiber terminals of RIM-BP2 KO mice. By using direct presynaptic recording, we found the reduced Ca2+ currents. The measurements of EPSCs and presynaptic capacitance suggested that the initial release probability was lowered because of the reduced Ca2+ influx and impaired fusion competence in RIM-BP2 KO. Nevertheless, larger Ca2+ influx restored release partially. Consistent with presynaptic recording, STED microscopy suggested less abundance of P/Q-type Ca2+ channels at AZs deficient in RIM-BP2. Our results suggest that the RIM-BP2 regulates both Ca2+ channel abundance and transmitter release at mossy fiber synapses.”

      Intro:

      Line 48: consider adding Butola et al., 2021 (endbulb of Held) to the references, which concurs with the notion made for the calyx. However, a contrasting finding was made for another synapse with tight coupling: RBP2 deletion did not alter tight coupling in hair cells (Krinner et al., 2017). Line 51: RBP-DKO/lack of additional effect of RBP1 deletion: suggest adding Krinner et al., 2021 to the references, which concurs with the notion made for hair cells.

      We cited Butola et al., 2021 (Line 49) and Krinner et al., 2021 (Line 52), as the reviewer suggested.

      Results:

      STED microscopy: I am concerned with two aspects of the analysis/presentation. I) I recommend replacing density with abundance as the authors do not resolve single channels. II) I appreciate the note of caution about the fact that STED nanoscopy due to the non-linear nature of the depletion process should/could not be easily used to quantify copy numbers based on immunofluorescence. I would recommend the authors perform 2D Gaussian fitting to at least the Cav2.1 immunofluorescent spots neighboring Munc13-1 spots and report the short and long axis estimates as well as potentially the area. Should the authors have confocal Cav2.1 and Cav2.2 immunofluorescent data co-acquired with STED of Munc13-1, this would be very valuable additional information, but I do not think the experiment is essential for the sake of publication if it was not done already, given the large body of high-quality physiology data.

      I) Throughout the manuscript, we have changed the term from “density” to “abundance”, as the reviewer suggested.

      II) As the reviewer suggested, we have carried out 2D Gaussian fitting of Cav2.1 spots. The length, width, and area of Cav2.1 clusters in the AZ did not differ between WT and RIM-BP2 KO terminals (Line 431-433, Figure 7-figure supplement 4). The limited spatial resolution of STED, especially at mossy fiber synapses in tissue, together with the small difference between WT and KO (~30% expected from electrophysiology), could prevent detection of a difference, unlike at ribbon synapses and the fly NMJ, where release sites and Ca channel clusters are well defined. We should also note that the intensity was calculated as in previous studies (integral of signal intensity; Krinner et al., 2017), not as the absolute peak intensity.

      As the reviewer suggested, we have added confocal data (Line 434-436, Figure 7-figure supplement 5). We determined the AZ area from the Munc13-1 STED data and quantified the Munc13-1, Cav2.1 and Cav2.2 intensities. As shown in the figure, only the Cav2.1 intensity was reduced in the KO, consistent with the STED data.

      Nevertheless, as the reviewer suggested, we should be cautious in interpreting the intensity, and we are aware that the imaging data are merely consistent with the electrophysiology. From imaging alone, we see only a qualitative, rather than quantitative, difference between WT and KO.

      Discussion:

      I think the focus on alterations of presynaptic Ca channels could be further strengthened along with the discussion of the relevant previous studies.

      Thank you for the suggestion. We have added a paragraph as shown below in the discussion (Line 531~).

      “By using direct presynaptic patch clamp recordings, we here observed a decrease of Ca2+ current amplitudes (~30%) in RIM-BP2 KO mice (Fig. 1). Consistently, STED microscopy supported reduced abundance of P/Q-type Ca2+ channels (Cav2.1) in the mutant mossy fiber terminal (Fig. 7). Interestingly, this observation is similar to that at Drosophila NMJ and hair cell synapses (Liu et al., 2011; Krinner et al., 2017), but not that at other synapses (Acuna et al., 2015; Grauel et al., 2016; Butola et al., 2021), suggesting that the functional role of RIM-BP2 in recruiting Ca2+ channels differs among synapse types.”

      Reviewer #2 (Recommendations For The Authors):

      Minor questions:

      1) The title is misleading as it only shows RIM-BP2 regulates CaV expression but not clustering.

      This was also pointed out by the first reviewer. We have adopted the term “abundance”, as suggested by the first reviewer, and changed the title to “RIM-BP2 regulates Ca2+ channel abundance and neurotransmitter release at hippocampal mossy fiber terminals.”

      2) Figure 7 legend. Again, RIM-BP2 only changes the intensity of CaV2.1 clusters but not the density.

      Changed Figure 7 title from “RIM-BP2 deletion alters the density …” to “RIM-BP2 deletion alters the signal intensity …”.

      3) Line 31: "Ca2+ influx through voltage-gated Ca2+ channels triggers neurotransmitter release from synaptic vesicles within a millisecond" is not correct. Ca-evoked transmitter release can only occur with such fast speed at very specialized synapses such as the calyx of Held but not at general chemical synapses.

      We changed “within a millisecond” to “within milliseconds” (Line 30).

      4) Line 44-46: In Drosophila NMJs and at Drosophila NMJs are redundant.

      We eliminated “at Drosophila NMJs”.

      5) The authors should use the verb tense consistently throughout the manuscript, such as "In RIM-BP1,2 DKO mice, the coupling between Ca2+ channels and synaptic vesicles became loose, and action potential-evoked neurotransmitter release was reduced at the calyx of Held synapse (Acuna et al., 2015). At hippocampal CA3-CA1 synapses, RIM-BP2 deletion alters Ca2+ channel localization at the AZs without altering total Ca2+ influx. Besides, RIM-BP1,2 DKO has no additional effect...".

      We changed verb tenses in Line 46-49, Line 55-58, and Line 62-67. We also checked the manuscript once more. Thank you for pointing this out.

      6) Line 59: technically difficulty should be technical difficulty.

      Fixed.

      7) Figure 4A-B are representative traces of 0.5 mM EGTA (black) or 5 mM EGTA (red) recorded from the same terminals or from different terminals but simply superimposed?

      The representative traces were recorded from different terminals. We now describe this point in the figure legend (Fig. 4A). We apologize for the confusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (public):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments were conducted for this study by using previously published fly stocks to directly compare N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems. This is now clearly stated in the revised supplement (Fig. S8). We also conducted these experiments because previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions about their physiological relevance. Our in vivo analyses of Hh function in wing and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific against Shh, and it has been used in over 60 publications. C9C5 even lacks cross-reactivity with highly similar Ihh or Dhh (https://www.cellsignal.com/products/primary-antibodies/shh-c9c5-rabbit-mab/2207?_requestid=1528451). We confirmed C9C5 specificity repeatedly (one example is shown below; another quality control that includes media of mock-transfected cells is now shown in Fig. S1) and never observed unspecific bands under any experimental condition. As shown below, C9C5 and R&D AF464 anti-Shh antibodies (the latter were previously used in our lab) detect the same bands.

      Author response image 1.

      Shh immunoblot. R&D 8908-SH served as a size control for full-length dual-lipidated Shh, and C25S;26-35Shh served as a size control for N-terminally truncated monolipidated Shh. Both C25SShh bands are specific: One represents the full-length protein and the bottom band represents N-truncated processed proteins. The blot was first incubated with antibody AF464 and reincubated (after stripping) with the much more sensitive antibody C9C5.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We agree and therefore have previously aimed to establish stable Hhat-expressing cell lines. However, we found that long-term Hhat overexpression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat. This prevented us from establishing stable cell lines co-expressing Shh/Hhat despite several attempts and different strategies. Instead, we established transient co-expression of Shh/Hhat from the same mRNA as the next-best strategy for reliable near-quantitative Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      We repeated all qPCR assays to eliminate this shortcoming. Biological activities and transcriptional responses of palmitoylated Shh and non-palmitoylated C25AShh are now directly compared and quantified (revised Fig. 4A,B, newly included Fig. 6, revised Fig. S5B). The original comparison of both proteins with dual-lipidated R&D 8908-SH is still important in order to show that both Shh and C25AShh in serum-containing media have equally high, and not equally low, activities because R&D 8908-SH is generally seen as the Shh form with the highest biological activity. These comparisons are therefore still discussed in the main manuscript text and are now shown in Fig. S5E.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      We believe that the revised manuscript advances our understanding of Shh membrane extraction more than modestly, in three important ways. First, although Disp was indeed known as a furin-activated Hh exporter, our findings show for the first time that furin activation of Disp is strictly linked to proteolytic Shh processing as the underlying release mode, fully consistent with data obtained from the Disp-/- cells.

      Second, Scube2 was known as a Shh release enhancer and several lipoproteins were previously shown to play a role in the process, but our findings are the first to show that synergistic Disp/Scube2 function depends on the presence of lipoprotein and that HDL (but no other lipoprotein) accepts free cholesterol or a novel monolipidated Shh variant from Disp. This challenges the dominant model of Scube2 chaperone function in Hh release and transport (PMID 22902404, PMID 22677548, PMID 36932157).

      Third, we show that this Shh variant is fully bioactive, despite the lack of the palmitate. Therefore, N-palmitate is dispensable for Shh signaling to Ptch1 receptors, but only if the morphogen is released by, and physically linked to, HDL. In contrast, previously published studies analyzed monolipidated Shh variants in the absence of HDL, resulting in variably reduced bioactivity of these physiologically irrelevant forms. Therefore, our findings challenge the current dominating model of N-palmitate-dependent Shh signaling to Ptch1 (this model also does not postulate any role for lipoproteins, PMID 36932157) and essential roles of N-palmitate (stating that the N-palmitate is sufficient for signaling, PMID 27647915).

      Reviewer 2 (public):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      We disagree with this assessment on several points. First, our findings do not confirm, but strongly challenge, the current dogma of Disp-mediated handover of dual-lipidated Shh to Scube2 as a soluble acceptor (instead of to HDL, PMID 36932157). Second, we report three new findings: Disp, Scube2, and lipoproteins all interact to specifically increase N-terminal Shh shedding, whereas C-terminal shedding is optional; Disp function depends on the presence of HDL; and HDL modulates Shh shedding (dual Shh shedding in the absence of HDL versus N-shedding and HDL association in its presence). Our work also directly determines the molecular identity of a previously unknown Shh variant as monolipidated (by RP-HPLC), HDL associated (by SEC and density gradient centrifugation), and fully bioactive (in two cell-based reporter assays).

      Third, regarding the physiological relevance of our findings: Fig. S8 demonstrates that deletion of the N-terminal sheddase target site of Hh abolishes all Hh biofunction in Drosophila eye discs and wing discs, which strongly supports physiological relevance of N-terminal Hh shedding during release. N-terminal shedding is further consistent with in vivo findings of others. These studies showed that artificial monolipidated Shh variants (C25SShh and ShhN) generate highly variable loss-of-function phenotypes in vivo, but can also generate gain-of-function phenotypes if compared with the dual-lipidated cellular protein (refs. 1-5). These observations are difficult to align with the dominating model of essential N-palmitate function at the level of Ptch1 (PMID 36932157), because the lack of N-palmitate is expected to always diminish signaling in all tissue contexts and developmental stages. Our finding that dual-lipidated Shh is strictly released in a Disp/Scube2-controlled manner from producing cells, while artificial monolipidated Shh variants leak uncontrolled from the cellular surface, explains these seemingly paradoxical in vivo findings much better. This is because uncontrolled Shh release can increase Shh signaling locally (when physiological release would normally be prevented at this site [ref. 6] or time), while it can also decrease it (for example, in situations requiring timed pulses of Shh release and signaling; refs. 7-11). This is discussed in our manuscript (Discussion, first paragraph).

      2) The molecular properties of the processed Shh variants are unclear – incorporation of cholesterol/palmitate and removal of peptides were not directly demonstrated…

      We also disagree on this point. Our study is the only one that uses RP-HPLC and defined controls (dual-lipidated commercial R&D 8908-SH, dual-lipidated cellular proteins eluting at the same positions, non-lipidated or monolipidated controls, Fig. S1F-K) to compare the lipidation status of cellular and corresponding solubilized Shh and to determine their exact lipidation status (Figs. 1, 3, 5, Figs. S4, S6, S7). Co-expressed Hhat assures full Shh palmitoylation during biosynthesis (as shown in original Figs. 1A and S2F-K & S4A and as confirmed by R&D 8908-SH) as an essential prerequisite to reliably conduct and interpret these analyses. The removal of peptides is demonstrated by the increase in electrophoretic mobility of soluble forms, if compared with their dual-lipidated cellular precursor, because chemical delipidation results in a decrease in electrophoretic mobility in SDS-PAGE (as discussed in detail in ref. 12, which we now cite in our work).

      3) This (N-terminal palmitoylation status) is particularly relevant …, as the signaling activity of non-palmitoylated Hedgehog proteins is controversial.

      We agree with this comment and are aware of the published data. However, in our work, we have demonstrated strong signaling activities by using C25AShh mutants that are fully impaired in their ability to undergo N-palmitoylation (Fig. 4, Fig. S5). These are highly bioactive if associated with HDL. Therefore, we do not see any ambiguity in our findings and suggest that the reports of others resulted from different experimental conditions.

      4) A decrease in hydrophobicity is no proof for cleavage of palmitate, this could also be due to addition of a shorter acyl group.

      As shown in the original manuscript, we have controlled for this possibility: RP-HPLC was established by using defined controls (dual-lipidated, non-lipidated, or monolipidated, Fig. S1F-K and corresponding color coding). Because the cellular Shh precursor prior to release was always dual-lipidated, whereas the soluble form was not, lipids were clearly lost during release (because a decrease in the hydrophobicity of soluble proteins is always shown relative to that in their dual-lipidated cellular precursors). The increase in electrophoretic mobility detected for the very same proteins in SDS-PAGE demonstrates delipidation during their release (please see our reply to point 2 above). Finally, the suggested possibility of palmitate exchange for shorter acyls during Shh release at the cell surface is extremely unlikely, as there is no known machinery to catalyze this exchange at the plasma membrane. Hh acylation only occurs in the ER membrane via Hhat (ref. 13).

      5) “It would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      We now show that Panc1 cells release endogenous Shh in truncated form, as our transfected cells do (Fig. S1). Moreover, the experimental data shown in Fig. S8B demonstrate that engrailed-controlled expression of sheddase-resistant Hh variants in wing disc cells completely blocks endogenous Hh produced in the same cells by stalling Disp-mediated morphogen export. Both findings strongly support our key finding that N-processing is not optional but absolutely required to finalize Hh release.

      6) Co-fractionation of Shh and ApoA1 is not convincing, as the two proteins peak at different molecular weights…. The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter

      Shifted Shh peaks upon physiologically relevant Shh transfer via Disp to HDL must be expected in SEC, because Shh association with HDL subfractions increases their size. Comparing relative peaks of Shh-loaded HDL with Shh-free reference HDL suggests 10-15 Shh molecules per HDL (adding 200-300 kDa to its molecular mass). This is now stated in the revised manuscript (page 10, line 2).

      Still, to further support direct Shh/HDL association, we analyzed high molecular weight Shh SEC fractions by subsequent RP-HPLC. This approach confirms direct physical interactions between cholesteroylated Shh and HDL (now shown in Fig. S6G).

      We support this possibility further by density gradient centrifugation, again demonstrating that Shh and HDL interact physically (now shown in Fig. S6 E,F).
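
      The stoichiometry quoted in this reply (10-15 Shh molecules adding 200-300 kDa per HDL particle) implies a mass of roughly 20 kDa per bound Shh molecule. The short Python sketch below only makes that arithmetic explicit. The function name and the 20 kDa default are illustrative assumptions (the response elsewhere describes the signaling-active protein as a 19 kDa band); they are not values taken from the SEC analysis itself.

```python
def shh_copies_per_hdl(added_mass_kda, shh_mass_kda=20.0):
    """Estimate how many Shh molecules account for a given mass increase
    of an HDL particle.

    shh_mass_kda is an assumed per-molecule mass, not a measured value.
    """
    if shh_mass_kda <= 0:
        raise ValueError("shh_mass_kda must be positive")
    return added_mass_kda / shh_mass_kda


if __name__ == "__main__":
    # Mass increases quoted in the response: 200 and 300 kDa.
    for added in (200.0, 300.0):
        print(added, "kDa added ->", shh_copies_per_hdl(added), "Shh per HDL")
```

      With the assumed 20 kDa per molecule, the quoted mass range maps onto the quoted copy-number range; a different per-molecule mass would shift the estimate proportionally.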

      Recommendations from the reviewing editor:

      1) “The authors should certainly tone down statements of novelty because much of the work is confirmatory in nature”

      We followed this request in our revised manuscript and now clearly point out what was known and what we add to the concept of Disp and lipoprotein-mediated Hh export. Still, as outlined in our response to reviewer 2, our findings align with only one previously published model of lipoprotein-mediated Hh transport, while they do not support the most current models of Disp-mediated handover of dual-lipidated Shh to Scube2 (PMID 36932157) and essential signaling roles of N-palmitate at the level of the receptor Ptch1. Thus, our work should not be viewed solely as confirmatory of one of the many previous models, because at the same time it also contradicts the other models of Hh solubilization and transport.

      2) “Inclusion of the Shh(-) control”

      Please see our reply to reviewer 1 above. The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific against Shh. We also carefully characterized the C9C5 antibody before any of the experiments shown in our work had been initiated. We never observed any unspecific C9C5 reactivity that otherwise would – of course – have prevented us from switching to this antibody from the AF464 antibodies that we had previously used. Consistent C9C5 antibody specificity is evident from the representative example shown below that was recently produced in our lab: no cellular proteins or TCA-precipitated serum-depleted media components from mock-transfected cells (left two lanes) react with C9C5.

      Author response image 2.

      Top left: C9C5 detects the cellular 45kDa Shh precursor and the 19 kDa signaling-active protein. No unspecific signals are detected in untransfected cells and supernatants of such cells (left two lanes). Right: Loading control on the stripped blot.

      3) “Clean up how the data are normalized for quantification”

      Please see our reply to reviewer 1 above. Normalization has been changed for the indicated figures. We also repeated qPCR analyses and added new ones to the manuscript that include required controls. We also changed figure outlines in accordance with the request.

      4) “The issue of a non-specific band of this Shh antibody is critical”

      Please see our replies above. In our hands, unspecific C9C5 antibody binding was never observed.

      5) “Regarding experimental rigor, I would add that the HPLC … should just show the real data points”

      We agree and added individual data points to our revised manuscript.

      Recommendations for the authors:

      1) I would like to see the controls in the same figure with the experimental results.

      We show antibody specificity controls together with released Shh in Fig. S1.

      2) Figure 2 confirms previously published results. It was shown in PMC5811216 that Disp processing by furin is required for Shh release from producing cells.

      Indeed, it was shown that furin processing of Disp increases Shh release (supposedly together with lipids), but we show here that furin-activated Disp specifically mediates proteolytic Shh shedding and loss of lipids – which is not the same. Indeed, we show this finding because we interpret it the other way around: Because it is known that furin activation of Disp increases Shh release by some means (PMC5811216), our observation that furin-mediated Disp activation specifically increases Shh shedding independently supports our model.

      3) Figure 3: it is stated that there is no increase in Shh release into the media…

      We removed this statement.

      4) Figure S5: Scale bars are missing.

      We added scale bars to the figures.

      5) Figure 4: A direct comparison between wt Shh and C25A conditioned media for qPCR is needed.

      We agree and repeated all experiments. Results confirm our previous findings and are shown in revised Fig. 4 and in Fig. S5.

      6) What other components can be examined in addition to ApoA1 as a marker for HDL? Why is the Shh peak shifted to the left? What about exovesicles?

      We also detected ApoE4, a mobile lipoprotein present on expanding (large) HDL (Figs. 5, 6, Figs. S6, 7; ref. 14). We also used density gradient centrifugation to support the Shh/HDL association. Regarding the leftwards Shh size shift relative to the major HDL peak in SEC, please refer to our explanation above: if loaded with Shh, a size increase of the respective HDL subfraction is expected. Finally, we did not test the role of exovesicles in our assays. However, due to their large size (60-120 nm, compared with 7-12 nm for HDL), Shh associated with exovesicles should have eluted in the void volume of our gel filtration column. This we never observed.

      7) Why is osteoblast differentiation used?

      C3H10T1/2 osteoblast differentiation is strongly driven by Ihh and Shh activity and is established as a sensitive and robust assay. Still, following this reviewer’s advice, we conducted qPCR assays on these cells and in addition on NIH3T3 cells to support our findings.

      Finally, we corrected all minor mistakes regarding spelling and figure labeling. We also improved the readability of the revised manuscript, as suggested by reviewer 2.

      References

      1. Gallet A, Ruel L, Staccini-Lavenant L, Therond PP. Cholesterol modification is necessary for controlled planar long-range activity of Hedgehog in Drosophila epithelia. Development 133, 407-418 (2006).

      2. Porter JA, et al. Hedgehog patterning activity: role of a lipophilic modification mediated by the carboxy-terminal autoprocessing domain. Cell 86, 21-34 (1996).

      3. Lewis PM, et al. Cholesterol modification of sonic hedgehog is required for long-range signaling activity and effective modulation of signaling by Ptc1. Cell 105, 599-612 (2001).

      4. Huang X, Litingtung Y, Chiang C. Region-specific requirement for cholesterol modification of sonic hedgehog in patterning the telencephalon and spinal cord. Development 134, 2095-2105 (2007).

      5. Lee JD, et al. An acylatable residue of Hedgehog is differentially required in Drosophila and mouse limb development. Dev Biol 233, 122-136 (2001).

      6. Corrales JD, Rocco GL, Blaess S, Guo Q, Joyner AL. Spatial pattern of sonic hedgehog signaling through Gli genes during cerebellum development. Development 131, 5581-5590 (2004).

      7. Cordero D, Marcucio R, Hu D, Gaffield W, Tapadia M, Helms JA. Temporal perturbations in sonic hedgehog signaling elicit the spectrum of holoprosencephaly phenotypes. J Clin Invest 114, 485-494 (2004).

      8. Dessaud E, et al. Interpretation of the sonic hedgehog morphogen gradient by a temporal adaptation mechanism. Nature 450, 717-720 (2007).

      9. Garcia-Morales D, Navarro T, Iannini A, Pereira PS, Miguez DG, Casares F. Dynamic Hh signalling can generate temporal information during tissue patterning. Development 146, (2019).

      10. Harfe BD, Scherz PJ, Nissim S, Tian H, McMahon AP, Tabin CJ. Evidence for an expansion-based temporal Shh gradient in specifying vertebrate digit identities. Cell 118, 517-528 (2004).

      11. Nahmad M, Stathopoulos A. Dynamic interpretation of hedgehog signaling in the Drosophila wing disc. PLoS Biol 7, e1000202 (2009).

      12. Ehring K, et al. Conserved cholesterol-related activities of Dispatched 1 drive Sonic hedgehog shedding from the cell membrane. J Cell Sci 135, (2022).

      13. Coupland CE, et al. Structure, mechanism, and inhibition of Hedgehog acyltransferase. Mol Cell 81, 5025-5038 e5010 (2021).

      14. Sacks FM, Jensen MK. From High-Density Lipoprotein Cholesterol to Measurements of Function: Prospects for the Development of Tests for High-Density Lipoprotein Functionality in Cardiovascular Disease. Arterioscler Thromb Vasc Biol 38, 487-499 (2018).

    1. Author Response

      The following is the authors’ response to the previous reviews

      The revised manuscript is much improved - many unclear points are now better explained. However, in our opinion, some issues could still be significantly improved.

      1. Statistics: none of us are experts in statistics but several things remain questionable in our opinion and if it were our study, we would consult with an expert:

      a) while we understand the authors note about N-chasing and p-hacking, we wonder how the number of N's was premeditated before obtaining the results. Why in 4M an N of 3 is sufficient while in 3E the N is >20 (and not mentioned). At the very least, we think it would be wise to be cautious when stating something as not-significant when it is clear (as in 4M) that the likelihood of it actually being statistically significant is quite large.

      b) In most analyses, the data are not only normalized by actin or some other measure but also to the first (i.e., left side on the graph) condition, resulting in identical data points that equal '1' (in Figure 4 alone - C; I; K; M; and O). While this might be scientifically sound, the specific normalization should be mentioned, along with a note that this technique masks any real variance that exists in the original data in this condition. Consider exploring techniques to overcome this issue.

      c) In 3C, - if we understand the experiment, you want to convince us that the DIFFERENCE between eB2-FC compared to FC is larger in the control compared to the experiment. We are not absolutely sure that the statistical tools employed here are sufficient - which is why we would consult an expert.

      A) We are aware that many studies do not consistently quantify such experiments. For example, there are essentially no published examples of the signalling timelines of EphB2 receptors as in Fig. 5. Because we strive to quantify such biochemical effects, an unquantified experiment stands out, and so perhaps we were too strict in trying to quantify as many experiments as possible, resulting in low n’s for some of them. We acknowledge that additional experiments on EPHB1 protein stability may reach significance. We have adjusted our text on lines 332-335 to point to this interesting trend, and slightly changed the conclusion to this section. We also commented on similar trends when describing Figs. 1E and 4G on lines 901 and 952.

      B) We believe that our method for normalising Western blot band intensities is scientifically sound. Normally, when replicate samples are loaded on one gel and blotted on the same membrane, the experimenter only needs to normalise the target band intensity to its cognate loading control band intensity for quantitation. However, we usually have a large number of samples from multiple experiments carried out on different dates. For example, in Fig. 4B,C there are 7 biological replicates collected from 7 experiments, and in Fig. 4D there are 10 protein samples. It is not possible for us to run all samples on the same gel. In addition, because of the combined effects of variance in transfer efficiency, antibody potency, detection efficiency and developing time for each blot, it is practically impossible to generate similar band intensities for each batch. Thus, we normalise test bands to the loading control within individual experiments, an analysis method that is widely accepted by reputable journals with a focus on biochemical experiments (for example: PMID 37695914: Fig. 3 A,B,C; PMID 36282215: Fig. 3 B,C,D,E; PMID 33843588: Fig. 3 C,D,E,F,G,H). Since the value of the first sample on the plot is 1, which is a hypothetical value and does not meet the requirements of a parametric test, we used a one-sample t-test when comparing other samples with the first sample (PMID 35243233 Fig. 6 A,B,C,D; https://www.graphpad.com/quickcalcs/oneSampleT1/, “A one sample t-test compares the mean with a hypothetical value. In most cases, the hypothetical value comes from theory. For example, if you express your data as 'percent of control', you can test whether the average differs significantly from 100.”). Thus, we believe that our normalisation and statistical methods are both correct and supported by a large number of precedents.
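
      The normalisation and test described in response B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the function names and the band intensities are hypothetical, each blot is assumed to carry the control lane first, and only the t statistic and degrees of freedom are computed (the p value would then be read from a t distribution with n - 1 degrees of freedom).

```python
import math
import statistics


def normalise_blot(target, loading, control_lane=0):
    """Divide each target band by its loading-control band, then express
    every lane relative to the control lane of the same blot."""
    ratios = [t / l for t, l in zip(target, loading)]
    return [r / ratios[control_lane] for r in ratios]


def one_sample_t(values, hypothetical_mean=1.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test
    of `values` against a fixed hypothetical mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sem = statistics.stdev(values) / math.sqrt(n)
    if sem == 0:
        raise ValueError("replicates are identical; t is undefined")
    return (statistics.fmean(values) - hypothetical_mean) / sem, n - 1


if __name__ == "__main__":
    # Hypothetical band intensities from three independent blots:
    # (target bands, loading-control bands), control lane first.
    blots = [
        ([100.0, 60.0], [50.0, 50.0]),
        ([80.0, 56.0], [40.0, 40.0]),
        ([120.0, 60.0], [60.0, 60.0]),
    ]
    # The control lane is exactly 1 on every blot, so only the treated
    # lane carries variance and is tested against the fixed value 1.
    treated = [normalise_blot(t, l)[1] for t, l in blots]
    t_stat, dof = one_sample_t(treated, hypothetical_mean=1.0)
    print(treated, t_stat, dof)
```

      Because the control condition has no variance after this normalisation, it cannot enter a two-sample comparison, which is the reason given in the response for testing against the fixed value 1. Where SciPy is available, `scipy.stats.ttest_1samp` performs the same test and also returns the p value.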

      C) This comment refers to the cell collapse experiment shown in Fig. 3C for which the data are plotted in Fig. 3D. We stand by the statistical method used. There are two groups of cells (CTRLCRISPR and MYCBP2 CRISPR) and two treatments for each cell group (Fc control and eB2), thus we should use two-way ANOVA. Since we compared the cell retraction effects of Fc and eB2 on the two groups of cells, Sidak post hoc comparison is the right method to avoid errors introduced by multiple comparisons. Here is an example of an eLife article that used the same statistical method for similar comparisons: PMID 37830910, Fig. 1 H,I. To make the comparison easier, we grouped the experiments by cell type (CTRLCRISPR and MYCBP2 CRISPR) as opposed to by treatment. Below, the old version is on the right, and the new version is on the left. The conclusion is that eB2 induces less cell collapse in cells depleted of MYCBP2, when compared to the control cells. However, eB2 is still able to collapse cells lacking MYCBP2.

      Author response image 1.
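As a rough illustration of the Sidak adjustment mentioned for the Fig. 3D comparisons, the sketch below applies the standard Sidak formulas to a set of unadjusted p-values. It covers only the adjustment step; in a post hoc test following a two-way ANOVA, the unadjusted p-values themselves would come from the statistics software. The function names and example values are hypothetical.

```python
def sidak_adjusted_p(p_values):
    """Sidak-adjusted p-values for a family of m comparisons:
    p_adj = 1 - (1 - p)^m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def sidak_alpha(alpha, m):
    """Per-comparison significance level that keeps the family-wise
    error rate at alpha across m independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical unadjusted p-values for two comparisons
# (Fc versus eB2 within each of the two cell groups).
raw = [0.01, 0.04]
print(sidak_adjusted_p(raw))
print(sidak_alpha(0.05, len(raw)))
```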

      Revisiting these data, we noticed an error introduced when CC compiled the data used to generate Fig. 3D. The data were acquired from nine biological replicates per condition. CC used a mix of two methods for cell collapse rate calculation: the first method involved the sum of collapsed cells and all cells from multiple regions of one coverslip (biological replicate). The second method involved computing a collapse rate in each region which then was used to calculate the average collapse rate for the entire coverslip (technical replicate). Given the small cell numbers due to sparse culture conditions, we believe that the first method is a more conservative approach. We hence re-plotted all replicate data using the first method. This resulted in slightly different % collapse and p values. These were changed accordingly in the text and plot and do not affect the conclusion of this experiment.
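The difference between the two calculations described above can be seen in a small sketch. The counts are hypothetical and the function names are illustrative only; each region is a (collapsed cells, total cells) pair from one coverslip.

```python
def pooled_collapse_rate(regions):
    """First method: sum collapsed cells and all cells over every
    region of a coverslip, then take a single ratio."""
    collapsed = sum(c for c, _ in regions)
    total = sum(n for _, n in regions)
    return collapsed / total


def mean_of_region_rates(regions):
    """Second method: compute a collapse rate per region,
    then average those rates over the coverslip."""
    return sum(c / n for c, n in regions) / len(regions)


# Hypothetical coverslip with one very sparse region (2 cells).
coverslip = [(1, 2), (3, 10), (2, 8)]
print(pooled_collapse_rate(coverslip))   # 6 collapsed out of 20 cells
print(mean_of_region_rates(coverslip))   # average of 0.5, 0.3 and 0.25
```

With sparse cultures, the second method lets a region containing only a couple of cells weigh as much as a well-populated one, whereas pooling weights every cell equally.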

      2) thanks for the clarification that the interaction between the extracellular domain of EPHB2 and MYCBP2 might not occur directly - however, unless we missed this it was not clearly stated in the text. It is an important point and also a cool direction for the future - to find the elusive co-receptor that actually helps EPHB2 and MYCBP2 form a complex.

      We now also refer to this in the results section on line 215.

      “Since EPHB2 is a transmembrane protein and MYCBP2 is localised in the cytosol, these experiments suggest that the interaction between the extracellular domain of EPHB2 and MYCBP2 might be indirect and mediated by other unknown transmembrane proteins.”

      3) The Hela CRISPR cell line is better explained in the response letter but still not sufficiently explained in the text for a non-expert reader. If the authors want any reader to comprehend this, we would strongly recommend adding a scheme.

      We now include a schematic outlining the CRISPR cell generation as Fig. 3A and its description on line 926.

      Author response image 2.

      4) To clarify some of our previous (and persisting) concerns about Figure 3D/E - it is true that a reduction in 25% of cell size is dramatic. But (if we understand correctly) your claim is that a reduction in 22% (this is a guess, as the actual numbers are not supplied) is significantly less than 25%. Even if it is, statistically speaking, significant, what is the physiological relevance of this very slight effect? In this experiment, the N was quite large, and we wonder if the images in D are representative - it would be nice to label the data points in E to highlight which images you used.

      We now mention the average cell area contraction measurements in the legend to Fig. 3F on line 935. We also tracked down the individual cells shown in Fig. 3E and they are now labelled as data points in blue in Fig. 3F. HeLa cell collapse is a simplified model of EPHB2 function and we do not know whether the difference between the behaviour of CTRLCRISPR and MYCBP2 CRISPR cells is physiologically significant and thus we prefer not to speculate on this.

      5) Figure 3F and other stripe assays - In the end, it is your choice how to quantify. We believe that quantifying area of overlap is a more informative and objective measurement that might actually benefit your analyses. That said, if you do keep the quantification as it is now, you have to define the threshold of what you mean by "cell/s (or an axon in 7A, where it is even more complicated as are you alluding to primary, secondary, or even smaller branches) are RESIDING within the stripe". Is 1% overlap sufficient or do you need 10 or 50% overlap?

      We have now added this statement to the methods on line 745: “A cell was considered to be on an ephrin-B2 stripe when more than 50% of its nucleus was located on that stripe”. For the chick explant stripe assay, when measuring the length of an axon on a stripe, we only measured the main axons originating from the explants.

      For explant/stripe experiments in Fig. 7 AB, we now use the term “GFP-expressing neurite” rather than “branch”. This was already present in the results of the previous version, but the methods and legend needed to be brought up to date (lines 786 and 1008). We think that “branch” was a confusing term that was supposed to mean the same thing as “neurite” but came across as some indication of branching. We do not know whether the GFP+ neurites were primary or secondary extensions of explants, or in fact, whether some of them contained more than one axon. We also adjusted the method to reflect the fact that some stripes were used in conjunction with a single explant and added a reference to a previous study extensively using this method (Poliak et al., 2015) on line 778.

      6) We still don't get the link to the lysosomal degradation. Your data suggests that in your cells EPHB2 is primarily degraded by the lysosomal pathway and not proteasome. Any statement about MYCBP2 is not strongly supported by the data, in our opinion - Unless you develop some statistical measurement that shows that the effect of BafA1 is statistically different in MYCBP2 cells than in control cells. Currently, this is not the case and the link is therefore not warranted in our opinion.

      We generated a new version of Fig. 4K with average increase in EPHB2 levels in the presence of BafA1 and CoQ, compared to DMSO treated controls (see below). BafA1 and CoQ restored EPHB2 protein levels by 19% and 14% respectively in CtrlCRISPR cells, while the inhibitors restored EPHB2 protein levels by 40% and 35% respectively in MYCBP2 CRISPR cells.

      Author response image 3.

      For each of the 4 replicates, the increase in EPHB2 levels by BafA1 compared to DMSO is as follows:

      Author response table 1.

      These values are not significantly different between CtrlCRISPR cells and MYCBP2 CRISPR cells (p = 0.08, Student’s t test). The same holds for the CoQ experiment. We now temper our conclusion for this experiment: Although the difference in percentage increase between CTRLCRISPR cells and MYCBP2CRISPR cells is not significant, this trend raises the possibility that the loss of MYCBP2 promotes EPHB2 receptor degradation through the lysosomal pathway (line 319). We also adjusted the section title (line 306).

      7) While the C. elegans part is now MUCH better explained - we are not sure we understand the additional insight. The fact that vab-1 and glo4 double mutants are additive as are vab1 and fsn1, suggest they act in parallel (if the mutants are NULL, and not if they are hypomorphs, if one wants to be accurate) - how this relates to your story is unclear. The vab1/rpm1 double mutant is still uninformative and incomplete. rpm1 phenotype is so severe that nothing would make it more severe. We read the Jin paper that the authors directed to - nothing makes the rpm1 phenotype more severe. Yes, some DOWNSTREAM elements make the rpm1 phenotype LESS severe - this is not something you were testing, to the best of our knowledge. Rather, you wanted to see if rpm1 mutant resulted in stabilization of vab1 and thus suppression of vab1 phenotype - we are just not sure the system is amenable to test (actually reject) your hypothesis that Vab1 is degraded by rpm1. Also, assuming we are talking about NULLs, the fact that the rpm1 phenotype is WAY stronger than the vab1 mutant, suggests that rpm1 functions via multiple routes, adding even more complexity to the system. Given these results, despite the much improved clarity, we are still not sure that the worm data adds new insight, rather than potentially confusing the reader.

      We realise that the genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network are complicated. However, we insist on keeping the data for the sake of its availability for future studies and completeness. We also think it is important for readers and the community to see these data, even if the authors and reviewers are not entirely in agreement about the importance/interpretation of experimental outcomes. It is our hope that the community will examine the results and draw their own conclusions.

      A few points of clarification:

      The C. elegans experiments were designed to test genetically if the vertebrate interactions between EPHB2 and MYCBP2 and its signalling network are conserved. We studied two kinds of interactions: (1) between vab-1 and RPM-1/MYCBP2 downstream proteins (GLO-4 and FSN-1) and (2) between vab-1 and rpm-1. For these studies, we used null alleles for vab-1, glo-4 and fsn-1 which is now noted on lines 440, 453, 475 and 859. Our findings are consistent with the VAB-1 Ephrin receptor functioning in parallel to known RPM-1 binding proteins. This is further supported by new data: vab-1; fsn-1 double mutants showed enhanced incidence of axon overextension defects using a second transgenic background, zdIs5 (Pmec-4::GFP), to visualize axon termination (Fig. 8F).

      This second transgenic background also allowed us to generate new data to address your concerns about phenotypic saturation in rpm-1 mutants. To do this, we used the zdIs5 (Pmec-4::GFP) genetic background, in which axon termination defects are not saturated in rpm-1 mutants (Fig. 8F) because they can be enhanced by other mutants such as cdc-42 and unc-33 (Fig. 7C, D, in Borgen et al. Development 144, 4658–4672 (2017), PMID 29084805). In this new background, we found that vab-1 loss of function fails to enhance the incidence of severe “hook” defects in rpm-1 mutants, which is an indication that the two genes function in the same pathway. Importantly, prior studies in this background also showed that mutants in the RPM-1 signalling network (e.g. fsn-1, glo-4 and ppm-2) do not enhance the incidence of severe “hook” defects as double mutants with rpm-1 compared to rpm-1 single mutants (Fig. 7B, ibid.).

      To reflect these ideas more clearly, we revised the Results section pertaining to C. elegans genetics (starting on line 418) and tempered our discussion (line 517). Basically, this section now says that we studied genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network. From these experiments we conclude that: (1) The enhancement of overextension defects in vab-1; glo-4 and vab-1; fsn-1 double mutants compared to single mutants indicates that VAB-1/EPHR functions in parallel to known RPM-1 binding proteins to facilitate axon termination, and (2) Since the vab-1; rpm-1 double mutants do not display an increased frequency or severity of overextension defects compared to rpm-1 single mutants, VAB-1/EPHR functions in the same genetic pathway as RPM-1/MYCBP2.

      The new genetic data included in this version were generated by Karla J. Opperman who is now included as a co-author.

      Further corrections:

      Author response image 4.

      Because of the errors associated with quantifications in Fig. 3D (see above), we reviewed other quantification methodologies and noticed another discrepancy that required a correction. In the hippocampal neuron growth cone collapse assay shown in the previous version of Fig. 7 D (left), the growth cones were classified into three groups: 1, fully collapsed; 2, hard to tell, but not fully collapsed; 3, fan-shape cones. Two different quantifications were performed as follows: (1), number of fully collapsed cones divided by the numbers of all growth cones; (2), number of fully collapsed cones divided by [number of fully collapsed cones + fan-shape cones]. CC erroneously used the second method to generate Fig. 7D.
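The two growth cone quantifications described above differ only in the denominator, as the following sketch shows. The counts are hypothetical and the function names are illustrative.

```python
def collapse_fraction_all(collapsed, ambiguous, fan_shaped):
    """Quantification (1): fully collapsed cones divided by all growth cones."""
    return collapsed / (collapsed + ambiguous + fan_shaped)


def collapse_fraction_excluding_ambiguous(collapsed, ambiguous, fan_shaped):
    """Quantification (2): fully collapsed cones divided by
    fully collapsed plus fan-shaped cones (ambiguous cones ignored)."""
    return collapsed / (collapsed + fan_shaped)


# Hypothetical counts for one condition.
counts = dict(collapsed=30, ambiguous=20, fan_shaped=50)
print(collapse_fraction_all(**counts))                   # 30 / 100
print(collapse_fraction_excluding_ambiguous(**counts))   # 30 / 80
```

Dropping the ambiguous class shrinks the denominator, so quantification (2) always reports a collapse fraction at least as high as quantification (1).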

      We think that the first method is more appropriate. Furthermore, since n=5 for the Fc and eB1-Fc conditions, but n=3 for the eB2-Fc condition, we decided to omit it. The final plot for figure 7D is the following:

      Author response image 5.

      Our conclusion still stands that exogenous FBD1 WT overexpression impaired the growth cone collapse mediated by EphB.

    1. Author Response

      Response to Reviewer 1:

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths: The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for acknowledging the strengths of this work. The assumption to average gene expression data across individual cells within a given cell type was made in response to the inherent limitations of, for example, the mouse retina dataset, where individual cell-level connectivity and gene expression data are not profiled jointly (the second scenario in our paper). This approach was a necessary compromise to facilitate the analysis at the cell type level. However, in datasets where individual cell-level connectivity and gene expression data are matched, such as the C.elegans dataset referenced below, our model can be applied to achieve single-cell resolution (the first scenario in our paper), offering a more detailed understanding of genetic underpinnings in neuronal connectivity.

      Weaknesses: The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      I value the suggestion of validating the proposed method in another dataset. In response, I found the C. elegans dataset in the references the reviewer suggested below to be a good candidate for this purpose, and I plan to explore this dataset and incorporate the findings in the revised manuscript. I understand the concerns regarding the PCA methodology and its potential impact on the model’s resolution and reproducibility. In response, alternative methods, such as regularization techniques, will be explored to address these issues. Additionally, I agree that enhancing the clarity and readability of Figure 5, as well as including a more comprehensive discussion of the model’s limitations, would significantly strengthen the manuscript.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      Thank you for highlighting the importance of comparing our model with others, particularly those mentioned in your comments. After reviewing these papers, I find that our bilinear model aligns closely with the methods described, especially in [1, 2]. To see this, let’s start with Equation 1 in Kovács et al. [2]:

      In this equation, B represents the connectivity matrix, while X denotes the gene expression patterns of individual neurons in C.elegans. The operator O is the genetic rule operator governing synapse formation, linking connectivity with individual neuronal expression patterns. It’s noteworthy that the work of Barabási and Barabási [1] explores a specific application of this framework, focusing on O for B that represents biclique motifs in the C.elegans neural network.

      To identify the operator O, the authors sought to minimize the squared residual error:

      with regularization on O.

      Adopting the notation from our bilinear model paper and using Z to represent the connectivity matrix, the above becomes

      Coming back to the bilinear model formulation, the optimization problem, as formulated for the C.elegans dataset where individual neuron connectivity and gene expression are accessible, takes the form:

      where we consider each neuron as a distinct neuronal type. In addition, we extend the dimensions of X and Y to encompass the entire set of neurons in C. elegans, with $X = Y \in \mathbb{R}^{n \times p}$, where n signifies the total number of neurons and p the number of genes. Accordingly, our optimization challenge evolves into:

      Upon comparison with the earlier stated equation, it becomes clear that our approach aligns consistently with the notion of $O = AB^T$. This effectively results in a decomposition of the genetic rule operator O. This decomposition extends beyond mere mathematical convenience, offering several substantial benefits reminiscent of those seen in the collaborative filtering of recommendation systems:

      • Computational Efficiency: The primary advantage of this approach is its improvement in computational efficiency. For instance, solving for $O \in \mathbb{R}^{p \times p}$ necessitates determining $p^2$ entries. In contrast, solving for $A \in \mathbb{R}^{p \times d}$ and $B \in \mathbb{R}^{p \times d}$ involves determining only $2pd$ entries, where p is the number of genes, and d is the number of latent dimensions. Assuming the existence of a lower-dimensional latent space ($d \ll p$) that captures the essential variability in connectivity, resolving A and B becomes markedly more efficient than resolving O. Additionally, from a computational system design perspective, inferring the connectivity of a neuron allows for caching the latent embeddings of presynaptic neurons $XA$ or postsynaptic neurons $XB$ with a space complexity of $\mathcal{O}(nd)$. This is significantly more space-efficient than caching $XO$ or $OX^T$, which has a space complexity of $\mathcal{O}(np)$. This difference is particularly notable when dealing with large numbers of neurons, such as those in the entire mouse brain. The bilinear modeling approach thus enables effective handling of large datasets, simplifying the optimization problem and reducing computational load, thereby making the model more scalable and faster to execute.

      • Interpretability: The separation into A for presynaptic features and B for postsynaptic features provides a clearer understanding of the distinct roles of pre- and post- synaptic neurons in forming the connection. By projecting the pre- and post- synaptic neurons into a shared latent space through XA and YB, one can identify meaningful representations within each axis, as exemplified in different motifs from the mouse retina dataset. The linear characteristics of A and B facilitate direct evaluation of each gene’s contribution to a latent dimension. This interpretability, offering insights into the genetic factors influencing synaptic connections, is beyond what O could provide itself.

      • Flexibility and Adaptability: The bilinear model’s adaptability is another strength. Much like collaborative filtering, which can manage very different user and item features, our bilinear model can be tailored to synaptic partners with genetic data from varied sources. A potential application of this model is in deciphering the genetic correlates of long-range projectomic rules, where pre- and post-synaptic neurons are processed and sequenced separately, or even involving post-synaptic targets being brain regions with genetic information acquired through bulk sequencing. This level of flexibility also allows for model adjustments or extensions to incorporate other biological factors, such as proteomics, thereby broadening its utility across various research inquiries into the determinants of neuronal connectivity.
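The chain of equations behind this comparison can be written out as a sketch. It is a reconstruction under stated assumptions rather than a quotation: the Kovács et al. rule is taken to have the form $B = XOX^T$, the norm is assumed to be the Frobenius norm, and the regularization term $\lambda R(O)$ is a placeholder whose exact form is not specified here. Note that B denotes the connectivity matrix only in the first two lines; after the switch to Z, B is the postsynaptic transformation matrix of the bilinear model.

```latex
\begin{align*}
B &= X\,O\,X^{T}
  && \text{(genetic rule, form assumed for Kov\'acs et al. [2], Eq. 1)} \\
\min_{O}\ & \lVert B - X\,O\,X^{T} \rVert_F^{2} + \lambda\,R(O)
  && \text{(squared residual error with regularization on } O\text{)} \\
\min_{O}\ & \lVert Z - X\,O\,X^{T} \rVert_F^{2} + \lambda\,R(O)
  && \text{(same problem, connectivity written as } Z\text{)} \\
\min_{A,B}\ & \lVert Z - (XA)(YB)^{T} \rVert_F^{2}
  && \text{(bilinear model, } A, B \in \mathbb{R}^{p \times d}\text{)} \\
\min_{A,B}\ & \lVert Z - X\,A\,B^{T} X^{T} \rVert_F^{2}
  && \text{(with } X = Y \in \mathbb{R}^{n \times p}\text{)} \\
O &= A B^{T}, \qquad p^{2} \text{ entries in } O \text{ versus } 2pd \text{ entries in } A \text{ and } B
\end{align*}
```

Matching the third and fifth lines term by term gives $O = AB^T$, a factorization of rank at most d, which is where the reduction from $p^2$ to $2pd$ free parameters comes from.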

      In the study by Taylor et al. [3], the authors introduced a generalization of differential gene expressions (DGE) analysis called network DGE (nDGE) to identify genetic determinants of synaptic connections. It focuses on genes co-expressed across pairs of neurons connected, compared with pairs without connection.

      As the authors acknowledged in the method part of the paper, nDGE can only examine single genes co-expressed at synaptic terminals: "While the nDGE technique introduced here is a generalization of standard DGE, interrogating the contribution of pairs of genes in the formation and maintenance of synapses between pairs of neurons, nDGE can only account for a single co-expressed gene in either of the two synaptic terminals (pre/post)."

      In contrast, the bilinear model offers a more comprehensive analysis by seeking a linear combination of gene expressions in both pre- and post-synaptic neurons. This model goes beyond the scope of examining individual co-expressed genes, as it incorporates different weights for the gene expressions of pre- and post-synaptic neurons. This feature of the bilinear model enables it to capture not only homogeneous but also complex and heterogeneous genetic interactions that are pivotal in synaptic connectivity. This highlights the bilinear model’s capability to delve into the intricate interactions of synaptic gene expression.
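A small sketch may help make the weighted-combination idea concrete. It scores one pre/post pair by projecting each neuron's expression vector into a shared latent space and taking the inner product, and checks that this equals scoring with a full operator $O = AB^T$. All names and numbers are hypothetical, and the code is an illustration of the bilinear form only, not the training procedure used in the paper.

```python
def project(M, x):
    """Return M^T x for a p-by-d matrix M (list of rows) and a length-p vector x."""
    d = len(M[0])
    return [sum(M[g][k] * x[g] for g in range(len(x))) for k in range(d)]


def bilinear_score(x_pre, y_post, A, B):
    """Connectivity score x^T A B^T y, computed through the d-dimensional
    latent embeddings of the presynaptic and postsynaptic neurons."""
    u = project(A, x_pre)   # presynaptic embedding
    v = project(B, y_post)  # postsynaptic embedding
    return sum(a * b for a, b in zip(u, v))


def operator_from_factors(A, B):
    """Build the full p-by-p operator O = A B^T."""
    p, d = len(A), len(A[0])
    return [[sum(A[i][k] * B[j][k] for k in range(d)) for j in range(p)]
            for i in range(p)]


def operator_score(x_pre, y_post, O):
    """Connectivity score x^T O y using the full operator."""
    return sum(x_pre[i] * O[i][j] * y_post[j]
               for i in range(len(x_pre)) for j in range(len(y_post)))


# Hypothetical example: p = 3 genes, d = 1 latent dimension.
A = [[1.0], [0.0], [2.0]]   # weights on presynaptic genes
B = [[0.0], [1.0], [1.0]]   # weights on postsynaptic genes
x = [1.0, 2.0, 3.0]         # presynaptic expression
y = [4.0, 5.0, 6.0]         # postsynaptic expression

O = operator_from_factors(A, B)
print(bilinear_score(x, y, A, B), operator_score(x, y, O))
print(len(O) * len(O[0]), 2 * len(A) * len(A[0]))  # entries in O versus in A and B
```

Here gene 1 on the presynaptic side pairs with genes 2 and 3 on the postsynaptic side, a heterogeneous interaction that a test restricted to a single co-expressed gene would not represent.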

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, however could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      I value your appraisal. In response, additional validation of the bilinear model on a second dataset will be undertaken.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      Thank you for your positive assessment of the potential impact of the study.

      Response to Reviewer 2:

      Summary: In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.

      Strengths: This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      1. The study lacks experimental validation of the model’s prediction results.

      Thank you for pointing out the importance of experimental validation. I acknowledge that the current version of the study is focused on the development and validation of the computational model, using the datasets presently available to us. Moving forward, I plan to collaborate with experimental neurobiologists. These collaborations are aimed at validating our model’s predictions, including the delta-protocadherins mentioned in the paper. However, considering the extensive time and resources required for conducting and interpreting experimental results, I believe it is more pragmatic to present a comprehensive experimental study, including the design and execution of experiments informed by the model’s predictions, in a separate follow-up paper. I intend to include a paragraph in the discussion of this paper outlining the future direction for experimental validation.

      2. The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      I recognize the importance of assessing the model across different neuronal systems. In response to similar feedback from Reviewer 1, I am keen to extend the study to include the C.elegans dataset mentioned earlier. The results from applying our bilinear model to the second dataset will be incorporated into the revised manuscript.

      3. The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      The concern regarding the dependency of our model on the availability of connectomic data is valid. While complete connectomes are available for organisms like C.elegans and Drosophila, and efforts are underway to map the connectome of the entire mouse brain, such data may not always be accessible for all research contexts. Recognizing this limitation, part of the ongoing research is to explore ways to adapt our model to the available data, such as projectomic data. Furthermore, our bilinear model is compatible with trans-synaptic virus-based sequencing techniques [4, 5], allowing us to leverage data from these experimental approaches to uncover the genetic underpinnings of neuronal connectivity. These initiatives are crucial steps towards broadening the applicability of our model, ensuring its relevance and usefulness in diverse brain connectivity studies where detailed connectomic data may not be readily available.

      References

      [1] Dániel L. Barabási and Albert-László Barabási. A genetic model of the connectome. Neuron, 105(3):435–445, 2020.

      [2] István A. Kovács, Dániel L. Barabási, and Albert-László Barabási. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 117(52):33570–33577, 2020.

      [3] Seth R. Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J. Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, and David M. 3rd Miller. Molecular topography of an entire nervous system. Cell, 184(16):4329–4347, 2021.

      [4] Nicole Y. Tsai, Fei Wang, Kenichi Toma, Chen Yin, Jun Takatoh, Emily L. Pai, Kongyan Wu, Angela C. Matcham, Luping Yin, Eric J. Dang, Denise K. Marciano, John L. Rubenstein, Fan Wang, Erik M. Ullian, and Xin Duan. Trans-seq maps a selective mammalian retinotectal synapse instructed by nephronectin. Nat Neurosci, 25(5):659–674, May 2022.

      [5] Aixin Zhang, Lei Jin, Shenqin Yao, Makoto Matsuyama, Cindy van Velthoven, Heather Sullivan, Na Sun, Manolis Kellis, Bosiljka Tasic, Ian R. Wickersham, and Xiaoyin Chen. Rabies virus-based barcoded neuroanatomy resolved by single-cell RNA and in situ sequencing. bioRxiv, 2023.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a potentially valuable discovery which indicates that activation of the P2RX7 pathway can reduce the lung fibrosis after its establishment by inflammatory damage. If confirmed, the study could clarify the role of specific immune networks in the establishment and progression of lung fibrosis. However, the presented data and analyses are incomplete as they primarily rely on limited pharmacological treatments with modest effect sizes.

      I hope you will be convinced of the validity of our approaches by the following explanations and information, and I remain at your disposal to discuss them.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this revised preprint the authors investigate whether a presumably allosteric P2RX7 activating compound that they previously discovered reduces fibrosis in a bleomycin mouse model. They chose this particular model as publicly available mRNA data indicate that the P2RX7 pathway is downregulated in idiopathic pulmonary fibrosis patients compared to control individuals. In their revised manuscript, the authors use three proxies of lung damage, Ashcroft score, collagen fibers, and CD140a+ cells, to assess lung damage following the administration of bleomycin. These metrics are significantly reduced on HEI3090 treatment. Additional data implicate specific immune cell infiltrates and cytokines, namely inflammatory macrophages and damped release of IL-17A, as potential mechanistic links between their compound and reduced fibrosis. Finally, the researchers transplant splenocytes from WT, NLRP3-KO, and IL-18-KO mice into animals lacking the P2RX7 receptor to specifically ascertain how the transplanted splenocytes, which are WT for P2RX7 receptor, respond to HEI3090 (a P2RX7 agonist). Based on these results, the authors conclude that HEI3090 enhanced IL-18 production through the P2RX7-NLRP3 inflammasome axis to dampen fibrosis.

      These findings could be interesting to the field, as there are conflicting results as to whether NLRP3 activation contributes to fibrosis and if so, at what stage(s) (e.g., acute damage phase versus progression). The revised manuscript is more convincing in that three orthogonal metrics for lung damage were quantified. However, major weaknesses of the study still include inconsistent and small effect sizes of HEI3090 treatment versus either batch effects from transplanted splenocytes or the effects of different genetic backgrounds. Moreover, the fundamental assumption that HEI3090 acts specifically and functionally through the P2RX7 pathway in this model cannot be directly tested, as the authors now provide results indicating that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      I am particularly concerned by Reviewer #1's statement that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      Indeed, what we showed in the point-by-point response is that BLM induces fibrosis in both WT and P2RX7 KO mice, but that the intensity of the fibrosis is reduced in P2RX7 KO mice (panel A). Therefore, as discussed in our first response, our results confirm the previous publication of Riteau et al. showing that P2RX7 participates in BLM-induced lung fibrosis (see panel B).

      Author response image 1.

      Bleomycin-induced lung fibrosis in WT versus p2rx7 KO mice. A: Lungs from BLM-treated mice were stained with H&E and fibrosis was quantified using the Ashcroft method. Results show that fibrosis induced by BLM is reduced in KO mice compared to WT mice. B: Representative images of lung sections at day 14 after BLM treatment stained with H&E, as published in Riteau et al., illustrating that fibrosis induced by BLM is reduced in KO mice compared to WT mice. Vehicle-treated WT (n=4) or p2rx7 KO (n=6) mice. Two-tailed Mann-Whitney test, **p < 0.01.

      Importantly, this lower intensity of lung fibrosis in P2RX7 KO mice does not interfere with the capacity of our molecule to attenuate lung fibrosis, as demonstrated by the adoptive transfer of IL1B KO splenocytes into P2RX7 KO mice, in which HEI3090 decreases the Ashcroft score, the % of fibrosis and the collagen fiber content (see below).

      Author response image 2.

      HEI3090 activity requires P2RX7-expressing immune cells. Experimental design: p2rx7-/- mice were given 3 × 10⁶ il1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. (C) Representative images of lung sections at day 14 after treatment stained with H&E and Sirius Red with il1β-/- splenocytes, bar = 100 µm (left), and fibrosis score assessed by the Ashcroft method, the % of fibrosis and the content of collagen fibers (right). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments). Data are represented as violin plots or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: wild-type, KO: P2RX7 knockout.

      Importantly, in the same experimental setting, i.e. adoptive transfer of splenocytes from different genetic backgrounds, HEI3090 decreases the fibrosis intensity only with WT and IL1B KO splenocytes and not with NLRP3 KO and IL18 KO splenocytes.

      Author response image 3.

      HEI3090 activity requires P2RX7-expressing immune cells. Experimental design: p2rx7-/- mice were given 3 × 10⁶ WT, NLRP3-/-, IL18-/- or IL1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. Fibrosis in the whole lung was assessed by the % of fibrosis (upper panel) and the content of collagen fibers (lower panel). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments). Data are represented as violin plots or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: wild-type, KO: P2RX7 knockout.

      In order to provide clear evidence that HEI3090 functions through P2RX7, a different lung fibrosis model that does not require P2RX7 would be necessary. For example, in such a system the authors could demonstrate a lack of HEI3090-mediated therapeutic effect on P2RX7 knockout.

      BLM induces lung fibrosis in P2RX7 KO mice, as we showed in this manuscript, as already published by Riteau et al. in 2010, and as shown earlier in our response (first figure). In addition, HEI3090 is able to decrease the intensity of fibrosis in WT and IL1B-/- → P2RX7 KO mice, but not in KO, NLRP3-/- → P2RX7 KO and IL18-/- → P2RX7 KO mice. We therefore believe that our data support the conclusions that:

      1. HEI3090 requires the expression of P2RX7 in immune cells to mediate its antifibrotic activity,

      2. IL1B is not a crucial effector mediating the antifibrotic effect of HEI3090.

      Molecularly, additional evidence on specificity, such as thermal proteome profiling and direct biophysical binding experiments, would also enhance the authors' argument that the compound indeed binds P2RX7 directly and specifically. Since all small molecules have some degree of promiscuity, the absence of an additional P2RX7 modulator, or direct recombinant IL-18 administration (as suggested by another reviewer), is needed to orthogonally validate the functional importance of this pathway. Another way the authors could probe pathway specificity would involve co-administering α-IL-18 with HEI3090 in several key experiments (similar to Figure 4L).

      At the moment, we have no funds to perform these experiments and, given the strong competition in the field, we have decided to publish our study without these new data.

      Reviewer #2 (Public Review):

      In the study by Hreich et al, the potency of P2RX7-specific positive modulator HEI3090, developed by the authors, for the treatment of Idiopathic pulmonary fibrosis (IPF) was investigated. Recently, the authors have shown that HEI3090 can protect against lung cancer by stimulating dendritic cell P2RX7, resulting in IL-18 production that stimulates IFN-γ production by T and NK cells (DOI: 10.1038/s41467-021-20912-2). Interestingly, HEI3090 increases IL-18 levels only in the presence of high eATP. Since the treatment options for IPF are limited, new therapeutic strategies and targets are needed. The authors first show that P2RX7/IL-18/IFNG axis is downregulated in patients with IPF. Next, they used a bleomycin-induced lung fibrosis mouse model to show that the use of a positive modulator of P2RX7 leads to the activation of the P2RX7/IL-18 axis in immune cells that limits lung fibrosis onset or progression. Mechanistically, treatment with HEI3090 enhanced IL-18-dependent IFN-γ production by lung T cells leading to a decreased production of IL-17 and TGFβ, major drivers of IPF. The major novelty is the use of the small molecule HEI3090 to stimulate the immune system to limit lung fibrosis progression by targeting the P2RX7, which could be potentially combined with current therapies available. Overall, the study was well performed, and the manuscript is clear.

      We thank the reviewer for these very positive comments.

      However, there is need for more details on the description and interpretation of the adoptive transfer experiments, as well as the statistical analyses and number of replicate independent experiments.

      I am concerned by the reviewer’s comments, and I would like to provide additional information and explanations, which I hope will convince you of the validity of our approaches.

      Author response image 4.

      Adoptive transfer experiment. Adoptive transfer experiments are classically used to document which immune cells participate in immune responses (with more than 150 publications in PubMed with the keywords adoptive transfer and onco immunology), and intravenous administration is a common route to target the lungs (PMID: 23336716). To characterize the molecular effectors (P2RX7, NLRP3, IL18 and IL1B) accounting for the antifibrotic effect of HEI3090, we purified splenocytes from donor mice and administered them intravenously to P2RX7 KO mice. As shown in Author response image 4, HEI3090 has no antifibrotic activity when splenocytes isolated from p2rx7-deficient mice are injected i.v. into P2RX7 KO mice (KO in KO). By contrast, HEI3090 has antifibrotic activity when WT splenocytes expressing P2RX7 (isolated from WT mice) are transferred into P2RX7 KO mice (WT in KO).

      This experiment provides strong evidence that the adoptive transfer approach can identify the molecular effectors required to mediate the antifibrotic effect of HEI3090.

      Statistical analyses and number of replicate independent experiments

      We thank the reviewer for this comment, and we apologize for not having been sufficiently clear in our previous response, which contained the misphrased statement “the experiment was stopped when significantly statistical results were observed”; we should have written “the experiment was stopped when each experimental group contained at least 5 mice”.

      To define the size of the experimental groups, we did a pilot experiment with 4 WT mice (i.e. 4 biological replicates) in each group (as shown aside), followed by a statistical forecasting based on the result of the pilot experiment (40% difference, standard error: 0.9, α risk: 0.05, power: 0.8). Since we focused on the effect of HEI3090, we based our statistical analysis on a one-way ANOVA analysis comparing, in each experiment, the vehicle and the treated group.

      The pilot experiment and statistical forecasting indicated that 4 mice per group are sufficient to characterize the effect of HEI3090 on BLM-induced lung fibrosis. Each experiment was started with 6 to 8 mice per group. Being aware that 30% of mice can unexpectedly die due to BLM treatment, we duplicated the experiment, when necessary, to include at least 5 mice in each group of each experiment, meaning 5 biological replicates, knowing that 4 mice are sufficient to statistically analyze the results. In each experiment, we checked for the presence of outliers using the ROUT method and removed them when necessary.
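      The kind of forecasting described above can be sketched as follows. This is only an illustrative sketch, not the calculation performed for the study: it assumes a two-sided comparison of two group means under a normal approximation, and the function names (`mice_per_group`, `starting_group_size`) and the example inputs are hypothetical. The normal approximation gives slightly smaller group sizes than an exact t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def mice_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_power) * sd / difference) ** 2,
    where `difference` is the expected difference between group means and
    `sd` is the within-group standard deviation (same units).
    """
    if difference <= 0 or sd <= 0:
        raise ValueError("difference and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha risk
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


def starting_group_size(n_required, expected_loss):
    """Animals to enrol per group so that about n_required remain after losses."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    return ceil(n_required / (1 - expected_loss))


# Hypothetical input (not the study's data): a difference of one standard deviation.
n_example = mice_per_group(difference=1.0, sd=1.0)

# Enrolment needed to keep 4 or 5 mice per group with 30% expected losses.
start_for_4 = starting_group_size(4, 0.30)
start_for_5 = starting_group_size(5, 0.30)
```

      With a 30% expected loss, keeping 4 to 5 mice per group requires starting with 6 to 8 mice, which is consistent with the group sizes stated above.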

    2. Author Response

      The following is the authors’ response to the previous reviews.

      Point to point response for the editors

      We are deeply grateful for the time you have devoted to reviewing this manuscript, and we sincerely thank you. Your insightful feedback has been instrumental in enhancing the quality of our work.

      In the revised version of the manuscript, we have carefully addressed each of the concerns you raised. Below, you will find a detailed summary of how your feedback has been incorporated to improve the overall content and clarity of the document.

      1. P2RX7 effects: In Figure 2, the vehicle-treated P2RX7 knockout (panel M) shows an Ashcroft score of about 1.5 after BLM. Comparing this to the Ashcroft score of 3 after BLM in the wildtype (panel C) suggests that P2RX7 deletion is an effective way to reduce fibrosis by half!

      The argument that HEI3090 also reduces fibrosis by activating P2RX7 is of course very difficult to convey and it seems contradictory that P2RX7 deletion and P2RX7 activation can be both anti-fibrotic. This is an unusual claim and confuses the reviewers as well as the future readers.

      This has many important health implications because activating an inflammatory pathway via P2RX7 and IL-18 could be risky in terms of a fibrosis treatment as inflammatory activation can also worsen fibrosis. The authors' own P2RX7 KO data (untreated vehicle groups) indeed confirms that P2RX7 can be pro-fibrotic.

      We thank the editors for their comment highlighting the lack of clarity in our message. Indeed, we verified whether the antifibrotic action of HEI3090 depends on the expression of P2RX7 by inducing lung fibrosis in P2RX7 KO mice. In doing so, we initially observed that P2RX7 plays a role in the development of BLM-induced lung fibrosis. This is illustrated by a decrease of 50% in the Ashcroft score, as shown in Figure 2M and Supplemental Figure 2C of the revised manuscript.

      To increase the clarity of our message, we added the following paragraph to the text:

      "We further verified whether the antifibrotic action of HEI3090 depends on the expression of P2RX7 by inducing lung fibrosis in p2rx7 knockout (KO) mice. In doing so, we initially observed that P2RX7 plays a role in the development of BLM-induced lung fibrosis. This is illustrated by a decrease of 50% in the Ashcroft score, with a mean value of 1.7 in P2RX7 knockout mice compared to 3 in wild-type mice (Figure 2M and Supplemental Figure 2C). It is important to note that p2rx7 -/- mice still exhibit signs of lung fibrosis, such as thickening of the alveolar wall and a reduction in free air space, in comparison to naïve mice that received PBS instead of BLM (see Supplemental Figure 2A). This result confirms a previous report indicating that BLM-induced lung fibrosis partially depends on the activation of the P2RX7/pannexin-1 axis, leading to the production of IL-1β in the lung. Additionally, in contrast to the observations in WT mice, HEI3090 failed to attenuate the remaining lung fibrosis in p2rx7 -/- mice, as measured by the Ashcroft score (Figure 2M), the percentage of lung tissue with fibrotic lesions, or the intensity of collagen fibers (Supplemental Figure 2D). These results show that P2RX7 alone participates in fibrosis and that HEI3090 exerts a specific antifibrotic effect through this receptor (see Supplemental Figure 2C)."

      Since we used the HEI3090 compound in this study, and to stay closer to the results, we have replaced the titles of 2 chapters in the Results section as follows:

      “HEI3090 inhibits the onset of pulmonary fibrosis in the bleomycin mouse model” instead of “P2RX7 activation inhibits the onset of pulmonary fibrosis in the bleomycin mouse model”, and “HEI3090 shapes immune cell infiltration in the lungs” instead of “P2RX7 activation shapes immune cell infiltration in the lungs”.

      We concur that the observation of both anti-fibrotic effects following P2RX7 deletion and P2RX7 activation appears contradictory. This specific aspect has been thoroughly addressed and extensively discussed in the revised manuscript.

      “A major unmet need in the field of IPF is a new treatment to fight this incurable disease. In this preclinical study, we demonstrate the ability of immune cells to limit lung fibrosis progression. Based on the hypothesis that a local activation of a T cell immune response and upregulation of IFN-γ production has antifibrotic properties, we used the HEI3090 positive modulator of the purinergic receptor P2RX7, previously developed in our laboratory (Douguet et al., 2021), to demonstrate that activation of the P2RX7/IL-18 pathway attenuates lung fibrosis in the bleomycin mouse model. We have demonstrated that lung fibrosis progression is inhibited by HEI3090 in the fibrotic phase but also in the acute phase of the BLM fibrosis mouse model, i.e. during the period of inflammation. This lung fibrosis mouse model, commonly employed in preclinical investigations, has recently been recognized as the optimal model for studying IPF (Jenkins et al., 2017). In this model, the intrapulmonary administration of BLM induces DNA damage in alveolar epithelial type 1 cells, triggering cellular demise and the release of ATP. The extracellular release of ATP from injured cells activates the P2RX7/pannexin 1 axis, initiating the maturation of IL1β and subsequent induction of inflammation and fibrosis. In line with this, mice lacking P2RX7 exhibited reduced neutrophil counts in their bronchoalveolar fluids and decreased levels of IL1β in their lungs compared to WT mice (Riteau et al., 2010). Based on these findings, Riteau and colleagues postulated that the inhibition of P2RX7 activity may offer a potential strategy for the therapeutic control of fibrosis in lung injury. In the present study we provided strong evidence showing that selective activation of P2RX7 on immune cells, through the use of HEI3090, can dampen inflammation and fibrosis by releasing IL-18.
      The efficacy of HEI3090 to inhibit lung fibrosis was evaluated histologically on the whole lung’s surface by evaluating the severity of fibrosis using three independent approaches applied to the whole lung: the Ashcroft score, quantification of fibroblasts/myofibroblasts (CD140a) and polarized-light microscopy of Sirius Red staining to quantify collagen fibers. All these methods of fibrosis assessment revealed that HEI3090 exerts an inhibitory effect on lung fibrosis, underscoring the necessity for a thorough pre-clinical assessment of HEI3090's mode of action. Notably, HEI3090 functions as an activator, rather than an inhibitor, of P2RX7, further emphasizing the importance of elucidating its intricate mechanisms.”

      We trust that the detailed explanation provided therein will adequately persuade both the reviewers and future readers.

      2. The statistical concerns are based on the phrasing of "the experiment was stopped when significantly statistical results were observed". This is different from the power analysis approach that the authors describe in their latest rebuttal. However, it raises the question why the power analysis was performed using "on a one-way ANOVA analysis comparing in each experiment the vehicle and the treated group". The analyses in the manuscript use the Mann-Whitney test for several comparisons, which has the assumption that the samples do NOT have a normal distribution. An ANOVA and t-tests have the assumption that samples are normally distributed. If the power analysis and "statistical forecasting" assumed a normal distribution and used an ANOVA, then shouldn't all the analyses also use a statistical test appropriate for normally distributed samples such as ANOVA and t-tests?

      Several of the data points in the figures seem to be normally distributed and therefore t-test for two group comparisons would be more appropriate. The most rigorous approach would be to check for normal distribution before choosing the correct statistical test and using the t-test/ANOVA in normally distributed data as well as Mann-Whitney for non-normally distributed data.

      In the Materials and Methods section of the revised manuscript, we described our approach to determining the size of the experimental groups.

      “The determination of experimental group sizes involved conducting a pilot experiment with four mice in each group. Subsequently, a power analysis, based on the pilot experiment's findings (which revealed a 40% difference with a standard error of 0.9, α risk of 0.05, and power of 0.8), was performed to ascertain the appropriate group size for studying the effects of HEI3090 on BLM-induced lung fibrosis. The results of the pilot experiment and power analysis indicated that a group size of four mice was sufficient to characterize the observed effects. For each full-scale experiment, we initiated the study with 6 to 8 mice per group, ensuring a minimum of 5 mice in each group for robust statistical analysis. Additionally, we systematically employed the ROUT method to identify and subsequently exclude any outliers present in each experiment before conducting statistical analyses”.
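      As a complement to the group sizes discussed above, the following sketch shows the smallest two-sided p-value that an exact Mann-Whitney test can return for given group sizes. It is an illustration under stated assumptions (exact test, no tied values, complete separation of the two groups), not part of the authors' analysis; the function name is hypothetical.

```python
from math import comb


def smallest_two_sided_p(n1, n2):
    """Smallest exact two-sided Mann-Whitney p-value for group sizes n1 and n2.

    It is reached only when every value in one group exceeds every value in
    the other and there are no ties: 2 / C(n1 + n2, n1), capped at 1.
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("group sizes must be at least 1")
    return min(1.0, 2 / comb(n1 + n2, n1))


# Group sizes and the lowest p-value they can reach under these assumptions.
floor_3_vs_3 = smallest_two_sided_p(3, 3)  # 2 / 20
floor_4_vs_4 = smallest_two_sided_p(4, 4)  # 2 / 70
floor_4_vs_6 = smallest_two_sided_p(4, 6)  # 2 / 210
```

      Under these assumptions, three mice per group can never reach p < 0.05 with this test, whereas four per group can, and a 4 versus 6 comparison can reach p < 0.01. Scores with ties, such as discrete histological grades, raise these floors.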

      We now describe in the Materials and Methods section how we carried out the statistical analyses.

      “Quantitative data were described and presented graphically as medians and interquartiles or means and standard deviations. The distribution normality was tested with the Shapiro's test and homoscedasticity with a Bartlett's test. For two categories, statistical comparisons were performed using the Student's t-test or the Mann–Whitney's test. For three and more categories, analysis of variance (ANOVA), or the Kruskal–Wallis test for non-parametric data, was performed to test variables expressed as categories versus continuous variables. If this test was significant, we used the Tukey's test to compare these categories and the Bonferroni’s test to adjust the significance threshold. For the Gene Set Enrichment Analyses (GSEA), bilateral Kolmogorov–Smirnov test, and false discovery rate (FDR) were used. All statistical analyses were performed by a biostatistician using the Prism 8 program from GraphPad Software. Tests of significance were two-tailed and considered significant with an alpha level of P < 0.05. (graphically: * for P < 0.05, ** for P < 0.01, *** for P < 0.001).”

      We also added to the legend of each figure the statistical analysis used to determine each p-value.
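      The test-selection logic described in this paragraph can be summarized in a short sketch. It is a hedged illustration only: the text names the tests but does not state exactly how the normality and homoscedasticity results were combined, so requiring both before using a parametric test is an assumption here, and the function name `choose_test` is hypothetical.

```python
def choose_test(n_groups, normal, equal_variance):
    """Pick a comparison test from the outcome of the preliminary checks.

    n_groups: number of groups being compared (2 or more).
    normal: True if the normality check did not reject normality.
    equal_variance: True if the homoscedasticity check did not reject
        equal variances.
    Assumption: parametric tests are used only when both checks pass.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    parametric = normal and equal_variance
    if n_groups == 2:
        return "Student's t-test" if parametric else "Mann-Whitney test"
    return "one-way ANOVA" if parametric else "Kruskal-Wallis test"


# Two groups with non-normal data, and four groups passing both checks.
two_group_choice = choose_test(2, normal=False, equal_variance=True)
multi_group_choice = choose_test(4, normal=True, equal_variance=True)
```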

      3. Adoptive transfer: The concerns of the reviewers include an unclear analysis of the effects of adoptive transfer itself and the approaches used to analyze the data independent of the HEI3090 effect. For example, in Figure 4, the adoptive transfer of IL18-/- cells (vehicle group) leads to an Ashcroft score of about 1, among the lowest of the BLM-exposed mice. Does that mean that IL18 is pro-fibrotic and that its absence is beneficial? If yes, it would go against the core premise of the study that IL18 is beneficial. Statistical comparisons of all the vehicle conditions in the adoptive transfer would help clarify whether adoptive transfer of NLRP3-/-, IL18-/- in wild-type and P2RX7-/- mice reduces or increases fibrosis. Such multiple comparisons are necessary to fully understand the adoptive transfer studies and would also require the appropriate statistical test with corrections for multiple comparisons such as Kruskal-Wallis for data without normal distribution and ANOVA with post hoc correction for normal distribution.

      We added a new paragraph in the revised version of the manuscript to explain the adoptive transfer approach.

      “We wanted to further investigate the mechanism of action of HEI3090 by identifying the cellular compartment and signaling pathway required for its activity. Since the expression of P2RX7 and the P2RX7-dependent release of IL-18 are mostly associated with immune cells (Ferrari et al., 2006), and since HEI3090 shapes the lung immune landscape (Figure 3), we investigated whether immune cells were required for the antifibrotic effect of HEI3090. To do so, we conducted adoptive transfer experiments wherein immune cells from a donor mouse were intravenously injected one day before BLM administration into an acceptor mouse. The intravenous injection route was chosen as it is a standard method for targeting the lungs, as previously documented (Wei and Zhao, 2014). This approach was previously used with success in our laboratory (Douguet et al., 2021). It is noteworthy that this adoptive transfer approach did not influence the response to HEI3090. This was observed consistently in both p2rx7 -/- mice and p2rx7 -/- mice that received splenocytes of the same genetic background. In both cases, HEI3090 failed to mitigate lung fibrosis, as depicted in Figure 2M and Supplemental Figures 2D and 6A and B.”

      We added Supplemental Figure 7, which shows that the genetic background does not impact lung fibrosis at steady-state levels; p-values were analyzed by one-way ANOVA, with Kruskal-Wallis test for multiple comparisons.

      Author response image 1.

      Supplemental Figure 7: The genetic background does not impact lung fibrosis at steady-state levels. p2rx7-/- mice were given 3 × 10⁶ WT, nlrp3-/-, il18-/- or il1b-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). p2rx7-/- mice or p2rx7-/- mice adoptively transferred with splenocytes from the indicated genetic background were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. Fibrosis score was assessed by the Ashcroft method. P-values were analyzed on all treated and non-treated groups by one-way ANOVA, with Kruskal-Wallis test for multiple comparisons. The violin plot illustrates the distribution of Ashcroft scores across the indicated experimental groups. The width of the violin at each point represents the density of data, and the central line indicates the median expression level. Each point represents one biological replicate. ns, not significant.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      This paper tests the idea that schooling can provide an energetic advantage over solitary swimming. The present study measures oxygen consumption over a wide range of speeds, to determine the differences in aerobic and anaerobic cost of swimming, providing a potentially valuable addition to the literature related to the advantages of group living.

      Response: Thank you for the positive comments.

      Strengths:

      The strength of this paper is related to providing direct measurements of the energetics (oxygen consumption) of fish while swimming in a group vs solitary. Energy saving has been claimed to be one of the major advantages of schooling, and therefore a direct energetic assessment is a useful result.

      Response: Thank you for the positive comments.

      Weaknesses:

      1) Regarding the fish to water volume ratio, the arguments raised by the authors are valid. However, the ratio used is still quite high (as high as >2000 in solitary fish), much higher than that recommended by Svendsen et al (2006). Hence this point needs to be discussed in the ms (summarising the points raised in the authors' response)

      Response: Thank you for the comments. We have addressed this point in the previous comments. In short, our ratio is within the range of the published literature. We conducted the additional signal-to-noise analysis for quality assurance.

      2) Wall effects: Fish in a school may have been swimming closer to the wall. The fact that the convex hull volume of the fish school did not change as speed increased is not a demonstration that fish were not closer to the wall, nor is it a demonstration that wall effects were not present. Therefore the issue of potential wall effects is a weakness of this paper.

      Response: Thank you for the comments. We have addressed this point in the previous comments. We provided many other considerations in addition to the convex hull volume. In particular, our boundary layer is < 2.5 mm thick, which is narrower than the width of the giant danio (~10 mm).

      3) The authors stated "Because we took high-speed videos simultaneously with the respirometry measurements, we can state unequivocally that individual fish within the school did not swim closer to the walls than solitary fish over the testing period". This is however not quantified.

      Response: Thank you for the comments. We have addressed this point in the previous comments. We want to note that the statement in the response letter was meant to elaborate on the discussion points and is not presented as data in the manuscript. The bottom line is that very few studies have used PIV to quantify the thickness of the boundary layer, as we did in our experiment.

      4) Statistical analysis. The authors have dealt satisfactorily with most of the comments.

      However:

      (a) the following comment has not been dealt with directly in the ms "One can see from the graphs that schooling MO2 tends to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish."

      (b) Different sizes were used for solitary and schooling fishes. The authors justify using larger fish as solitary to provide a better ratio of respirometer volume to fish volume in the tests on individual fish. However, mass scaling for tail beat frequency was not provided. Although (1) this is because of lack of data for this species and (2) using scaling exponent of distant species would introduce errors of unknown magnitude, this is still a weakness of the paper that needs to be acknowledged here and in the ms.

      Response: Thank you for the comments. We have addressed both points in the previous comments and provided comprehensive discussions. We also stated the caveats in the method section of the manuscript.
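      The variability argument in point (a) can be illustrated with a small sketch. It assumes that each school value behaves like the mean of independent fish, which is a simplifying assumption made only for this illustration (fish in a school interact, so the real reduction may differ); the function name is hypothetical.

```python
from math import sqrt


def sd_of_group_mean(individual_sd, group_size):
    """Standard deviation of the mean of `group_size` independent values.

    Under independence, averaging n values divides the standard deviation
    by sqrt(n).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if individual_sd < 0:
        raise ValueError("individual_sd cannot be negative")
    return individual_sd / sqrt(group_size)


# Relative spread of a per-fish value averaged over a school of five fish,
# expressed as a fraction of the spread between solitary individuals.
relative_sd_school_of_five = sd_of_group_mean(1.0, 5)
```

      Under this assumption, averaging over five fish alone would bring the spread down to a little under half of the between-individual spread, independently of any hydrodynamic effect.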

      Reviewer #3 (Public Review):

      Zhang and Lauder characterized both aerobic and anaerobic metabolic energy contributions in schools and solitary fishes in the Giant danio (Devario aequipinnatus) over a wide range of water velocities. By using a highly sophisticated respirometer system, the authors measure aerobic metabolism by oxygen uptake rate and the non-aerobic oxygen cost as excess post-exercise oxygen consumption (EPOC). With these data, the authors model the bioenergetic cost of schools and solitary fishes. The authors found that fish schools have a J-shaped metabolism-speed curve, with reduced total energy expenditure per tail beat compared to solitary fish. Fish in schools also recovered from exercise faster than solitary fish. Finally, the authors conclude that these energetic savings may underlie the prevalence of coordinated group locomotion in fish.

      The conclusions of this paper are mostly well supported by data.

      Response: Thank you for the positive comments.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      I have read carefully the revised version of the manuscript and would like to thank the authors for addressing all my comments/suggestions.

      I have no additional comments/suggestions. Now, I strongly believe that this manuscript deserves to be published in eLife.

      Response: Thank you for the positive comments.


      The following is the authors’ response to the original reviews.

      General responses

      Many thanks to the reviewers and editors for their very helpful comments on our manuscript. Below we respond (in blue text) to each of the reviewer comments, both the public ones and the more detailed individual comments in the second part of each review. In some cases, we consider these together where the same point is made in both sets of comments. We have made several changes to the manuscript in response to reviewer suggestions, and we respond in detail to the comments of reviewer #2 who feels that we have overstated the significance of our manuscript and suggests several relevant literature references. We prepared a table summarizing these references and why they differ substantially from the approach taken in our paper here.

      Overall, we would like to emphasize to both reviewers and readers of this response document that previous studies of fish schooling dynamics (or collective movement of vertebrates in general, see Commentary Zhang & Lauder 2023 J. Exp. Biol., doi:10.1242/jeb.245617) have not considered a wide speed range and thus the importance of measuring EPOC (excess post-exercise oxygen consumption) as a key component of energy use. Quantifying both aerobic and non-aerobic energy use allows us to calculate the total energy expenditure (TEE), which we show differs substantially and, importantly, non-linearly with speed between schools and measurements on solitary individuals. Comparisons between school total energy use and individual total energy use are critical to understanding the dynamics of schooling behaviour in fishes.

      The scope of this study is the energetics of fish schools. By quantifying the TEE over a wide range of swimming speeds, we also show that the energetic performance curve is concave upward rather than linear, and we show how schooling behaviour modifies this non-linear relationship.
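      The bookkeeping behind these two statements can be sketched as follows: total energy expenditure is the sum of the aerobic and non-aerobic (EPOC) components at each speed, and a curve is concave upward when its slope increases from one speed interval to the next. All numbers below are hypothetical and in arbitrary units; they are not measurements from the study, and the function names are illustrative.

```python
def total_energy_expenditure(aerobic, epoc):
    """Sum the aerobic and non-aerobic (EPOC) costs at each speed."""
    if len(aerobic) != len(epoc):
        raise ValueError("aerobic and epoc must have the same length")
    return [a + e for a, e in zip(aerobic, epoc)]


def is_concave_upward(speeds, values):
    """True if the slope strictly increases between consecutive intervals.

    A straight line returns False, because its slope is constant.
    """
    if len(speeds) != len(values) or len(speeds) < 3:
        raise ValueError("need at least three matching (speed, value) points")
    slopes = [
        (values[i + 1] - values[i]) / (speeds[i + 1] - speeds[i])
        for i in range(len(speeds) - 1)
    ]
    return all(later > earlier for earlier, later in zip(slopes, slopes[1:]))


# Hypothetical values: swimming speed (BL/s), aerobic cost, EPOC.
speeds = [1, 2, 4, 6, 8]
aerobic = [10, 12, 18, 26, 34]
epoc = [0, 0, 1, 6, 16]

tee = total_energy_expenditure(aerobic, epoc)
curve_is_concave_upward = is_concave_upward(speeds, tee)
```

      In this made-up example the non-aerobic component only becomes substantial at the highest speeds, which is what bends the total curve upward.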

      In addition, one key implication of our results is that kinematic measurements of fish in schools (such as tail beat frequency) are not a reliable metric by which to estimate energy use. Since we recorded high-speed video simultaneously with energetic measurements, we are able to show that substantial energy savings occur by fish in schools with little to no change in tail beat frequency, and we discuss in the manuscript the various fluid dynamic mechanisms that allow this. Indeed, studies of bird flight show that when flying in a (presumed) energy-saving V-formation, wing beat frequency can actually increase compared to flying alone. We believe that this is a particularly important part of our findings: understanding energy use by fish schools must involve actual measurements of energy use and not indirect and sometimes unreliable kinematic measurements such as tail beat frequency or amplitude.

      Reviewer #1 (Public Review):

      Summary:

      In the presented manuscript the authors aim at quantifying the costs of locomotion in schooling versus solitary fish across a considerable range of speeds. Specifically, they quantify the possible reduction in the cost of locomotion in fish due to schooling behavior. The main novelty appears to be the direct measurement of absolute swimming costs and total energy expenditure, including the anaerobic costs at higher swimming speeds.

      In addition to metabolic parameters, the authors also recorded some basic kinematic parameters such as average distances or school elongation. They find both for solitary and schooling fish, similar optimal swimming speeds of around 1BL/s, and a significant reduction in costs of locomotion due to schooling at high speeds, in particular at ~5-8 BL/s.

      Given the lack of experimental data and the direct measurements across a wide range of speeds comparing solitary and schooling fish, this appears indeed like a potentially important contribution of interest to a broader audience beyond the specific field of fish physiology, in particular for researchers working broadly on collective (fish) behavior.

      Response: Thank you for seeing the potential implications of this study. We also believe that this paper has broader implications for collective behaviour in general, and outline some of our thinking on this topic in a recent Commentary article in the Journal of Experimental Biology: (Zhang & Lauder 2023 doi:10.1242/jeb.245617). Understanding the energetics of collective behaviours in the water, land, and air is a topic that has not received much attention despite the widespread view that moving as a collective saves energy.

      Strengths:

      The manuscript is for the most part well written, and the figures are of good quality. The experimental method and protocols are very thorough and of high quality. The results are quite compelling and interesting. What is particularly interesting, in light of previous literature on the topic, is that the authors conclude that based on their results, specific fixed relative positions or kinematic features (tail beat phase locking) do not seem to be required for energetic savings. They also provide a review of potential different mechanisms that could play a role in the energetic savings.

      Response: Thank you for seeing the nuances we bring to the existing literature and for commenting on the quality of the experimental method and protocols. Despite a relatively large literature on fish schooling based on previous biomechanical research, our studies suggest that direct measurement of energetic cost clearly demonstrates the energy savings that result from the sum of different fluid dynamic mechanisms depending on where fish are, and also emphasizes that simple metrics like fish tail beat frequency do not adequately reflect energy savings during collective motion.

      Weaknesses:

      A weakness is the actual lack of critical discussion of the different mechanisms as well as the discussion on the conjecture that relative positions and kinematic features do not matter. I found the overall discussion on this rather unsatisfactory, lacking some critical reflections as well as different relevant statements or explanations being scattered across the discussion section. Here I would suggest a revision of the discussion section.

      Response: The critical discussion of the different possible energy-saving mechanisms is indeed an important topic. We provided a discussion about the overall mechanism of ‘local interactions’ in the first paragraph of “Schooling Dynamics and energy conservation”. To clarify, our aim with Figure 1 is to introduce the current mechanisms proposed in the existing engineering/hydrodynamic literature that have studied a number of possible configurations both experimentally and computationally. Thank you for the suggestion of better organizing the discussion to critically highlight different mechanisms that would enable a dynamic schooling structure to still save energy and why the appendage movement frequency does not necessarily couple with the metabolic energy expenditure. Much of this literature uses computational fluid dynamic models or experiments on flapping foils as representative of fish. This exact issue is of great interest to us, and we are currently engaged in a number of other experiments that we hope will shed light on how fish moving in specific formations do or don’t save energy.

      Our aim in presenting Figure 1 at the start of the paper was to show that there are several ways that fish could save energy when moving in a group as shown by engineering analyses, but before investigating these various mechanisms in detail we first have to show that fish moving in groups actually do save energy with direct metabolic measurements. Hence, our paper treats the various mechanisms as inspiration to determine experimentally if, in fact, fish in schools save energy, and if so how much over a wide speed range. Our focus is to experimentally determine the performance curve that shows energy use as speed increases, for schools compared to individuals. Therefore, we have elected not to go into detail about these different hydrodynamic mechanisms in this paper, but rather to present them as a summary of current engineering literature views and then proceed to document energy savings (as stated in the second last paragraph of Introduction). We have a Commentary paper in the Journal of Experimental Biology that addresses this issue generally, and we are reluctant to duplicate much of that discussion here (Zhang & Lauder 2023 doi:10.1242/jeb.245617). We are working hard on this general issue as we agree that it is very interesting. We have revised the Introduction (second last paragraph of Introduction) and Discussion (first paragraph of Discussion) to better indicate our approach, but we have not added any significant discussion of the different hydrodynamic energy saving proposals as we believe that it is outside the scope of this first paper and more suitable as part of follow-up studies.

      Also, there is a statement that Danio regularly move within the school and do not maintain inter-individual positions. However, there is no quantitative data shown supporting this statement, quantifying the time scales of neighbor switches. This should be addressed as core conclusions appear to rest on this statement and the authors have 3d tracks of the fish.

      Response: Thank you for pointing out this very important future research direction. Based on our observations and the hypothesized mechanisms for fish within the school to save energy (Fig. 1), we have been conducting follow-up experiments to decipher the multiple dynamic mechanisms that enable the fish within the school to save energy. Tracking the full body of each individual fish in 3D within the school has proven difficult. We currently have 3D data on the nose position obtained simultaneously with the energetic measurements, but we do not have full 3D fish body positional data. Working with our collaborators, we are developing a 3-D tracking algorithm that will allow us to quantify how long fish spend in specific formations, and we currently have a new capability to record high-speed video of fish schooling moving in a flow tank for many hours (see our recent perspective by Ko et al., 2023 doi.org/10.1098/rsif.2023.0357). The new algorithms and the results will be published as separate studies and we think that these ongoing experiments are outside the scope of the current study with its focus on energetics. Nevertheless, the main point of Fig. 1 is to provide possible mechanisms to inspire future studies to dissect the detailed hydrodynamic mechanisms for energy saving, and the points raised by this comment are indeed extremely interesting to us and our ongoing experiments in this area. We provide a statement to clarify this point in the 1st paragraph of “Schooling dynamics and energy conservation” section.

      Further, there is a fundamental question on the comparison of schooling in a flow (like a stream or here flow channel) versus schooling in still water. While it is clear that, from a pure physics point of view, the situation for individual fish is equivalent, as it is about maintaining a certain relative velocity to the fluid, I do think that it makes a huge qualitative difference from a biological point of view in the context of collective swimming. In a flow, individual fish have to align with the external flow to ensure that they remain stationary and do not fall back, which then leads to highly polarized schools. However, this high polarization is induced also for completely non-interacting fish. At high speeds, also the capability of individuals to control their relative position in the school is likely very restricted, simply by being forced to put most of their effort into maintaining a stationary position in the flow. This appears to me fundamentally different from schooling in still water, where the alignment (high polarization) has to come purely from social interactions. Here, relative positioning with respect to others is much more controlled by the movement decisions of individuals. Thus, I see clearly how this work is relevant for natural behavior in flows and that it provides some insights on the fundamental physiology, but I at least have some doubts about how far it extends actually to “voluntary” highly ordered schooling under still water conditions. Here, I would wish at least some more critical reflection and/or explanation.

      Response: We agree completely with this comment that animal group orientations in still fluid can have different causes from their locomotion in a moving fluid. We very much agree with the reviewer that social interactions in still water, which typically involve low-speed locomotion and other behaviours such as searching for food by the group, can be important and could dictate fish movement patterns. In undertaking this project, we wanted to challenge fish to move at speed, and reasoned that if energy savings are important in schooling behaviour due to hydrodynamic mechanisms, we should see this when fish are moving forward against drag forces induced by fluid impacting the school. Drag forces scale as velocity squared, so we should see energy savings by the school, if any, as speed increases.
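      To make the scaling argument above concrete, the following is a minimal sketch of the standard quadratic drag law, F = ½ρC_dAv². It is an illustration only and not part of our analysis: the function names, the constant drag coefficient, and the frontal area are hypothetical placeholders (in real fish the drag coefficient varies with Reynolds number and body kinematics), and speeds are given in arbitrary consistent units.

```python
def drag_force(speed, density=1000.0, drag_coefficient=1.0, frontal_area=1.0e-4):
    """Quadratic drag law: F = 0.5 * rho * C_d * A * v**2.

    All default parameter values are hypothetical placeholders.
    """
    return 0.5 * density * drag_coefficient * frontal_area * speed ** 2


def drag_power(speed, **kwargs):
    """Mechanical power needed to overcome drag: P = F * v, so P scales as v**3."""
    return drag_force(speed, **kwargs) * speed


# Compare a four-fold increase in speed (e.g. from 2 to 8 in the same units).
force_ratio = drag_force(8.0) / drag_force(2.0)  # (8 / 2) ** 2
power_ratio = drag_power(8.0) / drag_power(2.0)  # (8 / 2) ** 3
print(f"force ratio: {force_ratio:.1f}, power ratio: {power_ratio:.1f}")
```

      Under these simplifying assumptions, a four-fold increase in speed raises drag force about 16-fold and the power needed to overcome it about 64-fold, which is why any hydrodynamic saving should become easier to detect at higher speeds.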

      We also quantified fish school swimming speeds in the field from the literature and presented a figure showing that in nature fish schools can and do move at considerable speeds. This figure is part of our recent overview of collective behaviour in J. Exp. Biol. (Zhang & Lauder 2023 doi:10.1242/jeb.245617). It is only by studying fish schools moving over a speed range that we can understand the performance curve relating energy use to swimming speed. Indeed, we wonder if fish moving in still water as a collective versus as solitary individuals would show energy savings at all. We now provide the justification for studying fish schooling in moving fluids in the second and third paragraph of the Introduction. When animals are challenged hydrodynamically (e.g. at higher speed), it introduces the need to save energy. Movement in still water lacks the need for fish to save energy. When fish do not need to save locomotor energy in still water, it is hard to justify why we would expect to observe energy saving and related physiological mechanisms in the first place. As the reviewer said, the ‘high polarization in still water has to come purely from social interactions’. Our study does not dispute this consideration, and indeed we agree with it! In our supplementary materials, we acknowledged that the definitions for different scenarios of fish schooling can have different behavioural and ecological drivers. Using these definitions, we explicitly stated, in the introduction, that our study focuses on active and directional schooling behaviour to understand the possible hydrodynamic benefits of energy expenditure for collective movements of fish schools. By stating the scope of our study at the outset, we hope that this will keep the discussion focused on the energetics and kinematics of fish schools, without unnecessarily addressing the many other possible reasons for fish schooling behaviours in the discussion, such as anti-predator grouping, food searching, or reproduction, to name three examples.

      That being said, we acknowledge (in the 2nd paragraph of the introduction) that fish schooling behaviour can have other drivers when the flow is not challenging. Also, there are robotic-&-animal interaction studies and computational fluid dynamic simulation studies (that we cited) that show individuals in fish schools interact hydrodynamically. Hydrodynamic interactions are not the same as behavioural interactions, but this does not mean that individuals within a fish school in moving flow are not interacting and coordinating.

      Related to this, the reported increase in the elongation of the school at a higher speed could have also different explanations. The authors speculate briefly it could be related to the optimal structure of the school, but it could be simply inter-individual performance differences, with slower individuals simply falling back with respect to faster ones. Did the authors test for certain fish being predominantly at the front or back? Did they test for individual swimming performance before testing them in groups together? Again this should be at least critically reflected somewhere.

      Response: Thank you for raising this point. If the more streamlined schooling structure above 2 BL/s is due to the weaker individuals not catching up with the rest of the school, we would expect the weaker individuals to quit swimming tests well before 8 BL/s. However, we did not observe this phenomenon. Although we did not specifically test for the two questions the reviewer raises here, our results suggest that inter-individual variation in the swimming performance of giant Danio does not span the range of 2 to 8 BL/s (a four-fold difference). While inter-individual differences certainly exist, we believe that they are small relative to the speeds tested as we did not see any particular individuals consistently unable to keep up with the school or certain individuals maintaining a position near the back of the school. That being said, we provide additional interpretations for the elongated schooling structure at the end of the 2nd paragraph of the “schooling dynamics and energy conservation” section.

      Reviewer #1 (Recommendations For The Authors):

      Line 58: The authors write "How the fluid dynamics (...) enable energetic savings (...)". However, the paper focuses rather on the question of whether energetic savings exist and does not enlighten us on the dominant mechanisms. Although it gives a brief overview of all possible mechanisms, it remains speculative on the actual fluid dynamical and biomechanical processes. Thus, I suggest changing "How" to "Whether".

      Response: Great point! We changed “How” to “Whether”.

      Lines 129-140: In the discussion of the U-shaped aerobic rate, there is no direct comparison of the minimum cost values between the schooling and solitary conditions. Only the minimum costs during schooling are named/discussed. In addition to the data in the figure, I suggest explicitly comparing them as well for full transparency.

      Response: Thanks for raising this point. We did not belabor this point because there was no statistical significance. As requested, we added a statement to address this with statistics in the 1st paragraph of the Results section.

      Line 149: The authors note that the schooling fish have a higher turning frequency than solitary fish. Here, a brief discussion of potential explanations would be good, e.g. need for coordination with neighbors -> cost of schooling.

      Response: Thank you for the suggestion. In the original version of the manuscript, we discussed that the higher turning frequency could be related to higher postural costs for active stability adjustment at low speeds. As requested, we now added that high turn frequency can relate to the need for coordination with neighbours in the last paragraph of the “Aerobic metabolic rate–speed curve of fish schools” section. As indicated above, the suspected costs of coordination did not result in higher costs of schooling at the lower speed (< 2 BL/s, where the turn frequency is higher).

      Line 151: The authors discuss the higher maximum metabolic rate of schooling fish as a higher aerobic performance and lower use of aerobic capacity. This may be confusing for non-experts in animal physiology and energetics of locomotion. I recommend providing somewhere in a paper an additional explanation to clarify it to non-experts. While lines 234-240 and further below potentially address this, I found this not very focused or accessible to non-experts. Here, I suggest the authors consider revisions to make it more comprehensible to a wider, interdisciplinary audience.

      Response: We agree with the reviewer that the difference between maximum oxygen uptake and maximum metabolic rate can be confusing. In fact, among animal physiologists, these two concepts are often muddled. One of the authors is working on an invited commentary for J. Exp. Biol. to clearly define these two concepts. We have made the language in the section “Schooling dynamics enhances aerobic performance and reduces non-aerobic energy use” more accessible to a general audience. In addition, the original version presented the relevant framework in the first and the second paragraphs of the Introduction when discussing aerobic and non-aerobic energy contribution. In brief, when vertebrates exhibit maximum oxygen uptake, they use aerobic and non-aerobic energy contributions that both contribute to their metabolic rate. Therefore, the maximum total metabolic rate is higher than the one estimated from only maximum oxygen uptake. We used the method presented in Fig. 3a to estimate the maximum metabolic rate for metabolic energy use (combining aerobic and non-aerobic energy use). In kinesiology, maximum oxygen uptake is used to evaluate aerobic performance, whereas the energy use of human athletes is estimated by power meters or doubly labelled water.
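      For readers outside animal physiology, the bookkeeping behind this distinction can be sketched in a few lines. This is an illustration under stated assumptions and not the estimation method of Fig. 3a: the function names and the numbers are hypothetical, and both inputs are assumed to be expressed in the same oxygen-equivalent units.

```python
def total_energy_expenditure(aerobic_cost, epoc):
    """Total energy expenditure (TEE) as the sum of the aerobic cost during
    swimming and the non-aerobic cost repaid afterwards (EPOC).

    Both arguments must share the same (oxygen-equivalent) units.
    """
    return aerobic_cost + epoc


def aerobic_fraction(aerobic_cost, epoc):
    """Share of TEE that oxygen uptake during swimming alone would capture."""
    total = total_energy_expenditure(aerobic_cost, epoc)
    return aerobic_cost / total if total > 0 else float("nan")


# Hypothetical values, chosen only to make the arithmetic easy to follow.
aerobic, epoc = 60.0, 40.0
tee = total_energy_expenditure(aerobic, epoc)
print(f"TEE = {tee}, aerobic share = {aerobic_fraction(aerobic, epoc):.0%}")
```

      Whenever the non-aerobic term is greater than zero, an estimate based only on oxygen uptake during swimming understates the total, which is the reason the maximum total metabolic rate exceeds the value implied by maximum oxygen uptake.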

      Line 211: The authors write that Danio regularly move within the school and do not maintain inter-individual positions. Given that this is an important observation, and the relative position and its changes are crucial to understanding the possible mechanisms for energetic savings in schools, I would expect some more quantitative support for this statement, in particular as the authors have access to 3d tracking data. For example introducing some simple metrics like average time intervals between swaps of nearest neighbors, possibly also resolved in directions (front+back versus right+left), should provide at least some rough quantification of the involved timescales, whether it is seconds, tens of seconds, or minutes.

      Response: As noted in our response above, 3-D tracking of both body position and body deformation of multiple individuals in a school is not a trivial research challenge and we have ongoing research on this issue. We hope to have results on the 3D positions of fish in schools soon! For this manuscript, we believe that the data in Figure 4E, which shows the turning frequency of fish in schools and solitary controls, shows the general phenomenon of fish moving around (as fish turn to change positions within the school), but we agree that more could be done to address this point and we are indeed working on it now.

      Lines 212-217: There is a very strong statement that energetic savings by collective motion do not require fixed positional arrangements or specific kinematic features. While possibly one of the most interesting findings of the paper, I found that in its current state, it was not sufficiently/satisfactorily discussed. For example for the different mechanisms summarized, there will be clearly differences in their relevance based on relative distance and position. For example mechanisms 3 and 4 likely have significant contributions only at short distances. Here, the question is how relevant can they be if the average distance is 1 BL? Also, 1BL side by side is very much different from 1BL front to back, given the elongated body shape. For mechanisms 1 and 2, it appears relative positioning is quite important. Here, having maybe at least some information from the literature (if available) on the range of wall or push effects or the required precision in relative positioning for having a significant benefit would be very much desired. Also, do the authors suggest that a) these different effects overlap giving any position in the school a benefit, or b) that there are specific positions giving benefits due to different mechanisms and that fish "on purpose" switch only between these energetic "sweet" spots, I guess this what is towards the end referred to as Lighthill conjecture? Given the small group size I find a) rather unlikely, while b) actually also leads to a coordination problem if every fish is looking for a sweet spot. Overall, a related question is whether the authors observed a systematic change in leading individuals, which likely have no, or very small, hydrodynamic benefits.

      Response: Thank you for the excellent discussion on this point. As we responded above, we have softened the tone of the statement. In the original version, we were clear that the known mechanisms as summarized in Fig. 1 lead us to ‘expect’ that fish do not need to be in a fixed position to save energy.

      In general, current engineering/hydrodynamic studies suggest that any fish positioned within one body length (both upstream and downstream and side by side) will benefit from one or more of the hydrodynamic mechanisms that we expect will reduce energy costs, relative to a solitary individual. Our own studies using robotic systems suggest that a leading fish will experience an added mass “push” from a follower when the follower is located within roughly ½ body length behind the leader. We cited a Computational Fluid Dynamic (CFD) study about the relative distance among individuals for energy saving to be in effect. Please keep in mind that CFD simulation is a simplified model of the actual locomotion of fish and involves many assumptions and currently only resolves the time scale of seconds (see commentary of Zhang & Lauder 2023 doi:10.1242/jeb.245617 in J. Exp. Biol. for the current challenges of CFD simulation). To really understand the dynamic positions of fish within the school, we will need 3-D tracking of fish schools with tools that are currently being developed. Ideally, we would also have simultaneous energetic measurements, but of course, this is enormously challenging and it is not clear at this time how to accomplish this.

      We certainly agree that the relative positions of fish (vertically staggered or in-line swimming) do affect the specific hydrodynamic mechanisms being used. We cited the study that discussed this, but the relative positions of fish remain an active area of research. More studies will be out in the next few years to provide more insight into the effects of the relative positions of fish on energy saving. The Lighthill conjecture is observed in flapping foils, and whether fish schools use the Lighthill conjecture for energy saving is an active area of research but still unclear. We also provided a citation about the implication of the Lighthill conjecture on fish schools. Hence, our original version stated ‘The exact energetic mechanisms….would benefit from more in-depth studies’. We agree with the reviewer that not all fish can benefit from the Lighthill conjecture (if fish schools use it) at any given time point, hence the fish might need to rotate in using the Lighthill conjecture. This is one more explanation for the dynamic positioning of fish in a school.

      Overall, in response to the question raised, we do not believe that fish are actively searching for “sweet spots” within the school, although this is only speculation on our part. We believe instead that fish, located in a diversity of positions within the school, get the hydrodynamic advantage of being in the group at that configuration.

      We believe that fish, once they group and maintain a grouping where individuals are all within around one body length distance from each other, will necessarily get hydrodynamic benefits. As a collective group, we believe that at any one time, several different hydrodynamic mechanisms are all acting simultaneously and result in reduced energetic costs (Fig. 1).

      Figure 4E: The y-axis is given in the units of 10-sec^-1 which is confusing is it 10 1/s or 1/(10s)? Why not use simply the unit of 1/s which is unambiguous?

      Response: Thank you for the suggestions. We counted the turning frequency over the course of 10 seconds. To reflect more accurately on what we did, we used the suggested unit of 1/(10s) to more correctly correspond to how we made the measurements and the duration of the measurement. We recognize that this is a bit non-standard but would like to keep these units if possible.
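      For readers who prefer a per-second rate, converting from the 10-second counting window is a single division. The sketch below is illustrative only; the function name and the example count are hypothetical and are not taken from our data.

```python
COUNTING_WINDOW_S = 10.0  # turns were counted over a 10-second window


def turns_per_second(turn_count, window_s=COUNTING_WINDOW_S):
    """Convert a turn count per counting window into turns per second."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return turn_count / window_s


# Hypothetical example: 5 turns counted in one 10-second window.
rate_hz = turns_per_second(5)
print(f"{rate_hz} turns per second")
```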

      Figure 4F: The unit in the school length is given in [mm], which suggests that the maximal measured school length is 4mm, this can't be true.

      Response: Thank you for pointing this out. The unit should be [cm], which we corrected.

      Reviewer #2 (Public Review):

      Summary:

      This paper tests the idea that schooling can provide an energetic advantage over solitary swimming. The present study measures oxygen consumption over a wide range of speeds, to determine the differences in aerobic and anaerobic cost of swimming, providing a potentially valuable addition to the literature related to the advantages of group living.

      Response: Thank you for acknowledging our contribution is a valuable addition to the literature on collective movement by animals.

      Strengths:

      The strength of this paper is related to providing direct measurements of the energetics (oxygen consumption) of fish while swimming in a group vs solitary. The energetic advantages of schooling have been claimed to be one of the major advantages of schooling and therefore a direct energetic assessment is a useful result.

      Response: Thank you for acknowledging our results are useful and provide direct measurements of energetics to prove a major advantage of schooling relative to solitary motion over a range of speeds.

      Weaknesses:

      The manuscript suffers from a number of weaknesses which are summarised below:

      1) The possibility that fish in a school show lower oxygen consumption may also be due to a calming effect. While the authors show that there is no difference at low speed, one cannot rule out that calming effects play a more important role at higher speed, i.e. in a more stressful situation.

      Response: Thank you for raising this creative point on “calming”. When vertebrates are moving at high speeds, their stress hormones (adrenaline, catecholamines & cortisol) increase. This phenomenon has been widely studied, and therefore, we do not believe that animals are ‘calm’ when moving at high speed and that somehow a “calming effect” explains our non-linear concave-upward energetic curves. “Calming” would have to have a rather strange non-linear effect over speed to explain our data, and act in contrast to known physiological responses involved in intense exercise (whether in fish or humans). It is certainly not true for humans that running at high speeds in a group causes a “calming effect” that explains changes in metabolic energy expenditure. We have added an explanation in the third paragraph in the section “Schooling dynamics enhances aerobic performance and reduces non-aerobic energy use”. Moreover, when animal locomotion has a high frequency of appendage movement (for both solitary individual and group movement), they are also not ‘calm’ from a behavioural point of view. Therefore, we respectfully disagree with the reviewer that the ‘calming effect’ is a major contributor to the energy saving of group movement at high speed. It is difficult to believe that giant danio swimming at 8 BL/s which is near or at their maximal sustainable locomotor limits are somehow “calm”. In addition, we demonstrated by direct energetic measurement that solitary individuals do not have a higher metabolic rate at the lower speed and thus directly show that there is very likely no cost of “uncalm” stress that would elevate the metabolic rate of solitary individuals. Furthermore, the current version of this manuscript compared the condition factor of the fish in the school and solitary individuals and found no difference (see Experimental Animal Section in the Methods). 
This also suggests that the measurement on the solitary fish is likely not confounded by any stress effects.

      Finally, and as discussed further below, since we have simultaneous high-speed videos of fish swimming as we measure oxygen consumption at all speeds, we are able to directly measure fish behaviour. Since we observed no alteration in tail beat kinematics between schools and individuals (a key result that we elaborate on below), it’s very hard to justify that a “calming” effect explains our results. Fish in schools swimming at speed (not in still water) appear to be just as “calm” as solitary individuals.

      2) The ratio of fish volume to water volume in the respirometer is much higher than that recommended by the methodological paper by Svendsen et al. (J Fish Biol 2016)

      Response: The ratio of respirometer volume to fish volume is an important issue that we thought about in detail before conducting these experiments. While Svendsen et al., (J. Fish Biol. 2016) recommend a respirometer volume-to-fish volume ratio of 500, we are not aware of any experimental study comparing volumes with oxygen measuring accuracy that gives this number as optimal. In addition, the Svendsen et al. paper does not consider that their recommendation might result in fish swimming near the walls of the flume (as a result of having relatively larger fish volume to flume volume) and hence able to alter their energetic expenditure by being near the wall. In our case, we needed to be able to study both a school (with higher animal volumes) and an individual (relatively lower volume) in the same exact experimental apparatus. Thus, we had to develop a system to accurately record oxygen consumption under both conditions.

      The ratio of our respirometer to individual volume for schools is 693, while the value for individual fish is 2200. Previous studies (Parker 1973, Abrahams & Colgan, 1985, Burgerhout et al., 2013) that used a swimming-tunnel respirometer (i.e., a sealed treadmill) to measure the energy cost of group locomotion used values that range between 1116 and 8894 which are large and could produce low-resolution measurements of oxygen consumption. Thus, we believe that we have an excellent ratio for our experiments on both schools and solitary individuals, while maintaining a large enough value that fish don’t experience wall effects (see more discussion on this below, as we experimentally quantified the flow pattern within our respirometer).

      The goal of the recommendation by Svendsen et al. is to achieve a satisfactory R² (coefficient of determination) value for oxygen consumption data. However, Chabot et al., 2020 (DOI: 10.1111/jfb.14650) pointed out that relying only on R² values is not always successful at excluding non-linear slopes. Much worse, pursuing only high R² values risks removing linear slopes with low R² merely because of a low signal-to-noise ratio, resulting in an overestimation of the low metabolic rate. Although we acknowledge the excellent efforts and recommendations provided by Svendsen et al., 2016, we perhaps should not treat the ratio of respirometer to organism volume of 500 as the gold standard for swim-tunnel respirometry. Svendsen et al., 2016 did not indicate how they reached the recommendation of using the ratio of respirometer to organism volume of 500. Moreover, Svendsen et al., 2016 stated that using an extended measuring period can help to resolve the low signal-to-noise ratio. Hence, the key consideration is to obtain a reliable signal-to-noise ratio, which we discuss below.
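      The point about R² and signal-to-noise ratio can be illustrated with simulated data. The sketch below is not our analysis pipeline and uses no measured values: the starting oxygen level, the true slope, the two noise levels, and the function name are all hypothetical. It fits an ordinary least-squares line to two synthetic oxygen traces that share the same underlying linear decline but differ in sensor noise.

```python
import random


def fit_oxygen_slope(time_s, oxygen_mg_l):
    """Ordinary least-squares slope and R^2 for an oxygen-versus-time trace."""
    n = len(time_s)
    mean_t = sum(time_s) / n
    mean_o = sum(oxygen_mg_l) / n
    sxx = sum((t - mean_t) ** 2 for t in time_s)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(time_s, oxygen_mg_l))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_t
    ss_res = sum((o - (intercept + slope * t)) ** 2
                 for t, o in zip(time_s, oxygen_mg_l))
    ss_tot = sum((o - mean_o) ** 2 for o in oxygen_mg_l)
    return slope, 1.0 - ss_res / ss_tot


rng = random.Random(0)   # fixed seed so the simulation is repeatable
TRUE_SLOPE = -0.001      # hypothetical oxygen decline per second
time_s = [float(t) for t in range(600)]  # a 10-minute measuring period

# Same linear decline, two hypothetical sensor noise levels.
quiet = [8.0 + TRUE_SLOPE * t + rng.gauss(0.0, 0.005) for t in time_s]
noisy = [8.0 + TRUE_SLOPE * t + rng.gauss(0.0, 0.3) for t in time_s]

slope_quiet, r2_quiet = fit_oxygen_slope(time_s, quiet)
slope_noisy, r2_noisy = fit_oxygen_slope(time_s, noisy)
print(f"quiet: slope={slope_quiet:.5f}, R2={r2_quiet:.3f}")
print(f"noisy: slope={slope_noisy:.5f}, R2={r2_noisy:.3f}")
```

      In this simulation both fits recover a slope close to the true value, yet the noisier trace yields a much lower R². Discarding such a slope on R² alone would remove a valid linear measurement, which is why the signal-to-noise ratio and the length of the measuring period deserve attention in their own right.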

      To ensure reliable data quality, we installed a water mixing loop (Steffensen et al., 1984) and used the best currently available oxygen probe technology (see the Methods section on the Integrated Biomechanics & Bioenergetic Assessment System) to improve the signal-to-noise ratio. A water mixing loop is not commonly used in swim-tunnel respirometers. Hence, if a previously published study used a respirometer-to-organism ratio up to 8894, our updated oxygen measuring system is completely adequate to produce reliable signal-to-noise ratios with a respirometer-to-organism ratio of 2200 (individuals) and 693 (schools). In fact, our original version of the manuscript used a published method (Zhang et al., 2019, J. Exp. Biol. https://doi.org/10.1242/jeb.196568) to analyze the signal-to-noise ratio and provided a quantitative approach to determine the sampling window needed to reliably capture the signal (Fig. S5).
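
      The point made in this response about R2 filtering and signal-to-noise ratio can be illustrated with a minimal sketch on synthetic data. It is not the analysis used in the manuscript: the noise level, the two oxygen decline rates, the window lengths, and the helper names `ols_slope_r2` and `simulate_trace` are all assumptions chosen for illustration. The sketch fits a straight line to simulated dissolved-oxygen traces and reports the slope together with R2.

```python
import random


def ols_slope_r2(t, y):
    """Ordinary least-squares slope and R^2 of y regressed on t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def simulate_trace(true_slope, noise_sd, n_seconds, start=8.0, seed=0):
    """Synthetic O2 trace: a perfectly linear decline plus Gaussian probe noise."""
    rng = random.Random(seed)
    t = list(range(n_seconds))
    y = [start + true_slope * ti + rng.gauss(0.0, noise_sd) for ti in t]
    return t, y


# All values below are hypothetical, chosen only to illustrate the argument.
NOISE_SD = 0.01   # mg O2 per L, assumed probe noise
FAST = -0.0010    # mg O2 per L per s, a high metabolic rate
SLOW = -0.0001    # mg O2 per L per s, a low metabolic rate

slope_fast, r2_fast = ols_slope_r2(*simulate_trace(FAST, NOISE_SD, 300))
slope_slow, r2_slow = ols_slope_r2(*simulate_trace(SLOW, NOISE_SD, 300))
slope_long, r2_long = ols_slope_r2(*simulate_trace(SLOW, NOISE_SD, 1200))

for label, slope, r2 in [
    ("fast decline, 300 s", slope_fast, r2_fast),
    ("slow decline, 300 s", slope_slow, r2_slow),
    ("slow decline, 1200 s", slope_long, r2_long),
]:
    print(f"{label:<22} slope={slope:.6f}  R2={r2:.3f}")
```

      Under these assumptions the slow trace is exactly as linear as the fast one, yet its R2 over the short window is much lower, so a strict R2 cut-off would discard it and bias the retained rates upward. Lengthening the window recovers a high R2 for the same slope.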

      3) Because the same swimming tunnel was used for schools and solitary fish, schooling fish may end up swimming closer to the wall (because of less volume per fish) than solitary fish. Distances to the wall of schooling fish are not given, and they could provide an advantage to schooling fish.

      Response: This is an issue that we considered carefully in designing these experiments. After considering the volume of the respirometer and the size of the fish (see the response above), we decided to use the same respirometer to avoid any other confounding factors when using different sizes of respirometers with potentially different internal flow patterns. In particular, different sizes of Brett-type swim-tunnel respirometers differ in the turning radius of water flow, which can produce different flow patterns in the swimming section. Please note that we quantified the flow pattern within the flow tank using particle image velocimetry (PIV) (so we have quantitative velocity profiles across the working section at all tested speeds), and modified the provided baffle system to improve the flow in the working section.

      Because we took high-speed videos simultaneously with the respirometry measurements, we can state unequivocally that individual fish within the school did not swim closer to the walls than solitary fish over the testing period (see below for the quantitative measurements of the boundary layer). Indeed, many previous respirometry studies do not obtain simultaneous video data and hence are unable to document fish locations when energetics is measured.

      In studying schooling energetics, we believe that it is important to control as many factors as possible when making comparisons between school energetics and solitary locomotion. We took great care as indicated in the Methods section to keep all experimental parameters the same (same light conditions, same flow tank, same O2 measuring locations with the internal flow loop, etc.) so that we could detect differences if present. Changing the flow tank respirometer apparatus between individual fish and the schools studied would have introduced an unacceptable alteration of experimental conditions and would be a clear violation of the best experimental practices.

      We have made every effort to be clear and transparent about the choice of experimental apparatus and explained at great length the experimental parameters and setup used, including the considerations about the wall effect in the extended Methods section and supplemental material provided.

      Our manuscript provides the measurement of the boundary layer (<2.5 mm at speeds > 2 BL s-1) in the methods section of the Integrated Biomechanics & Bioenergetic Assessment System. We also state that the boundary layer is much thinner than the body width of the giant danio (~10 mm) so that the fish cannot effectively hide near the wall. Due to our PIV calibration, we are able to quantify flow near the wall.

      In the manuscript, we also provide details about wall effects and fish schools, as follows: “…the convex hull volume of the fish school did not change as speed increased, suggesting that the fish school was not flattening against the wall of the swim tunnel, a typical feature when fish schools are benefiting from wall effects. In nature, fish in the centre of the school effectively swim against a ‘wall’ of surrounding fish where they can benefit from hydrodynamic interactions with neighbours.” The notion that the lateral motion of surrounding slender bodies can be represented by a streamlined wall was also proposed by Newman et al., 1970 J. Fluid Mech. These considerations provide ample justification for the comparison of locomotor energetics by schools and solitary individuals.

      4) The statistical analysis has a number of problems. The values of MO2 of each school are the result of the oxygen consumption of each fish, and therefore the test is comparing 5 individuals (i.e. an individual is the statistical unit) vs 5 schools (a school made out of 8 fish is the statistical unit). Therefore the test is comparing two different statistical units. One can see from the graphs that schooling MO2 tends to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish. Other issues are related to data (for example Tail beat frequency) not being independent in schooling fish.

      Response: We cannot agree with the reviewer that fish schools and solitary individuals are different statistical units. Indeed, these are the two treatments in the statistical sense: a school versus the individual. This is why we invested extra effort to replicate all our experiments on multiple schools of different individuals and compare the data to multiple different solitary individuals. This is a standard statistical approach, whether one is comparing a tissue with multiple cells to an individual cell, or multiple locations to one specific location in an ecological study. Our analysis treats the collective movement of the fish school as a functional unit, just like the solitary individual is a functional unit. At the most fundamental level of oxygen uptake measurements, our analysis results from calculating the declining dissolved oxygen as a function of time (i.e. the slope of oxygen removal). Comparisons are made between the slope of oxygen removal by fish schools and the slope of oxygen removal by solitary individuals. This is the correct statistical comparison.

      The larger SD in individuals can be due to multiple biological reasons other than the technical reasons suggested here. Fundamentally, the different SD between fish schools and individuals can result from differences between solitary and collective movement, and the different fluid dynamic interactions within the school could certainly cause differences in the amount of variation seen. We interpret the ‘numerically’ smaller SD in fish schools relative to solitary individuals as suggesting that interesting hydrodynamic phenomena within fish schools remain to be discovered.
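
      The slope-based comparison described in this response can be sketched as follows. This is a hedged illustration, not the authors' analysis code: the conversion from slope to mass-specific oxygen uptake is the commonly used respirometry form (slope times water volume divided by animal mass), and the respirometer volume, slopes, animal volumes, masses, and the function name `mass_specific_mo2` are all hypothetical. Each school and each solitary fish contributes exactly one value, so both groups have the same number of replicate units.

```python
from statistics import mean, stdev


def mass_specific_mo2(slope, respirometer_l, animal_volume_l, total_mass_kg):
    """Convert an O2 decline slope (mg O2 / L / h, negative) to mg O2 / kg / h.

    For a school, total_mass_kg and animal_volume_l are summed over all fish,
    so the whole school is treated as one measurement unit.
    """
    water_volume_l = respirometer_l - animal_volume_l
    return -slope * water_volume_l / total_mass_kg


RESPIROMETER_L = 10.0  # hypothetical respirometer volume

# One (slope, animal volume in L, total mass in kg) tuple per replicate unit.
# All numbers are invented for illustration.
schools = [
    (-0.90, 0.0144, 0.0144),
    (-0.88, 0.0140, 0.0140),
    (-0.93, 0.0147, 0.0147),
    (-0.91, 0.0145, 0.0145),
    (-0.89, 0.0142, 0.0142),
]
individuals = [
    (-0.35, 0.0045, 0.0045),
    (-0.31, 0.0044, 0.0044),
    (-0.38, 0.0046, 0.0046),
    (-0.33, 0.0045, 0.0045),
    (-0.36, 0.0047, 0.0047),
]

school_mo2 = [mass_specific_mo2(s, RESPIROMETER_L, v, m) for s, v, m in schools]
solitary_mo2 = [mass_specific_mo2(s, RESPIROMETER_L, v, m) for s, v, m in individuals]

print(f"schools:     n={len(school_mo2)}  mean={mean(school_mo2):.1f}  SD={stdev(school_mo2):.1f}")
print(f"individuals: n={len(solitary_mo2)}  mean={mean(solitary_mo2):.1f}  SD={stdev(solitary_mo2):.1f}")
```

      The design choice shown here is that the school's summed mass and volume enter the conversion once, so the comparison is between five school-level values and five individual-level values rather than between individual fish inside and outside a school.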

      Reviewer #2 (Recommendations For The Authors):

      I have reviewed a previous version of this paper. This new draft is somewhat improved but still presents a number of issues which I have outlined below.

      Response: Thank you for your efforts to improve our paper. However, a number of your comments apply to the previous version of the paper, and we made a number of revisions before submitting it to eLife. We explain below how this version of the manuscript addresses many of your comments from both the previous and current reviews. As readers can see from our responses below, this version of the manuscript no longer uses only ‘two-way ANOVA’, as we have implemented an additional statistical model. (Please see the comments below for more detailed responses related to the statistical models.)

      1) One of the main problems, and one of the reasons (see below) why many previous papers have measured TBF and not the oxygen consumption of a whole school, is that schooling also provides a calming effect (Nadler et al 2018) which is not easily differentiated from the hydrodynamic advantages (Abraham and Colgan 1985). This effect can reduce the MO2 while swimming and the EPOC when recovering. The present study does not fully take this potential issue into account and therefore its results are confounded by such effects. The authors state (line 401) that " the aerobic locomotion cost of solitary individuals showed no statistical difference from (in fact, being numerically lower) that of fish schools at a very low testing speed. The flow speed is similar to some areas of the aerated home aquarium for each individual fish. This suggests that the stress of solitary fish likely does not meaningfully contribute to the higher locomotor costs". While this is useful, the possibility that at higher speeds (i.e. a more stressful situation) solitary fish may experience more stress than fish in a school, cannot be ruled out.

      Response: Thank you for finding our results and data useful. We have addressed the comments on calming or stress effects in our response above. The key point is that either solitary or school fish are challenged (i.e. stressed) at a high speed where the sizable increases in stress hormones are well documented in the exercise physiology literature. We honestly just do not understand how a “calming” effect could possibly explain the upward concave energetic curves that we obtained, and how “calming” could explain the difference between schools and solitary individuals. Since we have simultaneous high-speed videos of fish swimming as we measure oxygen consumption at all speeds, we are able to directly observe fish behaviour. It is not exactly clear what a “calming effect” would look like kinematically or how one would measure this experimentally, but since we observed no alteration in tail beat kinematics between schools and individuals (a key result that we elaborate on below), it’s very hard to justify that a “calming” effect explains our results. Fish in schools appear to be just as “calm” as solitary individuals.

      If the reviewer's “calming effect” is a general issue, then birds flying in a V-formation should also experience a “calming effect”, but at least one study shows that birds in a V-formation experience higher wing beat frequencies.

      In addition, Nadler et al., 2018 (https://doi.org/10.1242/bio.031997) did not study any such “calming effect”. We assume the reviewer is referring to Nadler et al., 2016, which showed that shoaling reduced fish metabolic rates in a resting respirometer that has little-to-no water current that would motivate fish to swim (which is very different from the swim-tunnel respirometer we used). Moreover, the inter-loop system used by Nadler et al., 2016 has the risk of mixing the oxygen uptake of the fish shoal and solitary individuals. Hence, we believe that it is not appropriate to extend the results of Nadler et al., 2016 to infer and insist on a calming effect for the fish schools that we studied, which are actively and directionally swimming over a wide speed range up to and including high speeds, especially since our data clearly show that ‘the aerobic locomotion cost of solitary individuals showed no statistical difference from (in fact, being numerically lower) that of fish schools at very low testing speeds’. More broadly, shoaling and schooling are very different in terms of polarization as well as the physiological and behavioural mechanisms used in locomotion. Shoaling behaviour by fish in still water is not the same as active directional schooling over a speed range. Our supplementary Table 1 provides a clear definition for a variety of grouping behaviours and makes the distinction between shoaling and schooling.

      Our detailed discussion about other literature mentioned by this reviewer can be seen in the comments below.

      2) The authors overstate the novelty of their work. Line 29: "Direct energetic measurements demonstrating the 30 energy-saving benefits of fluid-mediated group movements remain elusive" The idea that schooling may provide a reduction in the energetic costs of swimming dates back to the 70s, with pioneering experimental work showing a reduction in tail beat frequency in schooling fish vs solitary (by Zuyev, G. V. & Belyayev, V. V. (1970) and theoretical work by Weihs (1973). Work carried out in the past 20 years (Herskin and Steffensen 1998; Marras et al 2015; Bergerhout et al 2013; Hemelrijk et al 2014; Li et al 2021, Wiwchar et al 2017; Verma et al 2018; Ashraf et al 2019) based on a variety of approaches has supported the idea of a reduction in swimming costs in schooling vs solitary fish. In addition, group respirometry has actually been done in early and more recent studies testing the reduction in oxygen consumption as a result of schooling (Parker, 1973; Itazawa et al., 1978; Abrahams and Colgan 1985; Davis & Olla, 1992; Ross & Backman, 1992, Bergerhout et al 2013; Currier et al 2020). Specifically, Abrahams and Colgan (1985) and Bergerhout et al (2013) found that the oxygen consumption of fish swimming in a school was higher than when solitary, and Abrahams and Colgan (1985) made an attempt to deal with the confounding calming effect by pairing solitary fish up with a neighbor visible behind a barrier. These issues and how they were dealt with in the past (and in the present manuscript) are not addressed by the present manuscript. Currier et al (2020) found that the reduction of oxygen consumption was species-specific.

      Response: We cannot agree with this reviewer that we have overstated the novelty of our work, and, in fact, we make very specific comments on the new contributions of our paper relative to the large previous literature on schooling. We are well aware of the literature cited above and many of these papers have little or nothing to do with quantifying the energetics of schooling. In addition, many of these papers rely on simple kinematic measurements which are unrelated to direct energetic measurements of energy use. To elaborate on this, we present the ‘Table R’ below which evaluates and compares each of the papers this reviewer cites above. The key message (as we wrote in the manuscript) is that none of the previous studies measured non-aerobic cost (and thus none calculated the total energy expenditure (TEE)), which we show to be substantial. In addition, many of these studies do not compare schools to individuals, do not quantify both energetics and kinematics, and do not study a wide speed range. Only 33% of previous studies used direct measurements of aerobic metabolic rate to compare the locomotion costs of fish schools and solitary individuals (an experimental control). We want to highlight that most of the citations in the reviewer’s comments are not about the kinematics or hydrodynamics of fish schooling energetics, although they provide peripheral information on fish schooling in general. We also provide an overview of the literature on this topic in our paper in the Journal of Experimental Biology (Zhang & Lauder 2023 doi:10.1242/jeb.245617) and do not wish to duplicate that discussion here. We summarized and cited the relevant papers about the energetics of fish schooling in Table 1.

      Author response table 1.

      Papers cited by Reviewer #2, and a summary of their contributions and approach.

      References cited above:

      Zuyev, G., & Belyayev, V. V. (1970). An experimental study of the swimming of fish in groups as exemplified by the horsemackerel [Trachurus mediterraneus ponticus Aleev]. J Ichthyol, 10, 545-549.

      Weihs, D. (1973). Hydromechanics of fish schooling. Nature, 241(5387), 290-291.

      Herskin, J., & Steffensen, J. F. (1998). Energy savings in sea bass swimming in a school: measurements of tail beat frequency and oxygen consumption at different swimming speeds. Journal of Fish Biology, 53(2), 366-376.

      Marras, S., Killen, S. S., Lindström, J., McKenzie, D. J., Steffensen, J. F., & Domenici, P. (2015). Fish swimming in schools save energy regardless of their spatial position. Behavioral ecology and sociobiology, 69, 219-226.

      Burgerhout, E., Tudorache, C., Brittijn, S. A., Palstra, A. P., Dirks, R. P., & van den Thillart, G. E. (2013). Schooling reduces energy consumption in swimming male European eels, Anguilla anguilla L. Journal of experimental marine biology and ecology, 448, 66-71.

      Hemelrijk, C. K., Reid, D. A. P., Hildenbrandt, H., & Padding, J. T. (2015). The increased efficiency of fish swimming in a school. Fish and Fisheries, 16(3), 511-521.

      Li, L., Nagy, M., Graving, J. M., Bak-Coleman, J., Xie, G., & Couzin, I. D. (2020). Vortex phase matching as a strategy for schooling in robots and in fish. Nature communications, 11(1), 5408.

      Wiwchar, L. D., Gilbert, M. J., Kasurak, A. V., & Tierney, K. B. (2018). Schooling improves critical swimming performance in zebrafish (Danio rerio). Canadian Journal of Fisheries and Aquatic Sciences, 75(4), 653-661.

      Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854.

      Ashraf, I., Bradshaw, H., Ha, T. T., Halloy, J., Godoy-Diana, R., & Thiria, B. (2017). Simple phalanx pattern leads to energy saving in cohesive fish schooling. Proceedings of the National Academy of Sciences, 114(36), 9599-9604.

      Parker Jr, F. R. (1973). Reduced metabolic rates in fishes as a result of induced schooling. Transactions of the American Fisheries Society, 102(1), 125-131.

      Itazawa, Y., & Takeda, T. (1978). Gas exchange in the carp gills in normoxic and hypoxic conditions. Respiration physiology, 35(3), 263-269.

      Abrahams, M. V., & Colgan, P. W. (1985). Risk of predation, hydrodynamic efficiency and their influence on school structure. Environmental Biology of Fishes, 13, 195-202.

      Davis, M. W., & Olla, B. L. (1992). The role of visual cues in the facilitation of growth in a schooling fish. Environmental biology of fishes, 34, 421-424.

      Ross, R. M., Backman, T. W., & Limburg, K. E. (1992). Group-size-mediated metabolic rate reduction in American shad. Transactions of the American Fisheries Society, 121(3), 385-390.

      Currier, M., Rouse, J., & Coughlin, D. J. (2021). Group swimming behaviour and energetics in bluegill Lepomis macrochirus and rainbow trout Oncorhynchus mykiss. Journal of Fish Biology, 98(4), 1105-1111.

      Halsey, L. G., Wright, S., Racz, A., Metcalfe, J. D., & Killen, S. S. (2018). How does school size affect tail beat frequency in turbulent water?. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 218, 63-69.

      Johansen, J. L., Vaknin, R., Steffensen, J. F., & Domenici, P. (2010). Kinematics and energetic benefits of schooling in the labriform fish, striped surfperch Embiotoca lateralis. Marine Ecology Progress Series, 420, 221-229.

      3) In addition to the calming effect, measuring group oxygen consumption suffers from a number of problems as discussed in Herskin and Steffensen (1998) such as the fish volume to water volume ratio, which varies considerably when testing a school vs single individuals in the same tunnel and the problem of wall effect when using a small volume of water for accurate O2 measurements. Herskin and Steffensen (1998) circumvented these problems by measuring tailbeat frequencies of fish in a school and then calculating the MO2 of the corresponding tailbeat frequency in solitary fish in a swim tunnel. A similar approach was used by Johansen et al (2010), Marras et al (2015), Halsey et al (2018). However, It is not clear how these potential issues were dealt with here. Here, larger solitary D. aequipinnatus were used to increase the signal-to-noise ratio. However, using individuals of different sizes makes other variables not so directly comparable, including stress, energetics, and kinematics. (see comment 7 below).

      Response: We acknowledge the great efforts made by previous studies to understand the energetics of fish schooling. These studies, as detailed in the table and elaborated in the response above (see comment 2) are very different from our current study. Our study achieved a direct comparison of energetics (including both aerobic and non-aerobic cost) and kinematics between solitary individuals and fish schools that has never been done before. Our detailed response to the supposed “calming effect” is given above.

      As highlighted in the previous comments and opening statement, our current version has addressed the wall effect, tail beat frequency, and the experimental and analytical efforts invested to directly compare the energetics between fish schools and solitary individuals. As readers can see in our comprehensive method section, achieving the direct comparison between solitary individuals and fish schools is not a trivial task. Now we want to elaborate on the role of kinematics as an indirect estimate of energetics. Our results here show that kinematic measurements of tail beat frequency are not reliable estimates of energetic cost. Moreover, the previous studies cited did not measure EPOC, and those costs are substantial, especially as swimming speed increases. Fish in schools can save energy even when the tail beat frequency does not change (although school volume can change as we show). We elaborated (in great detail) on why kinematics does not always reflect the energetics in the submitted version (see last paragraph of “Schooling dynamics and energy conservation” section). Modeling what energy expenditure should be, based only on tail kinematics, is in our view a highly unreliable approach that has never been validated (e.g., fish use more than just tails for locomotion). Indeed, we believe that this is an inadequate substitute for direct energy measurements. We disagree that using slightly differently sized individuals is an issue since we recorded fish kinematics across all experiments and included the measurements of behaviour in our manuscript. Slightly altering the size of individual fish was done on purpose to provide a better ratio of respirometer volume to fish volume in the tests on individual fish; thus, we regard this as a benefit of our approach and not a concern.

      Finally, in another study of the collective behaviour of flying birds (Usherwood, J. R., Stavrou, M., Lowe, J. C., Roskilly, K. and Wilson, A. M. (2011). Flying in a flock comes at a cost in pigeons. Nature 474, 494-497), the authors observed that wing beat frequency can increase during flight with other birds. Hence, again, we cannot regard movement frequency of appendages as an adequate substitute for direct energetic measurements.

      4) Svendsen et al (2016) provide guidelines for the ratio of fish volume to water volume in the respirometer. The ratio used here (2200) is much higher than that recommended. RFR values higher than 500 should be avoided in swim tunnel respirometry, according to Svendsen et al (2016).

      Response: Thank you for raising this point. Please see our detailed response to the same comment above. We believe that our experimental setup and ratios are very much in line with those recommended, and represent a significant improvement on previous studies which use large ratios.

      5) Lines 421-436: The same goes for wall effects. Presumably, using the same size swim tunnel, schooling fish were swimming much closer to the walls than solitary fish but this is not specifically quantified here in this paper. Lines 421-436 provide some information on the boundary layer (though wall effects are not just related by the boundary layer) and some qualitative assessment of school volume. However, no measurement of the distance between the fish and the wall is given.

      Response: Please see the detailed responses above to the same comment. Specifically, we used the particle image velocimetry (PIV) system to measure the boundary layer (<2.5 mm at speeds > 2 BL s-1) and stated the parameters in the methods section of the Integrated Biomechanics & Bioenergetic Assessment System. We also state that the boundary layer is much thinner than the body width of the giant danio (~10 mm) so that the fish cannot effectively hide near the wall. Due to our PIV calibration, we are able to quantify flow near the wall.

      Due to our video data obtained simultaneously with energetic measurements, we do not agree that fish were swimming closer to the wall in schools and also note that we took care to modify the typical respirometer to both ensure that flow across the cross-section did not provide any refuges and to quantify flow velocities in the chamber using particle image velocimetry. We do not believe that any previous experiments on schooling behaviour in fish have taken the same precautions.

      6) The statistical tests used have a number of problems. Two-way ANOVA was based on school vs solitary and swimming speed. However, there are repeated measures at each speed and this needs to be dealt with. The degrees of freedom of one-way ANOVA and T-tests are not provided. These tests took into account five groups of fish vs. five solitary fish. The values of MO2 of each school are the result of the oxygen consumption of each fish, and therefore the test is comparing 5 individuals (i.e. an individual is the statistical unit) vs 5 schools (a school made out of 8 fish is the statistical unit). Therefore the test is comparing two different statistical units. One can see from the graphs that schooling MO2 tend to have a smaller SD than solitary data. This may well be due to the fact that schooling data are based on 5 points (five schools) and each point is the result of the MO2 of five fish, thereby reducing the variability compared to solitary fish. TBF, on the other hand, can be assigned to each fish even in a school, and therefore TBF of each fish could be compared by using a nested approach of schooling fish (nested within each school) vs solitary fish, but this is not the statistical procedure used in the present manuscript. The comparison between TBFs presumably is comparing 5 individuals vs all the fish in the schools (6x5=30 fish). However, the fish in the school are not independent measures.

      Response: We cannot agree with this criticism, which may be based on this reviewer having seen a previous version of the manuscript. We did not use two-way ANOVA in this version. This version of the manuscript reported the statistical value based on a General Linear Model (see statistical section of the method). We are concerned that this reviewer did not in fact read either the Methods section or the Results section. In addition, it is hard to accept that, from examination of the data shown in Figure 3, there is not a clear and large difference between schooling and solitary locomotion, regardless of the statistical test used.

      Meanwhile, the comments about ‘repeated’ measures from one speed to the next are interesting, but we cannot agree. First, ‘repeated’ measures are appropriate when one test subject is assessed before and after a treatment. Going from one speed to the next is not a treatment. Instead, the speed is a dependent and continuous variable. In our experimental design, the treatment is fish school, and the control is a solitary individual. Second, we never compared any of our dependent variables across different speeds within a school or within an individual. Instead, we compared schools and individuals at each speed. In this comparison, there are no ‘repeated’ measures. We agree with the reviewer that fish in the school are interacting (not independent). This is one more reason to support our approach of treating fish schools as a functional and statistical unit in our experiment design (more detailed responses are stated in the response to the comment above).

      7) The size of solitary and schooling individuals appears to be quite different (solitary fish range 74-88 cm, schooling fish range 47-65 cm). While scaling laws can correct for this in the MO2, was this corrected for TBF and for speed in BL/s? Using BL/s for speed does not completely compensate for the differences in size.

      Response: Our current version has provided justifications for not conducting scaling in the values of tail beat frequency. Our justification is “The mass scaling for tail beat frequency was not conducted because of the lack of data for D. aequipinnatus and its related species. Using the scaling exponent of distant species for mass scaling of tail beat frequency will introduce errors of unknown magnitude.”. Our current version also acknowledges the consideration about scaling as follows: “Fish of different size swimming at 1 BL s-1 will necessarily move at different Reynolds numbers, and hence the scaling of body size to swimming speed needs to be considered in future analyses of other species that differ in size”
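
      The Reynolds number point quoted in this response can be made concrete with a short worked sketch. It uses the standard definition Re = U L / ν, with speed U expressed as body lengths per second times body length L. The two body lengths and the kinematic viscosity (roughly that of water near 20 °C) are assumptions for illustration, not measurements from the manuscript, and `reynolds_number` is a hypothetical helper name.

```python
# Assumed kinematic viscosity of water near 20 degrees C, in m^2 / s.
KINEMATIC_VISCOSITY_WATER = 1.0e-6


def reynolds_number(body_length_m, speed_bl_per_s, nu=KINEMATIC_VISCOSITY_WATER):
    """Re = U * L / nu, where U is the absolute speed in m/s."""
    speed_m_per_s = speed_bl_per_s * body_length_m
    return speed_m_per_s * body_length_m / nu


# Hypothetical body lengths for a smaller and a larger fish, in metres.
small_fish_m = 0.05
large_fish_m = 0.08

re_small = reynolds_number(small_fish_m, 1.0)
re_large = reynolds_number(large_fish_m, 1.0)

print(f"small fish at 1 BL/s: Re = {re_small:.0f}")
print(f"large fish at 1 BL/s: Re = {re_large:.0f}")
print(f"ratio large/small:    {re_large / re_small:.2f}")
```

      Because speed in BL/s scales with body length, Re at a fixed BL/s grows with the square of body length, so matching relative speed does not match the flow regime.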

      Reviewer #3 (Public Review):

      Summary:

      Zhang and Lauder characterized both aerobic and anaerobic metabolic energy contributions in schools and solitary fishes in the Giant danio (Devario aequipinnatus) over a wide range of water velocities. By using a highly sophisticated respirometer system, the authors measure the aerobic metabolisms by oxygen uptake rate and the non-aerobic oxygen cost as excess post-exercise oxygen consumption (EPOC). With these data, the authors model the bioenergetic cost of schools and solitary fishes. The authors found that fish schools have a J-shaped metabolism-speed curve, with reduced total energy expenditure per tail beat compared to solitary fish. Fish in schools also recovered from exercise faster than solitary fish. Finally, the authors conclude that these energetic savings may underlie the prevalence of coordinated group locomotion in fish.

      The conclusions of this paper are mostly well supported by data, but some aspects of methods and data acquisition need to be clarified and extended.

      Response: Thank you for seeing the value of our study. We clarified the data acquisition system by adding a new panel of pictures to the supplemental material to show our experimental system. We understand that our methods contain more details and justifications than a typical methods section. The details are there to promote the reproducibility of the experiments. The justifications respond to Reviewer 2, who reviewed our previous manuscript version and posted the same critiques again after we had provided the justifications for the construction of the system and the data acquisition.

      Strengths:

      This work aims to understand whether animals moving through fluids (water in this case) exhibit highly coordinated group movement to reduce the cost of locomotion. By calculating the aerobic and anaerobic metabolic rates of school and solitary fishes, the authors provide direct energetic measurements that demonstrate the energy-saving benefits of coordinated group locomotion in fishes. The results of this paper show that fish schools save anaerobic energy and reduce the recovery time after peak swimming performance, suggesting that fishes can apport more energy to other fitness-related activities whether they move collectively through water.

      Response: Thank you. We are excited to share our discoveries with the world.

      Weaknesses:

      Although the paper does have strengths in principle, the weakness of the paper is the method section. There is too much irrelevant information in the methods that sometimes is hard to follow for a researcher unfamiliar with the research topic. In addition, it was hard to imagine the experimental (respirometer) system used by the authors in the experiments; therefore, it would be beneficial for the article to include a diagram/scheme of that respiratory system.

      Response: We agree with the reviewer and have added pictures of the experimental system to the supplementary materials (Fig. S4). We think pictures present the system more realistically than schematics. We also provide a picture of the system during the energetic measurements, to show the care taken to ensure that the fish are not affected by any external stimulation other than the water velocity. This careful experimental protocol is critical for revealing the concave-upward curve of bony fish schools, which has not been reported before. Many details in the methods have been included in response to Reviewer 2.

      Reviewer #3 (Recommendations For The Authors):

      Overall, this is a very interesting, well-written, and nice article. However, many times the method section looks like a discussion. Furthermore, the authors need to check the use of the word "which" throughout the text. I got the feeling that it is overused/misused sometimes.

      Response: Thank you for the positive comments. The method is written in that way to address the concerns of Reviewer 2 who reviewed our previous versions. We corrected the overuse of ‘which’ throughout the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Transcriptional readthrough, intron retention, and transposon expression have been previously shown to be elevated in mammalian aging and senescence by multiple studies. The current manuscript claims that the increased intron retention and readthrough could completely explain the findings of elevated transposon expression seen in these conditions. To that end, they analyze multiple RNA-seq expression datasets of human aging, human senescence, and mouse aging, and establish a series of correlations between the overall expression of these three entities in all datasets.

      While the findings are useful, the strength of the evidence is incomplete, as the individual analyses unfortunately do not support the claims. Specifically, to establish this claim there is a burden of proof on the authors to analyze both intron-by-intron and gene-by-gene, using internal matched regions, and, in addition, thoroughly quantify the extent of transcription of completely intergenic transposons and show that they do not contribute to the increase in aging/senescence. Furthermore, the authors chose to analyze the datasets as unstranded, even though strand information is crucial to their claim, as both introns and readthrough are stranded, and if there is causality, then opposite strand transposons should show no preferential increase in aging/senescence. Finally, there are some unclear figures that do not seem to show what the authors claim. Overall, the study is not convincing.

      Major concerns: 1) Why were all datasets treated as unstranded? Strand information seems critical, and should not be discarded. Specifically, stranded information is crucial to increase the confidence in the causality claimed by the authors, since readthrough and intron retention are both strand specific, and therefore should influence only the same strand transposons and not the opposite-strand ones.

      This is an excellent suggestion. Since only one of our datasets was stranded, we did not run stranded analyses for the sake of consistency. We would like to provide two analyses here that consider strandedness:

      First, we find that within the set of all expressed transposons (passing minimal read filtering), 86% of intronic transposons match the strand of the intron (3147 out of 3613). In contrast, the number is 51% after permutation of the strands. Similarly, when we randomly select 1000 intronic transposons 45% match the strandedness of the intron (here we select from the set of all transposons). This is consistent with the idea that most transposons are only detectable because they are co-expressed on the sense strand of other features that are highly expressed.

      As for the readthrough data, 287 out of 360 transposons (79%) within readthrough regions matched the strand of the gene and its readthrough.
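      The strand-concordance comparison described above can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the function names and the toy strand pairs are hypothetical, and the permutation baseline simply shuffles transposon strands across their host features.

```python
import random


def strand_match_fraction(pairs):
    """Fraction of (transposon_strand, host_strand) pairs that agree."""
    if not pairs:
        raise ValueError("no transposon/host pairs supplied")
    return sum(te == host for te, host in pairs) / len(pairs)


def permuted_match_fraction(pairs, seed=0):
    """Same fraction after shuffling transposon strands across hosts (null baseline)."""
    rng = random.Random(seed)
    te_strands = [te for te, _ in pairs]
    host_strands = [host for _, host in pairs]
    rng.shuffle(te_strands)
    return strand_match_fraction(list(zip(te_strands, host_strands)))


# Hypothetical toy data: (transposon strand, intron strand)
toy_pairs = [("+", "+"), ("-", "-"), ("+", "+"), ("-", "+"), ("+", "+"), ("-", "-")]
observed = strand_match_fraction(toy_pairs)
baseline = permuted_match_fraction(toy_pairs)
print(f"observed={observed:.2f} permuted={baseline:.2f}")
```

      On real data the pairs would come from intersecting the transposon annotation with intron or readthrough intervals; that step is omitted here.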

      Second, in the model we postulate, the majority of transposon transcription occurs as a co-transcriptional artifact. This applies equally to genic transposons (gene expression), intronic (intron retention) and gene proximal (readthrough or readin) transposons. Therefore, we performed the following analysis for the set of all transposons in the Fleischer et al. fibroblast dataset.

      When we invert the strand annotation for transposons, before counting and differential expression, we would expect the counts and log fold changes to be lower compared to using the “correct” annotation file.

      Indeed, we show that out of 6623 significantly changed transposons with age only 226 show any expression in the “inverted run” (-96%). (Any expression is defined as passing basic read filtering.)
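      As a rough sketch of what inverting the strand annotation could look like, the snippet below flips the strand column of a GTF-style record. It assumes the standard nine-column, tab-separated GTF layout with the strand in column 7; the helper name and the example record are hypothetical, and the authors' actual procedure may differ.

```python
def invert_strand(gtf_line):
    """Return a GTF record with '+' and '-' swapped in the strand column (column 7)."""
    line = gtf_line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 9:
        return line  # leave comments and malformed lines untouched
    flip = {"+": "-", "-": "+"}
    fields[6] = flip.get(fields[6], fields[6])  # '.' (unstranded) stays as-is
    return "\t".join(fields)


# Hypothetical transposon record
record = 'chr1\trmsk\texon\t100\t400\t.\t+\t.\tgene_id "L1_example";'
print(invert_strand(record))
```

      Counting reads against the flipped annotation and against the original one then gives the "reverse" and "actual" runs that are compared.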

      Out of the 226 transposons that can be detected in both runs most show lower counts (A) and age-related differential expression converging towards zero (B) in the inverted run (Fig. L1).

      Author response image 1.

      Transposons with inverted strandedness (“reverse”) show lower expression levels (log counts; A) and no differential expression with age (B) when compared to matched differentially expressed transposons (“actual”). For this analysis we selected all transposons showing significant differential expression with age in the actual dataset that also showed at least minimal expression in the strand-inverted analysis (n=226). Data from Fleischer et al. (2018). (A) The log (counts) are clipped because we only used transposons that passed minimal read filtering in this analysis. (B) The distribution of expression values in the actual dataset is bimodal and positive since some transposons are significantly up- or downregulated. This bimodal distribution is lost in the strand-inverted analysis.

      2) "Altogether this data suggests that intron retention contributes to the age-related increase in the expression of transposons" - this analysis doesn't demonstrate the claim. In order to prove this they need to show that transposons that are independent of introns are either negligible, or non-changing with age.

      We would like to emphasize that we never claimed that intron retention and readthrough can explain all of the age-related increases in transposon expression. In fact, our data is compatible with a multifactorial origin of transposon expression. Age- and senescence-related transposon expression can occur due to: 1/ intron retention, 2/ readthrough, 3/ loss of intergenic heterochromatin. Specifically, we do not try to refute 3.

      However, since most transposons are found in introns or downstream of genes, this suggests that intron retention and readthrough will be major, albeit non-exclusive, drivers of age-related changes in transposon expression. Even if the fold-change for intergenic transposons with aging or senescence were higher, this would not account for the broadscale expression patterns seen in RNAseq data.

      To further illustrate this, we analyzed transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Indeed, we find that although intergenic transposons show similar log-fold changes to other transposon classes (Fig. L2A), their total contribution to read counts is negligible (Fig. L2B, Fig. S15). We have also now added a more nuanced explanation of this issue to the discussion.

      Author response image 2.

      We analyzed transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Independent of their location, transposons show similar differential expression with aging or cellular senescence (A). In contrast, the expression of transposons (log counts) is highly dependent on their location and the median log(count) value decreases in the order: genic > intronic > ds > us > intergenic.

      Author response image 3.

      Total counts are the sum of all counts from transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Counts were defined as cumulative counts across all samples.
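      The location classes and total counts described above can be illustrated with a small sketch. Several things here are assumptions made for illustration: each transposon is reduced to its midpoint and compared with a single gene, "genic" is taken to mean overlapping an exon, and a distance of exactly 25 kb is treated as intergenic. All names are hypothetical.

```python
from collections import defaultdict

FLANK = 25_000  # gene-proximal window (25 kb) taken from the text


def classify_transposon(te_mid, gene_start, gene_end, gene_strand, exons, flank=FLANK):
    """Assign a transposon midpoint to genic, intronic, ds, us or intergenic."""
    if gene_start <= te_mid <= gene_end:
        in_exon = any(start <= te_mid <= end for start, end in exons)
        return "genic" if in_exon else "intronic"
    left_of_gene = te_mid < gene_start
    distance = gene_start - te_mid if left_of_gene else te_mid - gene_end
    if distance >= flank:
        return "intergenic"
    # Upstream/downstream depends on the direction of transcription.
    is_upstream = left_of_gene if gene_strand == "+" else not left_of_gene
    return "us" if is_upstream else "ds"


def total_counts_by_class(records):
    """Sum read counts per location class from (class, count) pairs."""
    totals = defaultdict(int)
    for location_class, count in records:
        totals[location_class] += count
    return dict(totals)


# Hypothetical gene on the plus strand with two exons
exons = [(100_000, 101_000), (109_000, 110_000)]
for mid in (100_500, 105_000, 115_000, 95_000, 200_000):
    print(mid, classify_transposon(mid, 100_000, 110_000, "+", exons))
```

      A genome-wide version would first find the nearest gene for every transposon, for example with an interval tree or a sorted sweep, before applying the same rules.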

      3) Additionally, the correct control regions should be intronic regions other than the transposon, which overall contributed to the read counts of the intron.

      4) Furthermore, analysis of read spanning intron and partly transposons should more directly show this contribution.

      Thank you for this comment. To rephrase this, if we understand correctly, the concern is that an increase in transposon expression could bias the analysis of intron retention since transposons often make up a substantial portion of an intron. We would like to address this concern with the following three points:

      First, if the concern is the correlation between log fold-change of transposons vs log fold-change of their containing introns, we do not think that this kind of data is biased. While transposons make up much of the intron, a single transposon on average only accounts for less than 10% of an intron.

      Second, to address this more directly, we show here that even introns that do not contain expressed transposons are increased in aging fibroblasts and after induction of cellular senescence (Fig. S8). This shows that intron retention is universal and most likely not heavily biased by the presence or absence of expressed transposons.

      Author response image 4.

      We split the set of introns that significantly change with cellular aging (A) or cell senescence (B) into introns that contain at least one transposon (has_t) and those that do not contain any transposons (has_no_t). Intron retention is increased in both groups. In this analysis we included all transposons that passed minimal read filtering (n=63782 in A and n=124173 in B). Median log-fold change indicated with a dashed red line for the group of introns without transposons.

      Third, we provide an argument based on the distribution of transposons within introns (Fig. L3).

      Author response image 5.

      The 5’ and 3’ splice sites show the highest sequence conservation between introns, whereas the majority of the intronic sequence does not. This is because these sites contain binding sites for splicing factors such as U1, U2 and SF1 (A). Transposons could affect splicing and we present a biologically plausible mechanism and two ancillary hypotheses here (B). If transposons affect the splicing (retention) of introns the most likely mechanism would be via impairment of splice site recognition because a transposon close to the site forms a secondary structure, binds an effector protein or provides inadequate sequences for pairing. Hypothesis 1: Transposons impair splicing because they are close to the splice site. Hypothesis 2: Transposons do not impair splicing because they are located away from the splice junction. Retained introns should show a similar depletion of transposons around the junction.

      Image adapted from: Ren, Pingping, et al. "Alternative splicing: a new cause and potential therapeutic target in autoimmune disease." Frontiers in Immunology 12 (2021): 713540.

      Consistent with hypothesis 2 (“transposons do not impair splicing”), we show that the distribution of transposons within introns is similar for the set of all transposons and all significant transposons within significantly overexpressed introns (Fig. S7. A and B is similar in the case of aged fibroblasts; D and E is similar in the case of cellular senescence). If transposon expression was causally linked to changes in intron retention, the most likely mechanism would be via an impairment of splicing. We would expect transposons to be located close to the splice junction, which is not what we observed. Instead, the data is more consistent with intron retention as a driver of transposon expression.

      Author response image 6.

      Transposons are evenly distributed within introns except for the region close to splice junctions (A-E). Transposons appear to be excluded from the splice junction-adjacent region both in all introns (A, D) and in significantly retained introns (B, E). In addition, transposon density of all introns and significantly retained introns is comparable (C, F). We included only introns containing at least one transposon in this analysis. A) Distribution of 2292769 transposons within 163498 introns among all annotated transposons. B) Distribution of 195190 transposons within 14100 introns significantly retained with age. C) Density (transposon/1kb of intron) of transposons in all introns (n=163498) compared to significantly retained introns (n=14100). D) as in (A) E) Distribution of 428130 transposons within 13205 introns significantly retained with induced senescence. F) Density (transposon/1kb of intron) of transposons in all introns (n=163498) compared to significantly retained introns (n=13205).

      5) "This contrasts with the almost completely even distribution of randomly permuted transposons." How was random permutation of transposons performed? Why is this contrast not trivial, and why is this a good control?

      Permutation was performed using the bedtools shuffle function (Quinlan et al. 2010). We use the set of all annotated transposons and all reshuffled transposons as a control. It is interesting to observe that these two show a very similar distribution with transposons evenly spread out relative to genes. In contrast, expressed transposons are found to cluster downstream of genes. This gave rise to our initial working hypothesis that readthrough should affect transposon expression.
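      To make the idea of the permutation control concrete, the sketch below relocates intervals to random positions while preserving their lengths. It is a simplified stand-in and does not reproduce bedtools or its options: it keeps every interval on its original chromosome, which is an assumption, and the function name and toy inputs are hypothetical.

```python
import random


def shuffle_intervals(intervals, chrom_sizes, seed=0):
    """Move each (chrom, start, end) interval to a random position on the same chromosome."""
    rng = random.Random(seed)
    shuffled = []
    for chrom, start, end in intervals:
        length = end - start
        size = chrom_sizes[chrom]
        if length > size:
            raise ValueError(f"interval longer than {chrom}")
        new_start = rng.randint(0, size - length)  # inclusive bounds
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


# Hypothetical toy genome and transposon intervals
chrom_sizes = {"chr1": 1_000}
transposons = [("chr1", 10, 60), ("chr1", 500, 520)]
permuted = shuffle_intervals(transposons, chrom_sizes)
print(permuted)
```

      Comparing distances to the nearest gene for the original, shuffled and expressed sets would then show whether expressed transposons cluster downstream of genes more than expected by chance.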

      6) Fig 4: the choice to analyze only the 10kb-20kb region downstream to TSE for readthrough regions has probably reduced the number of regions substantially (there are only 200 left) and to what extent this faithfully represent the overall trend is unclear at this point.

      This is addressed in Suppl. Fig. 7: we repeated the analysis for every 10 kb region between 0 and 100 kb, with similar results.

      Furthermore, we show below in a new figure that the results are comparable when we measure readthrough in the 0 to 10 kb region, where the sample size of readthrough regions is larger.

      Finally, it is commonly accepted to remove readthrough regions overlapping genes, which, while reducing sample size, increases the accuracy of readthrough determination (Rosa-Mercado et al. 2021). Without filtering, readthrough regions can overlap neighboring genes, which is reflected in an elevated ratio of Readthrough_counts/Genic_counts (Fig. S9).

      Author response image 7.

      A) Readthrough was determined in a region 0 to 10 kb downstream of genes for a subset of genes that were at least 10 kb away from the nearest neighboring gene (n=684 regions). The log2 ratio of readthrough to gene expression is plotted across five age groups (adolescent n=32, young n=31, middle-aged n=22, old n=37 and very old n=21). B) As in (A) but data is plotted on a per sample basis. C) Readthrough was determined in a region 0 to 10 kb downstream of genes for a subset of genes that were at least 10 kb away from the nearest neighboring gene (n=1045 regions). The log2 ratio of readthrough to gene expression is plotted for the groups comprising senescence (n=12) and the non-senescent group (n=6). D) As in (C) but data is plotted on a per sample basis and for additional control datasets (serum-starved, immortalized, intermediate passage and early passage). N=3 per group.
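      The log2 ratio of readthrough to gene expression plotted in these panels can be sketched as follows. The pseudocount is an assumption added here to avoid taking the logarithm of zero; the source does not say whether one was used, and the function names and sample values are hypothetical.

```python
import math
from statistics import median


def log2_readthrough_ratio(readthrough_counts, gene_counts, pseudocount=1.0):
    """log2 of (readthrough + pseudocount) / (gene + pseudocount)."""
    return math.log2((readthrough_counts + pseudocount) / (gene_counts + pseudocount))


def median_ratio_by_group(samples):
    """Median log2 ratio per group from (group, readthrough_counts, gene_counts) tuples."""
    grouped = {}
    for group, rt, gene in samples:
        grouped.setdefault(group, []).append(log2_readthrough_ratio(rt, gene))
    return {group: median(values) for group, values in grouped.items()}


# Hypothetical per-sample totals
samples = [("young", 3, 15), ("old", 7, 15)]
print(median_ratio_by_group(samples))
```

      A higher (less negative) value in older or senescent samples would correspond to more readthrough relative to genic expression.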

      7) Fig. 5B shows the opposite of the authors claims: in the control samples there are more transposon reads than in the KCl samples.

      Thank you for pointing this out. During preparation of the manuscript the labels of Fig. 5B were switched (however, the color matching between Fig. 5A-C is correct). We apologize for this mistake, which we have now corrected.

      8) "induced readthrough led to preferential expression of gene proximal transposons (i.e. those within 25 kb of genes), when compared with senescence or aging". A convincing analysis would show if there is indeed preferential proximity of induced transposons to TSEs. Since readthrough transcription decays as a function of distance from TSEs, the expression of transposons should show the same trends if indeed simply caused by readthrough. Also, these should be compared to the extent of transposon expression (not induction) in intergenic regions without any readthrough, in these conditions.

      This is a very good suggestion. We now provide two new supplementary figures analyzing the distance-dependence of transposon expression.

      In the first figure (Fig. S13) we show that readthrough decreases with distance (A, B) and we show that transposon counts are higher for transposons close to genes, following a similar pattern to readthrough. This is true in fibroblasts isolated from aged donors (A) and with cellular senescence (B).

      Author response image 8.

      Readthrough counts (rt_counts) decrease exponentially downstream of genes, both in the aging dataset (A) and in the cellular senescence dataset (B). Although noisier, the pattern for transposon counts (transp_cum_counts) is similar with higher counts closer to gene terminals, both in the aging dataset (C) and in the cellular senescence dataset (D). Readthrough counts are the cumulative counts across all genes and samples. Readthrough was determined in 10 kb bins and the values are assigned to the midpoint of the bin for easier plotting. Transposon counts are the cumulative counts across all samples for each transposon that did not overlap a neighboring gene. n=801 in (C) and n=3479 in (D).
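      The binning described in this caption (10 kb bins, values assigned to the bin midpoint, counts accumulated across genes and samples) can be sketched like this. The function names and the toy observations are hypothetical, and distances are assumed to be measured in base pairs downstream of the gene end.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10 kb bins, as in the caption


def bin_midpoint(distance, bin_size=BIN_SIZE):
    """Midpoint of the bin that contains a downstream distance."""
    return (distance // bin_size) * bin_size + bin_size // 2


def cumulative_by_bin(observations, bin_size=BIN_SIZE):
    """Sum counts per bin midpoint from (distance, count) pairs."""
    totals = defaultdict(int)
    for distance, count in observations:
        totals[bin_midpoint(distance, bin_size)] += count
    return dict(sorted(totals.items()))


# Hypothetical (distance downstream in bp, read count) observations
observations = [(500, 10), (9_000, 5), (12_000, 3)]
print(cumulative_by_bin(observations))
```

      Plotting the resulting totals against the midpoints gives the kind of decay curve shown for readthrough and transposon counts.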

      In the second figure (Fig. S14) we show that transposons found downstream of genes with high readthrough show a more pronounced log-fold change (differential expression) than transposons downstream of genes with low readthrough (defined based on log-fold change). This is true in fibroblasts isolated from aged donors (A) and with cellular senescence (B). Furthermore, the difference between high and low readthrough region transposons is diminished for transposons that are more than 10 kb downstream of genes, as would be expected given that readthrough decreases with distance.

      Author response image 9.

      Transposons found downstream of genes with high readthrough (hi_RT) show a more pronounced log-fold change (transp_logfc) than transposons downstream of genes with low readthrough (low_RT). This is true in fibroblasts isolated from aged donors (A) and with cellular senescence (B). Furthermore, the difference between high and low readthrough region transposons is diminished for transposons that are more than 10 kb downstream of genes (“Transp > 10 kb”). Transposons in high readthrough regions were defined as those in the top 20% of readthrough log-fold change. Readthrough was measured between 0 and 10 kb downstream from genes. n=2124 transposons in (A) and n=6061 transposons in (B) included in the analysis.
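      The high versus low readthrough split can be sketched as a simple ranking. The caption defines the high group as the top 20% of readthrough log-fold change; treating every remaining region as the low group is an assumption made here, and the names and values are hypothetical.

```python
def split_by_readthrough(regions, top_fraction=0.2):
    """Split (region_id, readthrough_logfc) pairs into top-fraction and remaining regions."""
    if not regions:
        return [], []
    ranked = sorted(regions, key=lambda region: region[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_top], ranked[n_top:]


# Hypothetical regions with readthrough log-fold changes
regions = [(f"gene_{i}", i / 10) for i in range(10)]
hi_rt, low_rt = split_by_readthrough(regions)
print([name for name, _ in hi_rt])
```

      Transposon log-fold changes would then be compared between transposons that fall in the two groups of regions.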

      Reviewer #2 (Public Review):

      In this manuscript, the authors examined the role of transcription readout and intron retention in increasing transcription of transposable elements during aging in mammals. It is assumed that most transposable elements have lost the regulatory elements necessary for transcription activation. Using available RNA-seq datasets, the authors showed that an increase in intron retention and readthrough transcription during aging contributes to an increase in the number of transcripts containing transposable elements.

      Previously, it was assumed that the activation of transposable elements during aging is a consequence of a gradual imbalance of transcriptional repression and a decrease in the functionality of heterochromatin (derepression of transcription in heterochromatin). Therefore, this is an interesting study with an important novel conclusion. However, there are many questions about the bioinformatics analysis and the results obtained.

      Major comments:

      1) In Introduction the authors indicated that only small fraction of LINE-1 and SINE elements are expressed from functional promoters and most of LINE-1 are co-expressed with neighboring transcriptional units. What about other classes of mobile elements (LTR mobile element and transposons)?

      We thank the reviewer for this comment. Historically, most repetitive elements, e.g. DNA elements and retrotransposon-like elements, have been considered inactive, having accrued mutations which prevent them from transposition. On the other hand, based on recent data it is indeed very possible that certain LTR elements become active with aging as suggested in several manuscripts (Liu et al. 2023, Autio et al. 2020). However, these elements are not well annotated and our final analysis (Fig. 6) relies on a well-defined distinction between active and inactive elements. (See also question 2 for further discussion.)

      Finally, we would like to point out some of the difficulties with defining expression and re-activation of LTR/ERV elements based on RNAseq data that have been highlighted for the Liu manuscript and are concordant with several of our results: https://pubpeer.com/publications/364E785636ADF94732A977604E0256

      Liu, Xiaoqian, et al. "Resurrection of endogenous retroviruses during aging reinforces senescence." Cell 186.2 (2023): 287-304.

      Autio A, Nevalainen T, Mishra BH, Jylhä M, Flinck H, Hurme M. Effect of ageing on the transcriptomic changes associated with expression at the HERV-K (HML-2) provirus at 1q22. Immun Ageing. 2020;17(1):11.

      2) Results: Why authors considered all classes of mobile elements together? It is likely that most of the LTR containing mobile elements and transposons contain active promoters that are repressed in heterochromatin or by KRAB-C2H2 proteins.

      We do not consider LTR containing elements because there is uncertainty regarding their overall expression levels and their expression with aging (Nevalainen et al. 2018). Furthermore, we believe that substantial activity of LTR elements in human genomes should have been detectable through patterns of insertional mutagenesis. Yet studies generally show low to negligible levels of LTR (ERV) mutagenesis. Here, for example, at a 200-fold lower rate than for LINEs (Lee et al. 2012).

      Importantly, our analysis in Fig. 6 relies on well-annotated elements like LINEs, which is why we do not include LTR or SINE elements that could be potentially expressed. However, for other analyses we did consider element families independently as can be seen in Table S1, for example.

      Nevalainen, Tapio, et al. "Aging-associated patterns in the expression of human endogenous retroviruses." PLoS One 13.12 (2018): e0207407.

      Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.

      3) Fig. 2. A schematic model of transposon expression is not presented clearly. What is the purpose of showing three identical spliced transcripts?

      This is indeed confusing. There are three spliced transcripts to schematically indicate that the majority of transcripts will be correctly spliced and that intron retention is rare (estimated at 4% of all reads in our dataset). We have clarified the figure now, please see below:

      Author response image 10.

      A schematic model of transposon expression. In our model, represented in this schematic, transcription (A) can give rise to mRNAs and pre-mRNAs that contain retained introns when co-transcriptional splicing is impaired. This is often seen during aging and senescence, and these can contain transposon sequences (B). In addition, transcription can give rise to mRNAs and pre-mRNAs that contain transposon sequences towards the 3’-end of the mRNA when co-transcriptional termination at the polyadenylation signal (PAS) is impaired (C, D) as seen with aging and senescence. Some of these RNAs may be successfully polyadenylated (as depicted here) whereas others will be subject to nonsense mediated decay. Image created with Biorender.

      4) The study analyzed the levels of RNA from cell cultures of human fibroblasts of different ages. The annotation to the dataset indicated that the cells were cultured and maintained. (The cells were cultured in high-glucose (4.5mg/ml) DMEM (Gibco) supplemented with 15% (vol/vol) fetal bovine serum (Gibco), 1X glutamax (Gibco), 1X non-essential amino acids (Gibco) and 1% (vol/vol) penicillin-streptomycin (Gibco).) How valid is it to assume that gene expression levels in cell cultures are the same as in body cells? In cell cultures, transcription is optimized for efficient division and is very different from that of cells in the body. In order to correlate a result on cells with an organism, there must be rigorous evidence that the transcriptomes match.

      We agree and have updated the discussion to reflect this shortcoming. While we do not have human tissue data, we would like to draw the reviewer’s attention to Fig. S3 where we presented some liver data for mice. We now provide an additional supplementary figure (in a style similar to Fig. S2) showing how readthrough, transposon expression and intron retention change in 26 vs 5-month-old mice (Fig. S4). Indeed, intron retention, readthrough and transposon expression increase with age in mice, although this is more pronounced for transposons and readthrough.

      Author response image 11.

      Intron, readthrough and transposon elements are elevated in the liver of aging mice (26 vs 5-month-old, n=6 per group). Readthrough and transposon expression are especially elevated, even when compared to genic transcripts. The percentage of upregulated transcripts is indicated above each violin plot and the median log10-fold change for genic transcripts is indicated with a dashed red line.

      Finally, just to elaborate, we used the aging fibroblast dataset by Fleischer et al. for three reasons:

      1) Yes, aging fibroblasts could be a model of human aging, with important caveats as you correctly point out,

      2) it is one of the largest such datasets, allowing us to draw conclusions with higher statistical confidence and to run analyses such as partial correlations,

      3) it has been analyzed using similar techniques before (LaRocca, Cavalier and Wahl 2020) and this dataset is often used to make strong statements about transposons and aging such as transposon expression in this dataset being “consistent with growing evidence that [repetitive element] transcripts contribute directly to aging and disease”. Our goal was to put these statements into perspective and to provide a more nuanced interpretation.

      LaRocca, Thomas J., Alyssa N. Cavalier, and Devin Wahl. "Repetitive elements as a transcriptomic marker of aging: evidence in multiple datasets and models." Aging Cell 19.7 (2020): e13167.

      5) The results obtained for isolated cultures of fibroblasts are transferred to the whole organism, which has not been verified. The conclusions should be more accurate.

      We agree and have updated the discussion accordingly.

      6) The full pipeline with all the configuration files IS NOT available on github (pabisk/aging_transposons).

      Thank you for pointing this out, we have now uploaded the full pipeline and configuration files.

      7) Analysis of transcripts passing through repeating regions is a complex matter. There is always a high probability of incorrect mapping of multi-reads to the genome. Things worsen if unpaired short reads are used, as in the study (L=51). Therefore, the authors used the Expectation maximization algorithm to quantify transposon reads. Such an option is possible. But it is necessary to indicate how statistically reliable the calculated levels are. It would be nice to make a similar comparison of TE levels using only unique reads. The density of reads would drop, but in this case it would be possible to avoid the artifacts of the EM algorithm.

      We thank the reviewer for this suggestion. We show here that mapping only unique alignments (outFilterMultimapNmax=1 in STAR) leads to similar results.

      For the aging fibroblast dataset:

      Author response image 12.

      For the induced senescence dataset:

      Author response image 13.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      Receptor tyrosine kinases such as ALK play critical roles during appropriate development and behaviour and are nodal in many disease conditions, through molecular mechanisms that weren't completely understood. This manuscript identifies a previously unknown neuropeptide precursor as a downstream transcriptional target of Alk signalling in Clock neurons in the Drosophila brain. The experiments are well designed with attention to detail, the data are solid and the findings will be useful to those interested in events downstream of signalling by receptor tyrosine kinases.

      Authors response: We thank the reviewers for this assessment of our Manuscript. We are happy to accept the current eLife assessment of our manuscript. In our revised manuscript we have addressed all of the major reviewer comments, including additional experiments suggested by the reviewers, which have significantly strengthened the revised version.

      Reviewer #1 (Public Review):

      Sukumar et al build on a body of work from the Palmer lab that seeks to unravel the transcriptional targets of Alk signaling (a receptor tyrosine kinase). Having uncovered its targets in the mesoderm in an earlier study, they seek to determine its targets in the central nervous system. To do this, they use Targeted DamID (TaDa) in the wild-type and Alk dominant negative background and identify about 1700 genes that might be under the control of Alk signalling. Using their earlier data and applying a set of criteria - upregulated in gain-of-Alk, downregulated in loss-of-Alk, and co-expressed with Alk positive cells in single cell datasets - they arrive upon a single gene, Sparkly, which is predicted to be a neuropeptide precursor.

      They generate antibodies and mutants for Sparkly and determine that it is responsive to Alk signalling and is expressed in many neuroendocrine cells, as well as in clock neurons. Though the mutants survive, they have reduced lifespans and are hyperactive. In summary, the authors identify a previously unidentified transcriptional target of Alk signalling, which is likely cleaved into a neuropeptide and is involved in regulating circadian activity.

      The data support claims made, are generally well presented and the manuscript clearly written. The link between circadian control of Alk signalling in Clock neurons > Spar expression > ultimately controlling circadian activity, however, was not clear.

      Authors response: We thank the reviewer for this thorough reading of our manuscript and for kindly highlighting the important takeaways from the study. The role of Alk signalling in activity, circadian rhythm and sleep has previously been reported by other groups in the following studies – (Bai and Sehgal, 2015; Weiss et al, 2017; Gouzi, Bouraimi et al 2018), which we have discussed in our manuscript. We also have identified a hyperactivity phenotype in our Alk CNS specific loss-of-function allele, AlkRA, which is similar to the Spar loss-of-function mutant phenotype. We hypothesize that one of the ways in which Alk signalling regulates fly activity is through regulating Spar gene expression in neuroendocrine cells. This is supported by our data which shows Alk expression in Clock neurons, as well as by the new experimental data showing an activity phenotype in flies expressing Spar RNAi driven by the Clk678-Gal4 driver.

      Reviewer #2 (Public Review):

      This manuscript illustrates the power of "combined" research, incorporating a range of tools, both old and new to answer a question. This thorough approach identifies a novel target in a well-established signalling pathway and characterises a new player in Drosophila CNS development.

      Largely, the experiments are carried out with precision, meeting the aims of the project, and setting new targets for future research in the field. It was particularly refreshing to see the use of multi-omics data integration and Targeted DamID (TaDa) findings to triage scRNA-seq data. Some of the TaDa methodology was unorthodox (and should be justified/caveats mentioned in the main text), however, this does not affect the main finding of the study.

      Their discovery of Spar as a neuropeptide precursor downstream of Alk is novel, as well as its ability to regulate activity and circadian clock function in the fly. Spar was just one of the downstream factors identified from this study, therefore, the potential impact goes beyond this one Alk downstream effector.

      Authors response: We thank the reviewer for the positive comments highlighting the strengths of our study. TaDa was used as a semi-quantitative readout of the transcriptional activity in an Alk loss-of-function background with an emphasis on relative differences in peaks close to GATC sites, providing an important dataset for integration with bulk and single cell RNAseq. As the reviewer points out, there are important considerations when interpreting this data and we have now added sentences in the discussion to inform readers of possible caveats of our TaDa dataset.

      Reviewer #3 (Public Review):

      Summary:

      The receptor tyrosine kinase Anaplastic Lymphoma Kinase (ALK) in humans is expressed in the nervous system and plays an important role as an oncogene. A number of groups have been studying ALK signalling in flies to gain mechanistic insight into its various roles. In flies, ALK plays a critical role in development, particularly embryonic development and axon targeting. In addition, ALK was also shown to regulate adult functions including sleep and memory. In this manuscript, Sukumar et al. used a suite of molecular techniques to identify downstream targets of ALK signalling. They first used targeted DamID, a technique that involves fusing a DNA methylase to RNA polymerase II, so that GATC sites in close proximity to PolII binding sites are marked. They performed these experiments in wild-type and ALK loss of function mutants (using an Alk dominant negative, AlkDN), to identify Alk responsive loci. Comparing these loci with a larval single-cell RNAseq dataset identified neuroendocrine cells as an important site of Alk action. They further combined these TaDa hits with data from RNA seq in Alk Loss and Gain of Function manipulations to identify a single novel target of Alk signalling - a neuropeptide precursor they named Sparkly (Spar) for its expression pattern. They generated a mutant allele of Spar, raised an antibody against Spar, and characterised its expression pattern and mutant behavioural phenotypes including defects in sleep and circadian function.

      Strengths:

      The molecular biology experiments using TaDa and RNAseq were elegant and very convincing. The authors identified a novel gene they named Spar. They also generated a mutant allele of Spar (using CrisprCas technology) and raised an antibody against Spar. These experiments are lovely, and the reagents will be useful to the community. The paper is also well written, and the figures are very nicely laid out making the manuscript a pleasure to read.

      Weaknesses:

      My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?

      Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3-10 day old flies, and if they like from 20-30 day old flies. Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.

      The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We thank the reviewer for the thorough reading of our manuscript and for generously praising the positives as well as pointing out the weakness of our study. We have now addressed the highlighted weaknesses in behavioural experiments. In particular, we have reanalysed our data according to the reviewer’s suggestions. In addition, we provide experimental data, driving Spar RNAi in Clock neurons, that support our Spar mutant analysis.

      Point-by-point response to the reviewers’ concerns:

      Point 1. “My main concerns were around the genetics and behavioural characterisation which is incomplete. The authors generated a novel allele of Spar - Spar ΔExon1 and examined sleep and circadian phenotypes of this allele. However, they have only one mutant allele of Spar, and it doesn't appear as if this mutant was outcrossed, making it very difficult to rule out off-target effects. To make this data convincing, it would be better if the authors had a second allele, perhaps they could try RNAi?”

      Authors response: As per the reviewer's suggestion, we conducted a targeted knockdown of Sparkly specifically in clock neurons (Clk-Gal4 > Spar-RNAi) and assessed the circadian phenotypes. Flies were monitored for 5 days in LD followed by a shift to DD, similar to our previous LD-DD experiments. The results revealed a significant disruption in both activity and sleep during the DD transition period upon knockdown of Spar in circadian clock neurons. These findings strongly align with the expression pattern of Spar in clock neurons (Figure 7i-l’’). We have now included a new main figure (Figure 9) together with two supplementary figures (Figure 9 – figure supplements 1 and 2) and discussed these experiments on pages 17-18 of the results section of the revised manuscript.

      Point 2. “Further, the sleep and circadian characterisation could be substantially improved. In Fig 8 E-F it appears as if sleep was averaged over 30 days! This is a little bizarre. They then bin the data as day 1 - 12 and 12-30. This is not terribly helpful either. Sleep in flies, as in humans, undergoes ontogenetic changes - sleep is high in young flies, stabilises between day 3-12, and shows defects by around 3 weeks of age (cf Shaw et al., 2000 PMID 10710313). The standard in the sleep field is to average over 3 days or show one representative day. The authors should reanalyse their data as per this standard, and perhaps show data from 3–10-day old flies, and if they like from 20–30-day old flies.”

      Authors response: We have reanalysed these data according to the reviewer's suggestions and revised the sleep data presented. Specifically, we have focused on two 3-day periods, days 5-7 as well as days 20-22. By averaging the sleep mean during these time points, we observed a significant decrease in average sleep duration in the SparΔExon1 and AlkΔRA mutant flies at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). However, no significant effect was observed in older flies (Figure 8h-h’, Figure 8 – figure supplement 2). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (page 16) of the revised manuscript.

      Point 3. “Further, sleep data is usually analysed and presented from lights on to lights on. This allows one to quantify important metrics of sleep consolidation including bout lengths in day and night, and sleep latency. These metrics are of great interest to the community and should be included.”

      Authors response: We have now reanalysed these data as per the reviewer's suggestion. From the raw data collected over a span of 3 days, we specifically selected the lights on-lights on data and examined the average sleep duration. Notably, we observed a significant downregulation of average sleep in SparΔExon1 and AlkΔRA flies, but only at a younger age (Figure 8h-h’, Figure 8 – figure supplement 2). Furthermore, we assessed the number of sleep bouts using this data and found a significant increase in the number of bouts in younger SparΔExon1 and AlkΔRA flies, with no changes observed at an older age (Figure 8 – figure supplement 2). Additionally, we evaluated the number of bouts in flies that were initially monitored in LD and then shifted to DD, observing a significant decrease in the number of sleep bouts in SparΔExon1 flies following the transition to DD (Figure 9d). This new data is described in detail in the results section (pages 16-18) of the revised manuscript.
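
      As an illustration of how sleep bouts can be counted from activity recordings, the sketch below assumes the convention widely used in the fly sleep field, in which a bout is a run of at least 5 consecutive minutes without activity. The response does not state the threshold the authors used, so `min_len=5`, the one-minute bin size and the function name `sleep_bouts` are assumptions made for illustration only.

```python
def sleep_bouts(activity_per_min, min_len=5):
    """Return the length (in minutes) of each sleep bout in a trace.

    activity_per_min: activity counts in consecutive one-minute bins.
    min_len: minimum run of zero-activity minutes that counts as sleep
             (assumed here to be 5, the common convention; not taken
             from the authors' methods).
    """
    bouts = []
    run = 0  # length of the current run of inactive minutes
    for count in activity_per_min:
        if count == 0:
            run += 1
        else:
            if run >= min_len:
                bouts.append(run)
            run = 0
    if run >= min_len:  # a bout may extend to the end of the trace
        bouts.append(run)
    return bouts


# Hypothetical trace: 6 quiet min, activity, 4 quiet min, activity, 5 quiet min
trace = [0] * 6 + [3] + [0] * 4 + [1] + [0] * 5
bouts = sleep_bouts(trace)
number_of_bouts = len(bouts)
total_sleep_min = sum(bouts)
```

      Under these assumptions, the number of bouts, the mean bout length and the total sleep per lights-on to lights-on day all follow from the returned list.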

      Point 4. “The authors also claim there are defects in circadian anticipatory activity. However, these data, as presented are not solid to me. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827).”

      Authors response: We appreciate the valuable suggestion provided by the reviewer. In accordance with the referenced paper by Harrisingh et al. (2007), we calculated the "anticipation score", defined as the percentage of activity in the 6-hour period preceding the lights-on or lights-off transition that occurs in the 3-hour window just before the transition. To analyse the mean activity of the flies, we selected the data corresponding to the 6 hours before lights-on and the 6 hours before lights-off, averaged over a 14-day period under normal LD conditions. Interestingly, we observed a significant increase in the mean activity of SparΔExon1 flies during both morning anticipation (a.m. anticipation) and evening anticipation (p.m. anticipation) (Figure 8f). Furthermore, we analysed this parameter for flies entrained in DD and found that SparΔExon1 flies exhibited lower mean activity during both morning and evening anticipation (Figure 8g). We have incorporated this new data into Figure 8 and provided a detailed description in the results section (pages 16-18) of the revised manuscript.
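
      The anticipation score as defined above can be written out as a short calculation. This is a minimal sketch of that definition only: the function name, the equal-width binning of the 6-hour window and the handling of flies with no recorded activity are assumptions, not details taken from the manuscript.

```python
def anticipation_score(counts_6h):
    """Percentage of the activity in the 6 h before a light transition
    that falls in the final 3 h.

    counts_6h: activity counts covering the 6 h before the transition,
               in equal-width bins ordered from oldest to newest
               (the binning is an assumption of this sketch).
    Returns None when no activity was recorded in the window.
    """
    n = len(counts_6h)
    if n == 0 or n % 2 != 0:
        raise ValueError("need an even, non-zero number of bins")
    total = sum(counts_6h)
    if total == 0:
        return None
    last_3h = sum(counts_6h[n // 2:])  # second half of the window
    return 100.0 * last_3h / total


# Hypothetical hourly counts: activity ramps up towards the transition
score = anticipation_score([1, 1, 2, 2, 3, 3])
```

      A fly whose activity is spread evenly over the 6 hours scores 50, and scores above 50 indicate activity concentrated in the 3 hours before the transition.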

      Point 5. Further, circadian period could also be evaluated. There are several free software packages to perform these analyses so it should not be hard to do.

      Authors response: We have now evaluated the circadian period as suggested by the reviewer, generating a chi-square periodogram for each fly to calculate the free-running period, both for the flies that were under normal LD conditions and for those that were entrained in DD. We calculated the percentage of flies that had a shorter or longer period than 1440 min (24 h) and observed that w1118 and SparΔExon1 flies have a longer circadian period (Figure 8 – figure supplement 4) but following the shift to DD, they tend to have a shorter circadian period (Figure 9 – figure supplement 3). This new data is described in the results (pages 16-18).

      Recommendations for the authors:

      There are two major concerns that we recommend the authors address:

      1) The behaviour: There are a number of unconventional representations of the behavioural data in this manuscript. We recommend that the authors revisit their data representation to adhere to conventions in the field - specific suggestions are in the reviews. We also suggest an additional experiment - an RNAi/different allele/rescue experiment to ensure that the phenotypes the authors observe are not due to off-target effects of the mutant they have generated.

      Authors response: In the revised manuscript, we have reanalysed the behavioural data according to the reviewers’ recommendations (included in Figures 8 and 9 of the revised version). In addition, we have performed a targeted Spar RNAi experiment in clock neurons (included in Figure 9 of the revised version), identifying a hyperactive behavioural phenotype similar to that of Spar mutants. The inclusion of these new analyses and data strengthens the manuscript and supports the conclusion that Spar plays a role in regulation of behaviour.

      2) TaDa analyses: We were concerned that the authors might be picking up false positives with the way they have analysed their data. While this may not matter for this study, it will be useful to reason out their approach and keep this in mind for any other targets they choose from these data for further studies.

      Authors response: In line with the reviewers’ concerns we have now highlighted the potential caveats and drawbacks of our TaDa dataset in the discussion section of the revised manuscript (detailed in response to Reviewer #2 below).

      Reviewer #1 (Recommendations For The Authors):

      Though generally well written, I felt that some sections could be written in more detail. For example, the text around Figure 5 was not very informative. Many of the other approaches to the analyses and details of datasets used were glossed over. Since the manuscript uses a lot of previously published data, it would be nice to give more details about them in the context of the results.

      Authors response: We thank the reviewer for this recommendation. We have now added additional information about peptidomics analysis in the results and in the legend of Figure 5. We have also included a table in the Methods that summarises the datasets used in this study, including the dataset name, brief description and reference.

      In the panels where co-localisations have been represented, it would be nice to include enlarged insets depicting the co-labelling. It is not always obvious in the way the figures have currently been represented. For example, in Fig 2G, Alk stain appears to be everywhere, but the authors make the point that it is enriched in neuroendocrine cells (as labelled by dimmed), but the co-localisation isn't evident. Similar issues come up with the sparkly colocalisations.

      Authors response: As suggested by the reviewer, we have now added additional panels to complement the stainings in Figure 2G. These new data are included as Figure 2 – figure supplement 1 (Alk/Dimm-Gal4>UAS-GFPcaax staining) and as Figure 4 – figure supplement 1 (Alk/Spar staining), which indicate colocalization in the central brain and ventral nerve cord prosecretory cells with enlarged panels.

      Supplementary figures S3C and 3F appear garbled to me? Maybe it didn't upload properly?

      Authors response: Unfortunately, this issue is not apparent to us. However, we have now re-uploaded these Figures.

      Sparkly's responsiveness to Alk signalling: Visually, there does not seem to be an increase or decrease in spar levels in the images in Fig 4F-H. How was the quantification done? I would suggest a more detailed interpretation of their results related to spar's responsiveness to Alk signalling - at the mRNA vs protein levels and the GOF vs LOF conditions.

      Authors response: We thank the reviewer for this constructive recommendation. In the revised manuscript, we have now repeated this experiment with increased numbers of larval CNS followed by blinded image analysis. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by the reviewer, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.” We have also added an additional Image analysis method section explaining the methodology of the CTCF fluorescent intensity quantification on Page 28.
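
      For readers unfamiliar with CTCF, the sketch below uses the commonly cited form of the calculation: the integrated density of the selected cell minus the cell area multiplied by the mean fluorescence of nearby background regions. It assumes this standard formula; the authors' exact procedure is the one described in their Image analysis method section and is not reproduced here. The function name and the example measurements are hypothetical.

```python
def ctcf(integrated_density, area, background_means):
    """Corrected total cell fluorescence for one region of interest.

    integrated_density: sum of pixel intensities inside the selected cell.
    area: area of the selected cell, in the same pixel units.
    background_means: mean intensities of background regions measured
                      next to the cell (at least one reading).
    """
    if not background_means:
        raise ValueError("at least one background reading is required")
    mean_background = sum(background_means) / len(background_means)
    return integrated_density - area * mean_background


# Hypothetical measurements for a single cell
value = ctcf(integrated_density=5000.0, area=100.0,
             background_means=[10.0, 12.0, 14.0])
```

      Subtracting an area-scaled background in this way keeps larger cells from appearing brighter simply because they cover more background signal.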

      Reviewer #2 (Recommendations For The Authors):

      It was surprising to see that the authors did not use Dam-only controls. This is to control for background methylation by Dam (i.e. accessible chromatin). This does not invalidate the main results of the manuscript, however, there could be false positives in the dataset for genes that are seen to be up-regulated in the mutant condition (e.g. if accessibility is increased in the mutant but not transcription, then it would look like increased Pol II binding, when it isn't). As the study was focusing on genes down-regulated in the mutant, this is less of an issue, as it is very unlikely to see an increase in transcription with a decrease in accessibility (that could provide a false positive). The authors should explain their rationale for not using Dam-only controls, and the associated caveats, in the manuscript.

      Authors response: We agree with the reviewer’s comment on the possibility of identifying false positive candidates from our TaDa dataset, especially if one is seeking to find a gene with increased Pol II occupancy in an Alk dominant negative condition. However, our analysis focuses only on genes which are responsive to Alk manipulation, namely genes which are downregulated in the Alk dominant negative condition. One of the rationales for not using a Dam-only control was that in our previous study (Mendoza-Garcia et al, 2021), we employed a similar method and were able to successfully identify both known and novel targets of Alk signalling in the embryonic mesoderm by comparing the Dam-Pol II versus Dam-Pol II; Alk dominant negative conditions. In the current version of the manuscript, we have expanded our discussion of these caveats as follows (Discussion, Page 19-20):

      “A potential drawback of our TaDa dataset is the identification of false positives, due to non-specific methylation of GATC sites at accessible regions in the genome by Dam protein. Hence, our experimental approach likely more reliably identifies candidates which are downregulated upon Alk inhibition. In our analysis, we have limited this drawback by focusing on genes downregulated upon Alk inhibition and integrating our analysis with additional datasets, followed by experimental validation. This approach is supported by the identification of numerous previously identified Alk targets in our TaDa candidate list.”

      Related to this, could the authors make it clear/justify why they chose to use peak-based analysis of the Dam-Pol II data rather than looking at signals across whole transcripts? For example, this could result in false positives if a gene switches from having no Pol II to having paused Pol II.

      Authors response: In our opinion, a peak-based analysis is dependable in this context. We chose to prioritize peaks close (+/- 1 kb) to transcription start sites (TSS) to increase the chances of finding true Pol II occupancy peaks. Also, during bioinformatics analysis using the Damid-seq pipeline (Maksimov et al, 2016), fragments not aligning to GATC borders are excluded. Therefore, a whole-transcript Pol II occupancy analysis may not always be feasible. We agree with the reviewer that paused Pol II will result in false positives; however, it will only result in an increase of a specific peak, and in our case we are seeking to identify peaks with lower Pol II occupancy as a result of Alk knockdown. Furthermore, we rely on integration with additional relevant datasets to minimise false positive candidates for detailed analysis. In the current version of the manuscript these caveats have been mentioned and discussed (see point above).
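
      The prioritisation of peaks within +/- 1 kb of a TSS can be pictured as a simple interval test. The sketch below is illustrative only: it treats a peak as "close" when any part of it overlaps the window around the TSS, which is one possible reading of the criterion, and the function name, coordinates and overlap rule are assumptions, not the authors' pipeline.

```python
def near_tss(peak_start, peak_end, tss, window=1000):
    """True if the peak [peak_start, peak_end] overlaps the interval
    [tss - window, tss + window]; all coordinates are in base pairs."""
    return peak_start <= tss + window and peak_end >= tss - window


# Hypothetical peaks (start, end) on one chromosome and a TSS at 5000
peaks = [(4500, 4800), (1000, 3000), (6000, 6500), (6001, 6500)]
kept = [p for p in peaks if near_tss(p[0], p[1], tss=5000)]
```

      Other choices, such as requiring the peak midpoint or summit to fall inside the window, would be equally easy to express and may select a slightly different set of peaks.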

      Do the authors have any theories about the mode of action of Spar? Or ideas about how this might be followed up? If so, that could be included in the Discussion.

      Authors response: Other than identifying modified Spar-derived peptides, which suggest a target receptor, possibly a GPCR, we currently have no other data that allow us to speculate further on the mode of action of Spar. We are currently working hard to try to identify a receptor, but this is a challenging and ongoing process. In the discussion we speculate regarding the identity of the Spar receptor, as well as its location, which is likely in the CNS and body muscle; however, these are open questions that we can hopefully answer in a future study.

      Reviewer #3 (Recommendations For The Authors):

      Spar protein expression was unchanged in Alk loss of function. This is a curious result as the authors used RNA seq data from Alk loss of function to identify Spar. This could be commented on in the discussion.

      Authors response: We thank the reviewer for this comment, and they are correct in noticing this. We have also thought about this, and reviewer #1 also commented. To confirm this result, we repeated this experiment with increased numbers of larval CNS followed by blinded image analysis for the revised version. These results also show an increased fluorescence intensity as measured by corrected total cell fluorescence (CTCF), confirming our previous observation of increased Spar protein expression in Alk gain-of-function conditions compared to controls. In this analysis, changes in Spar levels in Alk loss-of-function remained non-significant compared to control, in agreement with our previous data. As suggested by reviewer #1, we have now included several additional sentences discussing the possible reasons for these observations. The following text is now included on Page 11 of the results section:

      “While our bulk RNA-seq and TaDa datasets show a reduction in Spar transcript levels in Alk loss-of-function conditions, this reduction is not reflected at the protein level. This observation may reflect additional uncharacterised pathways that regulate Spar mRNA levels as well as translation and protein stability. Taken together, these observations confirm that Spar expression is responsive to Alk signaling in CNS, although Alk is not critically required to maintain Spar protein levels.”

      Pg 19: Spar is expressed in the Mushroom Bodies (MBs). Do they mean in Kenyon Cells (KCs)? I don't see this expression in the figures. Maybe this could be highlighted in the figure. It would definitely be of interest if this were true.

      Authors response: We agree with the reviewer that this would be interesting. We have not performed detailed staining of the mushroom bodies at this point, however, Spar mRNA expression in a transcriptomics analysis performed by Crocker et al, 2016, identifies Spar in all cell types, including Kenyon cells. We have now included this and cited this reference in the discussion.

      Spar is also expressed in multiple potential sleep regulatory sites including clock neurons, the PI, AstA cells and so on. Some of these might be arousal-promoting and some sleep-promoting. Taking out Spar in both sleep and arousal-promoting subsets might have complex effects. The authors might want to knock down Alk in different subsets of neurons to make more targeted manipulations.

      Authors response: We thank the reviewer for this suggestion regarding interesting experiments to further investigate Spar function. We are planning to follow up and study the role of Alk signalling in different neuronal subsets, with a specific interest in neuroendocrine/prosecretory cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer No.1 (public)

      The authors present a study focused on addressing the key challenge in drug discovery, which is the optimization of absorption and affinity properties of small molecules through in silico methods. They propose active learning as a strategy for optimizing these properties and describe the development of two novel active learning batch selection methods. The methods are tested on various public datasets with different optimization goals and sizes, and new affinity datasets are curated to provide up-to-date experimental information. The authors claim that their active learning methods outperform existing batch selection methods, potentially reducing the number of experiments required to achieve the same model performance. They also emphasize the general applicability of their methods, including compatibility with popular packages like DeepChem.

      Strengths:

      Relevance and Importance: The study addresses a significant challenge in the field of drug discovery, highlighting the importance of optimizing the absorption and affinity properties of small molecules through in silico methods. This topic is of great interest to researchers and pharmaceutical industries.

      Novelty: The development of two novel active learning batch selection methods is a commendable contribution. The study also adds value by curating new affinity datasets that provide chronological information on state-of-the-art experimental strategies.

      Comprehensive Evaluation: Testing the proposed methods on multiple public datasets with varying optimization goals and sizes enhances the credibility and generalizability of the findings. The focus on comparing the performance of the new methods against existing batch selection methods further strengthens the evaluation.

      Weaknesses:

      Lack of Technical Details: The feedback lacks specific technical details regarding the developed active learning batch selection methods. Information such as the underlying algorithms, implementation specifics, and key design choices should be provided to enable readers to understand and evaluate the methods thoroughly.

      Evaluation Metrics: The feedback does not mention the specific evaluation metrics used to assess the performance of the proposed methods. The authors should clarify the criteria employed to compare their methods against existing batch selection methods and demonstrate the statistical significance of the observed improvements.

      Reproducibility: While the authors claim that their methods can be used with any package, including DeepChem, no mention is made of providing the necessary code or resources to reproduce the experiments. Including code repositories or detailed instructions would enhance the reproducibility and practical utility of the study.

      Suggestion 1:

      Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.

      Answer: We thank the reviewer for this suggestion. Following this comment we have extended the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We have also included the pseudocode for the Batch optimization method.
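
      To give a flavour of what batch selection by determinant maximization means, the sketch below greedily picks the candidates whose covariance submatrix has the largest determinant, which favours samples that are individually uncertain and mutually uncorrelated. It is a generic greedy heuristic written for illustration, not the authors' algorithm or the ALIEN implementation; the function names and the toy covariance matrix are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d  # a row swap flips the sign
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def greedy_batch(cov, batch_size):
    """Greedily choose indices whose covariance submatrix has the
    largest determinant (ties go to the lowest index)."""
    chosen = []
    for _ in range(batch_size):
        best, best_det = None, float("-inf")
        for j in range(len(cov)):
            if j in chosen:
                continue
            idx = chosen + [j]
            sub = [[cov[r][c] for c in idx] for r in idx]
            value = det(sub)
            if value > best_det:
                best, best_det = j, value
        chosen.append(best)
    return chosen


# Toy predictive covariance: candidates 0 and 1 are highly correlated
cov = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 0.8]]
batch = greedy_batch(cov, batch_size=2)
```

      In this toy case the second pick is candidate 2 rather than candidate 1, even though candidate 1 has higher variance, because selecting two strongly correlated samples adds little joint information.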

      Suggestion 2:

      Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.

      Answer: Following this comment we added to Table 1 details about the way we computed the cutoff times for the different methods. We also provide more details on the statistics we performed to determine the significance of these differences.

      Suggestion 3:

      Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.

      Answer: This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a small paragraph on code and data availability.

      Reviewer No.2 (public)

      Suggestion 1:

      The authors presented a well-written manuscript describing the comparison of active-learning methods with state-of-art methods for several datasets of pharmaceutical interest. This is a very important topic since active learning is similar to a cyclic drug design campaign such as testing compounds followed by designing new ones which could be used to further tests and a new design cycle and so on. The experimental design is comprehensive and adequate for proposed comparisons. However, I would expect to see a comparison regarding other regression metrics and considering the applicability domain of models which are two essential topics for the drug design modelers community.

      Answer: We want to thank the reviewer for these comments. We provide a detailed response to the specific comments below. 

      Reviewer No.1 (Recommendations For The Authors)

      Recommendation 1:

      The description provided regarding the data collection process and the benchmark datasets used in the study raises some concerns. The comment specifically addresses the use of both private (Sanofi-owned) and public datasets to benchmark the various batch selection methods. Lack of Transparency: The comment lacks transparency regarding the specific sources and origins of the private datasets. It would be crucial to disclose whether these datasets were obtained from external sources or if they were generated internally within Sanofi. Without this information, it becomes difficult to assess the potential biases or conflicts of interest associated with the data.

      Answer: We would like to thank the reviewer for this comment. As mentioned in the paper, the public github page contains links to all the public data and we expect also to the internal Sanofi data. We also now provide more information on the specific experiments that were internally done by Sanofi to collect that data.

      Potential Data Accessibility Issues: The utilization of private datasets, particularly those owned by Sanofi, may raise concerns about data accessibility. The lack of availability of these datasets to the wider scientific community may limit the ability of other researchers to replicate and validate the study’s findings. It is essential to ensure that the data used in research is openly accessible to foster transparency and encourage collaboration.

      Answer: Again, as stated above we expect to release the data collected internally on the github page.

      Limited Information on Dataset Properties: The comment briefly mentions that the benchmark datasets cover properties related to absorption, distribution, pharmacokinetic processes, and affinity of small drug molecules to target proteins. However, it does not provide any specific details about the properties included in the datasets or how they were curated. Providing more comprehensive information about the properties covered and the methods used for curation would enhance the transparency and reliability of the study.

      To address these concerns, it is crucial for the authors to provide more detailed information about the data sources, dataset composition, representativeness, and curation methods employed. Transparency and accessibility of data are fundamental principles in scientific research, and addressing these issues will strengthen the credibility and impact of the study.

      Answer: We agree with this comment and believe that it is important to be explicit about each of the datasets and to provide information on the new data. We note that we already discuss the details of each of the experiments in Methods and, of course, provide links to the original papers for the public data. We have now added text to Supporting Methods that describes the experiments in more detail and provides literature references for the experimental protocols used. As noted above, we expect to provide our new internal data on the public git page.

      Recommendation 2:

      Some comments on the modeling example Approximation of the posterior distribution. Lack of Methodological Transparency: The comment fails to provide any information regarding the specific method or approach used for approximating the posterior distribution. Without understanding the methodology employed, it is impossible to evaluate the quality or rigor of the approximation. This lack of transparency undermines the credibility of the study.

      Answer: We want to thank the reviewer for pointing this out. Based on this comment we added more information to Section: Approximation of the posterior distribution. Moreover, we now provide details on the posterior approximation in Section: Two approximations for computing the epistemic covariance.

      Questionable Assumptions: The comment does not mention any of the assumptions made during the approximation process. The validity of any approximation heavily depends on the underlying assumptions, and their omission suggests a lack of thorough analysis. Failing to acknowledge these assumptions leaves room for doubt regarding the accuracy and relevance of the approximation.

      Answer: We are not entirely sure which assumptions the reviewer is referring to here. The main assumption we can think of that we have used is that getting within X% of the optimal model is a good enough approximation. We have specifically discussed this assumption and tested multiple values of X. While it would have been great to have X = 0, this is unrealistic for retrospective studies. For Active Learning the main question is how many experiments can be saved while obtaining similar results, and the assumptions we used basically amount to ’what is the definition of similar’. We have now added this to the Discussion.
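
      The "within X% of the optimal model" criterion can be made concrete with a minimal sketch. The function name, the learning curves, and the choice of the full-data model error as the "optimal" reference are illustrative assumptions; they are not taken from the ALIEN library or from the data in the paper.

```python
def batches_to_reach(rmse_per_batch, optimal_rmse, tolerance_pct):
    """Return the 1-based index of the first batch whose RMSE is within
    tolerance_pct percent of optimal_rmse, or None if never reached."""
    threshold = optimal_rmse * (1.0 + tolerance_pct / 100.0)
    for i, rmse in enumerate(rmse_per_batch, start=1):
        if rmse <= threshold:
            return i
    return None


# Hypothetical learning curves (RMSE after each acquired batch).
random_curve = [1.20, 1.05, 0.95, 0.87, 0.83, 0.80]
active_curve = [1.20, 0.98, 0.86, 0.82, 0.80, 0.80]
optimal = 0.80  # assumed error of the model trained on all data

for x in (5, 10):
    n_random = batches_to_reach(random_curve, optimal, x)
    n_active = batches_to_reach(active_curve, optimal, x)
    print(f"X={x}%: random={n_random}, active={n_active}, "
          f"saved={n_random - n_active}")
```

      Under these assumptions, the number of batches saved is simply the difference between the two indices, and repeating the calculation for several values of X shows how sensitive the conclusion is to the chosen definition of "similar".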

      Inadequate Validation: There is no mention of any validation measures or techniques used to assess the accuracy and reliability of the approximated posterior distribution. Without proper validation, it is impossible to determine whether the approximation provides a reasonable representation of the true posterior. The absence of validation raises concerns about the potential biases or errors introduced by the approximation process.

      Answer: We sincerely appreciate your concern regarding the validation of the approximated posterior distribution. We acknowledge that our initial submission might not have clearly highlighted our validation strategy. It is, of course, very hard to determine the accuracy of the distribution our model learns, since such a distribution cannot be directly inferred using experiments (no ’ground truth’). Instead, we use an indirect method to determine the accuracy. Specifically, we conducted retrospective experiments using the learned distribution. In these experiments, we indirectly validated our approximation by measuring the error with the respective method. The results from these retrospective experiments provided evidence for the accuracy and reliability of our approximation in representing the true posterior distribution. We now emphasize this in Methods.

      Uncertainty Quantification: The comment does not discuss the quantification of uncertainty associated with the approximated posterior distribution. Properly characterizing the uncertainty is crucial in statistical inference and decision-making. Neglecting this aspect undermines the usefulness and applicability of the approximation results.

      Answer: Thank you for pointing out the importance of characterizing uncertainty in statistical inference and decision-making, a sentiment with which we wholeheartedly agree. In our work, we have indeed addressed the quantification of uncertainty associated with the approximated posterior distribution. Specifically, we utilized Monte Carlo Dropout (MC Dropout) as our method of choice. MC Dropout is a widely recognized and employed technique in the neural networks domain to approximate the posterior distribution, and it offers an efficient way to estimate model uncertainty without requiring any changes to the existing network architecture [1, 2]. In the revised version, we provide a more detailed discussion on the use of Monte Carlo Dropout in our methodology and its implications for characterizing uncertainty.
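
      As a rough illustration of the idea behind MC Dropout, the sketch below keeps dropout active at prediction time and summarizes repeated stochastic forward passes by their mean and variance. The tiny network, its hand-picked weights, the dropout rate, and all function names are hypothetical; this is not the model or the code used in the paper.

```python
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, p_drop, rng):
    """One stochastic forward pass of a one-hidden-layer ReLU network,
    with dropout kept active at prediction time."""
    keep = 1.0 - p_drop
    hidden = []
    for weights in w_hidden:
        pre = sum(w * xi for w, xi in zip(weights, x))
        act = max(0.0, pre)
        # Inverted dropout: zero the unit with probability p_drop,
        # otherwise rescale so the expected activation is unchanged.
        act = act / keep if rng.random() < keep else 0.0
        hidden.append(act)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_predict(x, w_hidden, w_out, p_drop=0.2, n_samples=200, seed=0):
    """Return the mean and variance of n_samples stochastic predictions."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, w_hidden, w_out, p_drop, rng)
               for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


# Hypothetical, hand-picked weights; a real model would be trained.
W_HIDDEN = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
W_OUT = [1.0, -0.5, 0.7]

mean, var = mc_dropout_predict([1.0, 2.0], W_HIDDEN, W_OUT)
print(f"predictive mean={mean:.3f}, variance={var:.3f}")
```

      The spread of the sampled predictions serves as the uncertainty estimate; with the dropout rate set to zero, every pass is identical and the variance collapses to zero.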

      Comparison with Gold Standard: There is no mention of comparing the approximated posterior distribution with a gold standard or benchmark. Failing to provide such a comparison leaves doubts about the performance and accuracy of the approximation method. A lack of benchmarking makes it difficult to ascertain the superiority or inferiority of the approximation technique employed.

      Answer: As noted above, it is impossible to find gold standard information for the uncertainty distribution. It is not even clear to us how such a gold standard could be experimentally determined, since it is a function of a specific model and data. If the reviewer is aware of such a gold standard, we would be happy to test it. Instead, in our study, we opted to benchmark our results against state-of-the-art batch active learning methods, which also rely on uncertainty prediction (such uncertainty prediction is the heart of any active learning method, as we discuss). Results clearly indicate that our method outperforms prior methods, though we agree that this is only an indirect way to validate the uncertainty approximation.

      Reviewer No.2 (Recommendations For The Authors)

      Recommendation 1:

      The text is kind of messy: there are two results sections, for example. It seems that part of the text was duplicated. Please correct it.

      Answer: We want to thank the reviewer for pointing this out. These were typos and we fixed them accordingly.

      Recommendation 2:

      Text in figures is very small and difficult to read. Please redraw the figures, increasing the font size: 10-12pt is ideal in comparison with the main text.

      Answer: We want to thank the reviewer for this comment and we have made the graphics larger.

      Recommendation 3: Please, include specific links to data availability instead of just stating it is available at the Sanofi-Public repository.

      Answer: We want to thank the reviewer for this comment and added the links and data to the Sanofi Github page listed in the paper.

      Recommendation 4:

      What are the descriptors used to train the models?

      Answer: We represented the molecules as molecular graphs using the MolGraphConvFeaturizer from the DeepChem library. We now explicitly mention this in Methods.

      Recommendation 5:

      Regarding the quality of the models, I strongly suggest two approaches instead of using only RMSE as the metric of model performance. I recommend using as many metrics as possible, as reported by Gramatica (https://doi.org/10.1021/acs.jcim.6b00088). I also recommend somehow comparing the increase in dataset diversity according to the employed descriptors (applicability domain) as a measure for further applications to unseen molecules.

      Answer: We want to thank the reviewer for these great suggestions. As suggested, we added new comparison metrics to the Supplement.

      • Distribution plot for the range of the Y values (Figure 8)

      • Clustering of the data sets represented as fingerprints (Supplementary material, Figures 5 and 6)

      • Retrospective experiments with Spearman correlation coefficient (Supplementary material, Figures 2, 3, and 4)
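
      For readers unfamiliar with the Spearman correlation coefficient listed above, a minimal sketch of how it is computed follows: it is the Pearson correlation of the ranks, with tied values sharing their average rank. The function names and the example values are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical measured vs. predicted values for five compounds.
measured = [5.1, 6.3, 4.8, 7.0, 5.9]
predicted = [5.0, 6.0, 5.2, 6.8, 6.1]
print(round(spearman(measured, predicted), 3))
```

      Unlike RMSE, this metric depends only on how well the model orders the compounds, which is often what matters when prioritizing molecules for the next experimental batch.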

      I suggest also a better characterization of datasets including the nature and range of the Y variable, the source of data in terms of experimentation, and chemical (structural and physicochemical) comparison of samples within each dataset.

      Answer: As noted above in response to a similar comment by Reviewer 1, we have added more detailed information about the different experiments we tested to Supporting Methods.

      References

      [1] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR.

      [2] N.D. Lawrence. Variational Inference in Probabilistic Models. University of Cambridge, 2001.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a very well written and performed study describing a TOPBP1 separation of function mutation, resulting in defective MSCI maintenance but normal sex body formation. The phenotype differs from that of a previous TOPBP1 null allele, in which both MSCI and sex body formation were defective. Additional defects in CHK phosphorylation and SETX localization are also described.

      Strengths:

      The study is very rigorous, with a remarkably large number of MSCI marks assayed, phosphoproteomics (leading to the interesting SETX discovery) and 10X RNAseq, allowing the MSCI phenotype to be further deconvolved. The approaches in most cases are robust.

      Weaknesses:

      There aren't many; please find list below:

      1) The authors are committed to the idea that maintenance of MSCI is the major defect here. However, based on the data, an alternative would be that some cells achieve sex body formation and MSCI normally, while others do not. It would only take a small percentage of cells exhibiting MSCI failure to kill all the cells in the same germinal epithelium, so this could still explain the complete pachytene block. This isn't a major point...this phenotype is clearly different to the TOPBP1 KO, but a broader discussion of possibilities in the discussion would help. I raise this in the context of both the cytology and 10X analysis:

      a) The assessment that sex body formation is normal is based on cytology in Supp 8 and 9, but a more rigorous approach would be to assess condensation of the XY pair in stage-matched spread cells (maybe they have that data already) by measuring distances between the X and Y centromere, or looking at stage IV of the seminiferous cycle, where all cells should have oval sex bodies but sex body mutants have persistent elongated XY pairs (see work of Namekawa and Turner). The authors do actually mention that gH2AX spreading is defective in many cells....and if this is true, condensation to form a sex body would almost certainly not have taken place in those cells.

      We appreciate the reviewer’s comment and have performed the experiment suggested, counting the number of elongated sex bodies in all sex body-positive cells in seminiferous tubules stained with γH2AX and DAPI (as done by Turner in Hirota et al., 2018). The experiment did not show significant differences between Topbp1+/+ and Topbp1B5/B5 as shown in Author response image 1.

      Author response image 1.

      Topbp1B5/B5 displays normal condensation of the XY-pair. A) Immunostaining of XY condensation in Topbp1+/+ and Topbp1B5/B5 testes sections (γH2AX: green and DAPI: gray). B) Quantification of all sex body-positive cells per tubule (Topbp1+/+ number of cells counted = 781, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 967, number of tubules counted = 28, number of mice = 3). C) Quantification of elongated-sex body cells per tubule (Topbp1+/+ number of cells counted = 19 and 762 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 45 and 922 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3).

      b) Regarding the 10X data, the finding that expression of some XY genes is elevated and others are not is also consistent with a "partial" phenotype (some cells have normal XY bodies and MSCI, others fail in both). In Fig 6E, X expression looks to be elevated in B5 vs wt at all stages...if this were a maintenance issue, shouldn't it be equal to that in wt and then elevate later?

      We understand the point raised by the reviewer; however, we do not favor the “partial” phenotype model because of the absence of any post-pachytene spermatocytes in the B5 mutant. If some cells had escaped the MSCI defect, we would expect to detect cells progressing further in meiosis. Because we cannot completely rule out the possibility of a subtle disruption in XY silencing initiation, we decided to better emphasize this point in the discussion (lines 391-394).

      In Figure 6E, the X-linked genes were normalized against chromosome 9-linked genes. The normalization against pre-leptotene was done for the results displayed in Figure 7, in which we demonstrate the maintenance issue. Furthermore, for the 10X analysis, while the same number of cells was loaded for wild-type and mutant, the composition of cells varied between these two samples. Despite the fact that very few “spermatocyte 3” cells were detected in the mutant, those cells displayed much higher X-linked gene expression than the wild-type spermatocyte 3 cells.
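
      The two normalizations mentioned here can be sketched as simple ratios. The response does not specify details such as log transformation or whether averaging is per cell or per cluster, so the functions below, their names, and the example values are illustrative assumptions only.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def x_to_autosome_ratio(x_expr, chr9_expr):
    """Mean X-linked expression divided by mean chromosome 9 expression
    within one cell cluster (one stage)."""
    return mean(x_expr) / mean(chr9_expr)


def relative_to_first_stage(expr_by_stage):
    """Express one gene's per-stage expression relative to the first
    stage in the list (here assumed to be pre-leptotene)."""
    baseline = expr_by_stage[0]
    return [value / baseline for value in expr_by_stage]


# Hypothetical average expression values; not data from the study.
ratio = x_to_autosome_ratio(x_expr=[2.0, 1.0, 3.0], chr9_expr=[4.0, 4.0, 4.0])
trajectory = relative_to_first_stage([8.0, 4.0, 2.0, 1.0])
print(ratio, trajectory)
```

      The first ratio controls for global changes in transcription within a stage, while the second follows each gene relative to its own starting level, which is the view needed to ask whether silencing is established and then lost.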

      2) How is the quantitation showing impaired localization of select markers (e.g. SETX) normalized? How do we know that the antibody staining simply didn't work as well on the mutant slides?

      The quantification showing impaired localization of the selected markers such as SETX was done as described by Sims et al., 2022 and Adams et al., 2018. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes and normalized against the analogous signal on the autosomal chromosomes. The possibility that the antibody simply did not work as well on the mutant is unlikely, since multiple biological replicates were performed and we reproducibly followed standard practices in the field for meiotic spread staining, imaging, and quantification. We also note that our findings published in Sims et al., 2022 show that ATR inhibition strongly impairs SETX localization to the sex body, further substantiating our claim that signaling via ATR-TOPBP1 controls SETX.

      3) Is testis TOPBP1 protein expression reduced in the B5 mutant?

      TOPBP1 protein abundance in the B5 mutant is reduced in lysates from whole testis, measured via western blot. We did not detect a significant reduction in TOPBP1 signal intensity measured by immunofluorescence in pachytene spreads of the B5 mutant.

      4) 10X analysis: how were the genes on the y-axis in Supp 24 arranged? Is this by location on the X chromosome?

      These genes were sorted by location along the X chromosome.

      5) The final analyses in Fig 7: X-genes are subdivided based on their behavior (up, down, unchanged). What isn't clear to me is whether the authors have considered the fact that there are global changes in gene expression during meiosis (very low in lep, zyg and early pach, then ramps up hugely from mid pach). In other words, is this normalized to autosomal gene expression?

      For the final analysis in Fig 7, expression of each X-linked gene was normalized to its expression at the pre-leptotene stage. Moreover, the analysis compared X-linked gene behavior in wild-type vs B5 mutant.

      6) Again regarding the 10X analysis, my prediction would be that not ALL X and Y gene would increase in pach if MSCI were ablated...we should remember that XY genes have been subject to MSCI for some 160 million years of evolution, and this will mean that many enhancers that originally drove their expression prior to the evolution of MSCI will now be lost. This has been our experience: many XY genes aren't elevated at pach even in mutants in which MSCI is totally defective. I'd urge the authors to consider this possibility when they use XY gene expression patterns to diagnose the severity or timing of the MSCI phenotype. This could be a discussion point.

      We greatly appreciate the reviewer’s suggestion and have added discussion of this point (lines 392-400).

      Reviewer #2 (Public Review):

      Summary:

      This paper described the role of BRCT repeat 5 in TOPBP1, a DNA damage response protein, in the maintenance of meiotic sex chromosome inactivation (MSCI). By analyzing a Topbp1 mutant mouse with amino acid substitutions in BRCT repeat 5, the authors found reduced phosphorylation of a DNA/RNA helicase, Senataxin, and decreased localization of the protein to the X-Y sex body in pachynema. Moreover, the authors also found decreased repression of several genes on the sex chromosomes in the male mice.

      Strengths:

      The works including phospho-proteomics and single-cell RNA sequencing with lots of data have been done with great care and most of the results are convincing.

      Weaknesses:

      One concern is that, although the Topbp1 mutant spermatocytes show very severe defects after the stage of late pachynema, the defect in the gene silencing in the sex body is relatively weak. It is a bit difficult to explain how such a weak misregulation of gene silencing in mice causes the complete loss of cells in the late stage of spermatogenesis.

      We appreciate the reviewer’s comment. We note that even subtle mis-regulation of XY gene silencing has been reported to lead to significant loss of cells in late stage of prophase I (Ichijima et al., 2011; Modzelewski et al., 2012). Moreover, it is possible that some cells with drastic changes in X-gene expression were excluded from the downstream analysis due to high levels of mitochondrial gene expression (cells that were likely dying due to apoptosis). The exclusion of cells with high levels of mitochondrial gene expression is a common practice in downstream analysis of sc-RNA sequencing data.
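
      The filtering step described here can be sketched as follows. The 10% cutoff, the function names, and the example counts are hypothetical and are not the settings used in the study; the "mt-" prefix follows the usual naming convention for mouse mitochondrial genes.

```python
def mito_fraction(counts_by_gene, mito_prefix="mt-"):
    """Fraction of a cell's total counts that come from mitochondrial
    genes, identified here by a gene-name prefix."""
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts_by_gene.items()
               if gene.startswith(mito_prefix))
    return mito / total


def filter_cells(cells, max_mito_fraction=0.10):
    """Keep cells whose mitochondrial fraction is at or below the cutoff."""
    return {cell_id: counts for cell_id, counts in cells.items()
            if mito_fraction(counts) <= max_mito_fraction}


# Hypothetical counts for two cells; gene names are for illustration.
cells = {
    "cell_a": {"mt-Co1": 5, "Sycp3": 60, "Xiap": 35},
    "cell_b": {"mt-Co1": 40, "mt-Nd1": 20, "Sycp3": 40},
}
kept = filter_cells(cells)
print(sorted(kept))
```

      A filter of this kind removes cells regardless of their X-linked expression, which is why dying cells with strong MSCI failure could be under-represented in the downstream analysis.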

      Reviewer #3 (Public Review):

      The work presented by Ascencao and coworkers aims to delve deeper into the process of sex chromosome inactivation during meiosis (MSCI) as a critical factor in the regulation of meiosis progression in male mammals. For this purpose, they have generated a transgenic mouse model in which a specific domain of TOPBP1 protein has been mutated, hampering the binding of a number of protein partners and interfering with the regulatory cascade initiated by ATR. Through the use of immunolocalization of an impressive number of markers of MSCI, phosphoproteomics and single cell RNA sequencing (scRNAseq), the authors are able to show that despite a proper morphological formation of the sex body and the incorporation of most canonical MSCI markers, sex chromosome-linked genes are reactivated at some point during pachytene and this triggers meiosis progression breakdown, likely due to a defective phosphorylation of the helicase SETX.

      The manuscript presents a clear advance in the understanding of MSCI and meiosis progression with two main strengths. First, the generation of a mouse model with a very uncommon phenotype. Second, the use of a vast methodological approach. The results are well presented and illustrated. Nevertheless, the discussion could be still a bit tuned by the inclusion of some ideas, and perhaps speculations, that have not been considered.

      We appreciate the reviewer’s comment and have improved the discussion section, addressing the points raised in the “Recommendations For The Authors”.

      Reviewer #1 (Recommendations For The Authors):

      I don't have any additional points here

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ascencao et al. describes a separation-of-function allele of TOPBP1 critical for DNA damage response (DDR) that confers a specific defect in XY sex chromosome inactivation during male mouse meiosis. The authors constructed a Topbp1 separation-of-function mouse by introducing amino acid substitutions in BRCT repeat 5 and found that the mice, with normal DDR response in mitosis and meiosis, show male infertility. Topbp1(B5/B5) mice do not contain spermatocytes after diplonema and, as a result, have few spermatids/sperm. In the mice, most of the meiotic events in prophase I, including chromosome synapsis and meiotic recombination as well as the formation of the sex body, are normal. The detailed proteomic analysis revealed reduced ATR-dependent phosphorylation of a DNA/RNA helicase, Senataxin. Single-cell RNA sequencing also found that some genes on the sex chromosomes are not silenced well compared to the control. The works with lots of data have been done with great care and most of the results are convincing. One clear concern is that, although the authors nicely showed a defect in gene silencing in sex chromosomes in the Topbp1(B5/B5) mice, how a small defect in the gene silencing leads to the complete loss of diplotene spermatocytes remains unaddressed.

      Major points:

      Although the authors showed a change in the transcriptome in spermatocytes of Topbp1(B5/B5) male mice, the authors cannot explain the complete lack of spermatids in this mouse. Even the transcriptome seems not to provide a clue.

      1) Given that the TOPBP1-B5 protein cannot bind to both 53BP1 and BLM, it is interesting to check the localization of both proteins on meiotic chromosome spreads (in the case of 53BP1, the localization in MEFs with DNA damage).

      We appreciate the reviewer’s comment. We have tried to stain BLM in meiotic spreads using several different antibodies; however, we were not successful in obtaining specific signals for BLM. In the case of 53BP1, we monitored its localization, and it was not significantly different from Topbp1-/- meiotic spreads (please refer to Supplemental Figure 11). While we appreciate the reviewer’s suggestion of looking at the localization of 53BP1 in MEFs with DNA damage, we opted not to perform the experiment because we have shown that 53BP1 can still bind the BRCT 1 and 2 domains of TOPBP1, as previously described (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). Additionally, both male and female 53BP1 KO mice are fertile (Ward et al., 2003); thus, the partial disruption in binding to 53BP1 that we observed in the TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      2) A recent preprint by Fujiwara et al. (doi: https://doi.org/10.1101/2023.04.12.536672) showed the accumulation of R-loops in spermatocyte spreads in Senataxin knockout mice. The authors may check the R-loop on the sex body in Topbp1-B5 mice.

      We thank the reviewer for the suggestion. We have tried several protocols to stain R-loops (including the protocol used in the paper mentioned above) but were not successful.

      3) The authors need to check the protein level (and band shift) of Senataxin in the testis by western blotting analysis.

      We have tried several SETX antibodies, and none worked for western blot analysis.

      4) If possible, the authors can see any protein interaction between TOPBP1 and Senataxin.

      We appreciate the suggestion, and we will investigate this interaction in future work.

      5) The authors need to check the statistics in the paper.

      (1) It is better to show actual P-values in the case of "ns".

      P-values were added to the respective figure legends.

      (2) In focus counting such as Figures 3D, G, H, 4B, D, F, H, 5E, and F (and in Supplemental Figures), please indicate how many spreads were counted in each mouse. Moreover, the distribution of focus numbers and intensity of fluorescence are not parametric (not normal distribution). It is better to use a non-parametric method such as Mann-Whitney's U test.

      We appreciate the reviewer's comment. Upon consulting with a statistician at the Cornell Statistical Consulting Unit (CSCU), we were advised to use a linear mixed effect model to take into account the variability in cells within each mouse when comparing mice between groups (Topbp1+/+ vs Topbp1B5/B5). We then reanalyzed all quantified meiotic spreads using this mixed effect model, and the p-value, number of mice, and number of cells counted for each group are displayed in the respective figure legends. Upon going through all the quantified meiotic spreads, we realized there was a minor error in one of the previous data points related to SETX staining in Topbp1+/+ and have fixed it. Using the previous quantification data and the new stats analysis, the p-value for cores was 0.5598 and the p-value for loops was 0.0273. Now, using the correct values and the new stats analysis, the p-value for cores is 0.5987 and the p-value for loops is 0.0452. The correction did not change the conclusion of this data and is now displayed in the new Figure 5. We also realized there was a mistake in the ATR quantification, introduced when the spreadsheet was moved from Excel to GraphPad. Using the previous quantification and the new stats analysis, the p-value for cores was 0.2451 and the p-value for loops was 0.8933. Now, using the correct values and the new stats analysis, the p-value for cores is 0.4068 and the p-value for loops is 0.9396. The correction did not change the conclusion of this data and is now displayed in the new Figure 4. Moreover, we realized that we used n = 8 (n = number of mice) for MDC1 quantification and n = 2 for pCHK1_S345, instead of n = 3 as shown in the preprint version of the manuscript. Corrected values were added to their respective figures and figure legends.

      (3) From Figures 6E, 7B, and 7C, the authors conclude the difference in the expression profile between wild type and Topbp1(B5) spermatocytes. It is better to show P-values for the comparison. Particularly, in Figure 7C, Xiap expression kinetics look similar between wild type and the mutant.

      We have added p-values for Figures 6E and 7B to the respective figures or figure legends.

      In Figure 7C, we now recognize that the Δ could have been misleading, as we meant to compare Wild-type SP2 to Wild-type SP3 and Mutant SP2 to Mutant SP3, and not Wild-type SP3 to Mutant SP3. Therefore, the Δ was removed from Figure 7C. For the comparisons between expression levels of SP2 and SP3, it is challenging to calculate p-values for a single gene since these cells have started X-gene silencing and expression values are very low. Meaningful p-values for the comparisons between Wild-type SP3 and Mutant SP3 can be visualized in Figure 7B, where the comparison is based on the number of genes instead of the expression levels of each gene.

      Minor comments:

      1) Line 34: SPO11 is NOT a nuclease. Just delete it.

      It has been deleted (see line 34).

      2) Line 71, a protein: Is this protein ATR? If so, please write it. If not, please give the name of the protein.

      In line 71 (now lines 79-80), we refer to TOPBP1-interacting proteins in general, since many of these interactions depend on phosphorylation of the TOPBP1 interactor. This is the case for BLM, 53BP1, FANCJ, and RAD9. ATR interacts with TOPBP1 through TOPBP1’s AAD domain, and this is not a phospho-mediated interaction. We restructured the sentence for clarity.

      3) In the Introduction, the authors often refer to a review by Cimprich and Cortez (2008) in various places. It is better to cite an original paper or another appropriate review.

      We have accepted the reviewer’s suggestion and added original papers when appropriate.

      4) Line 143-145: The authors generated eight charge reversal point mutations in the BRCT domain 5 of TOPBP1. If possible, it would be helpful to mention the logic behind these substitutions and also why BRCT domain 5 was chosen rather than other domains.

      We generated eight charge reversal point mutations to abrogate all possible phospho-dependent interactions and avoid potential residual interactions. We have mutated other BRCT domains as well, which will be published separately.

      5) Line 174 (and Figure 2E): RPA should be either RPA2 or RPA32.

      Corrected (it is RPA2).

      6) Figure 5C-F: Please explain in more detail how the authors quantified the SETX signals. Why the two results are different?

      The quantification was done as described by Sims et al., 2022, yielding separate data for XY cores and DNA loops. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes. Signals were normalized to the signal on the autosomal chromosomes.
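
      The normalization described here amounts to a ratio of XY signal to autosomal signal, computed separately for cores and loops. The sketch below assumes a simple ratio of mean intensities; the exact procedure of Sims et al., 2022 is not reproduced, and the function name and intensity values are hypothetical.

```python
def normalized_xy_signal(xy_profile, autosome_profile):
    """Mean intensity along or across the XY pair divided by the mean of
    the analogous autosomal measurement from the same nucleus."""
    xy_mean = sum(xy_profile) / len(xy_profile)
    autosome_mean = sum(autosome_profile) / len(autosome_profile)
    return xy_mean / autosome_mean


# Hypothetical intensity profiles (arbitrary units) for one spread.
cores = normalized_xy_signal([120.0, 130.0, 110.0], [40.0, 40.0, 40.0])
loops = normalized_xy_signal([60.0, 80.0, 70.0], [35.0, 35.0, 35.0])
print(cores, loops)
```

      Because cores and loops are measured along different paths and each is divided by its own autosomal reference, the two normalized values are independent readouts and need not change together.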

      Reviewer #3 (Recommendations For The Authors):

      I have no major criticisms, but I include a list of comments and suggestions (some of them conceptual, and disputable) that could help the authors to improve some parts of the manuscript.

      1) Line 52: I realize that the term protein "sequestration" (used in many instances throughout the manuscript) has become widespread in the literature related to MSCI in recent years. While this might be a cool way to describe the dynamics of proteins accumulating in the sex body, this reviewer considers this term totally inappropriate. It is confusing and introduces at least two mistakes regarding protein accumulation in the sex body. First, it seems to indicate that once trapped in the sex body, proteins are incapable of leaving it, which might be completely wrong (histone replacement refutes this idea). Second, it suggests that DDR proteins are attracted by the sex body and cannot remain associated with autosomes even if DNA repair has not been completed. This has also been demonstrated to be incorrect (see for example PMID 19714216). Moreover, DDR proteins can associate de novo with chromosomes if needed, for instance upon DNA damage caused by chemicals or irradiation. Thus, I suggest that the use of "sequestration" should be evaluated more critically, considering the misleading ideas that underlie this term. The use of protein "accumulation" is much more objective and descriptive of the real facts.

      We thank the reviewer for the suggestion and have addressed it in lines 52, 97 and 324.

      2) Line 88: Just as a deference to the original ideas, it would be nice to acknowledge that the inactivation of sex chromosomes and the formation of a sex body in mouse meiosis was described more than 50 years ago (PMID 5833946; 4854664). Likewise, the ideas about the sequential achievement and reinforcement of MSCI during pachytene have been developed during the last 20 years, far before the recent reports cited in the manuscript. Citations to these "old-fashioned" works would be great.

      We appreciate the reviewer’s suggestion and have addressed it in line 86.

      3) Line 90. Please, take into consideration that such a strong effect on meiosis progression occurs mainly in some knockout mice models and that in many other models (including hybrid mice models from natural populations) autosomal regions can remain unsynapsed and accumulate DDR proteins without impairing meiosis. In other mammalian species, meiosis is even more permissive to these MSUC phenomena.

      We appreciate the reviewer’s suggestion and have addressed it at line 88.

      4) Line 211: The differences in the abundance of MLH1 and MLH3 are remarkable. If these two proteins are supposed to form a heterodimer leading to crossover formation, then the increase of only MLH1 might be related to a different process, not leading to crossover (even not class II ones).

      We agree with the reviewer’s comment and have included this point in the discussion (lines 491- 497).

      5) Line 217: I have some doubts about the results presented in Supplementary Figure 9. First, it is not clear to me how the represented cells counts were performed. Each spot is supposed to represent cell counts in a single individual, but how many cells were counted per individual? The proportion of cells could be a better indicator. Second, some B5/B5 individuals' counts were close to the ones displayed in the wild type. Did mutant animals show a high divergence compared to each other? It could be great to have each individual data displayed in a pie chart, and not only the aggregated data.

      We have now addressed this in the new Supplemental figure 9 legend. Each dot in the graph represents the sum of cells counted for one individual. We counted cells from 8 mice for each genotype, Topbp1+/+ and Topbp1B5/B5.

      Here we summarize the total cells counted per individual:

      Author response table 1.

      6) Line 222: The data on 53BP1 deserve further attention. On the one side, from the analysis presented in Supplementary Figure 11, it seems that 53BP1 tends to show a lower intensity in Topbp1B5/B5 mice. Since only 2 mice were analyzed, while for most of the other proteins 3-8 animals were studied, I suggest increasing the number of animals analyzed for 53BP1 localization, to test if this slight difference turns significant. This is relevant since: 1) the association of 53BP1 protein in somatic cells was clearly affected, and 2) 53BP1 is one of the last MSCI markers incorporated to the sex body at mid-late pachytene. These results should be moved to the main text and not appear as supplementary data. On the other hand, if no differences were to be found in meiosis, compared to somatic cells, how do authors explain these differences? Would 53BP1 have another partner at the sex body apart from TOPBP1? Could TOPBP1 have other BRCT domains (apart from domain 5) able to bind 53BP1?

      We appreciate the reviewer’s suggestion; however, we had an issue with the 53BP1 antibody. We analyzed 2 mice and then needed to re-order the antibody. It was backordered for almost one year, and when we finally received the order, the company had changed the clone for this antibody, and it no longer worked for meiotic spreads. In somatic cells (HEK-293T), we see a partial disruption of 53BP1 binding to TOPBP1 B5 by IP-MS and IP-Western blot. The disruption is only partial because 53BP1 also binds other domains in TOPBP1, such as BRCT 1 and 2 (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). However, in assays in which we would expect a phenotypic response caused by impaired 53BP1, such as survival after IR (using the mice) and survival after phleomycin challenge (using MEFs), we did not see any effect. Moreover, 53BP1 KO mice, males and females, are fertile (Ward et al., 2003), so the partial disruption in binding to 53BP1 that we observed in the TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      7) Line 250: I do not understand what is represented in Figure 5A. Why did the author mix two different experiments (differences in phosphoprotein abundance in B5/B5 compared to wild type and the interference of ATR with AZ20)?

      To account for the differences in cell population observed in the whole testis between Topbp1+/+ and Topbp1B5/B5, and to know exactly which phosphorylation changes were due to disruption of ATR signaling and not to pleiotropic effects, we combined two different phosphoproteomes: one from the comparison between Topbp1+/+ and Topbp1B5/B5, and another from the comparison between vehicle- and ATR inhibitor-treated mice. With this approach, we only consider hits that were disrupted in both analyses. A similar method was used by Sims et al. (2022).

      8) It is not clearly explained what is represented in Figure 6B. There is no explanation in the text or the figure legend. Does this represent the difference between scRNAseq in control and Topbp1B5/B5? If so, please, clarify.

      We thank the reviewer for the comment and have addressed it in the legend of Figure 6B.
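
      As a rough illustration of the "disrupted in both analyses" filter described above, the following minimal Python sketch intersects the hits from two comparisons. The function name, the use of log2 fold changes, the restriction to reduced sites, and the cutoff value are all assumptions made for illustration; they are not taken from the manuscript or from Sims et al. (2022).

```python
def shared_hits(mutant_vs_wt, inhibitor_vs_vehicle, log2_cutoff=-1.0):
    """Return phosphosites that are reduced in BOTH comparisons.

    Each argument maps a phosphosite ID to a log2 fold change.
    The cutoff is a hypothetical placeholder, not the authors' threshold.
    """
    # Sites reduced in the mutant relative to wild type.
    down_in_mutant = {
        site for site, fc in mutant_vs_wt.items() if fc <= log2_cutoff
    }
    # Sites reduced by the ATR inhibitor relative to vehicle.
    down_with_inhibitor = {
        site for site, fc in inhibitor_vs_vehicle.items() if fc <= log2_cutoff
    }
    # Keep only sites that pass the filter in both data sets.
    return down_in_mutant & down_with_inhibitor


# Made-up site IDs and values, for illustration only.
mutant = {"siteA": -2.0, "siteB": -2.0, "siteC": 0.0}
inhibitor = {"siteA": -1.5, "siteB": 0.1, "siteC": -3.0}
print(shared_hits(mutant, inhibitor))
```

      Only sites that pass the filter in both comparisons survive, which is what limits the contribution of changes caused by the altered cellular composition of the mutant testis.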

      9) Line 342 and following. The authors describe a decrease of gene silencing. The use of two negative concepts is always confusing and results in the conversion to a positive one. I suggest considering the possibility of just talking about increase of gene expression, in order to make the message clearer.

      We appreciate the reviewer’s point here, but it is important to note that the phenomenon disrupted in our mutants is MSCI, which is by definition a gene silencing mechanism. This phenotype is not as simple as “increased gene expression”, it is the removal of a mechanism that is a key feature of prophase I. Thus, because we are focusing on the mechanism of MSCI, it is crucial to maintain this (albeit unusual) terminology.

      10) As for the classification of spermatocytes into 9 categories, I am curious about which spermatocytes are included in each of these categories. For instance, from cytology it seems that in Topbp1B5/B5 mice, spermatocytes are able to reach mid-late pachytene. However, in the spermatocyte categories established by scRNAseq they only reach class 3. Therefore, which are the populations included in the remaining 6 classes of spermatocytes? Do authors have any morphological correlation to these scRNAseq categories? Is it possible that in this mutant morphological advance of meiosis and gene expression profiles are uncoupled?

      The assignment of cells to a specific cluster is based on RNA expression, which does not always match cytological features. Moreover, during the analysis, cells with high expression of mitochondrial genes are excluded (these are dying cells that do not pass quality control). Thus, while Topbp1B5/B5 reaches a mid-late pachytene stage according to cytological analyses, in the single-cell RNA-seq analysis we could only detect one pachytene stage. The other 6 remaining categories of spermatocytes can be classified according to their best-fit gene expression profile. For that, we use the classification described by Chen et al., 2018 and Lau et al., 2020. Spermatocytes 3-5 = Pachytene, Spermatocytes 6-7 = Diplotene, Spermatocytes 8-9 = secondary spermatocytes (metaphase I/II). The gene markers used for this classification are displayed in Author response image 2.

      Author response image 2.

      Genes used as markers of spermatocytes captured in the scRNAseq analysis. Violin plots display the distribution of cells expressing Gm960 (Leptotene marker), Meiob (Leptotene/Zygotene marker), Psma8 (Pachytene marker), Piwil1 (Pachytene marker), Pou5f2 (Diplotene marker), and Ccna1 (Secondary Spermatocytes marker).

      11) Figure 6E shows that overexpression of X-linked genes is not a feature of spermatocytes but it is initiated in spermatogonia. This fact has not been properly stated in the text and perhaps not sufficiently highlighted.

      We noticed subtle changes during the spermatogonia stage and have addressed the reviewer’s comment in lines 317-322, however the downstream analyses related to a defect in X-gene silencing maintenance displayed in Figure 7 were done based on normalization of gene expression to its respective pre-leptotene stage.

      12) Supplementary Figure 24 shows that some X-linked genes are more expressed in Topbp1B5/B5 compared to control mice. In the figure it can be observed that many genes accumulate at the bottom of the graph. Does this have any correlation to the location of these genes along the X chromosome, for instance near or within the PAR? This could correlate with the defects in γH2AX accumulation at this region.

      These are the locations along the chromosome. Only the bottom 5 rows are within the PAR region, so this accumulation is not within the PAR region specifically. The bottom tenth of the genes in the heatmap correspond to roughly a 17 Mb region.

      13) The authors only analyzed the overexpression of genes located on the X chromosome. It would be interesting to show the behavior of Y-linked genes as well.

      The coverage of Y-linked genes was not very high and that is why we have not shown the results in the paper. However, the results for Y-linked genes were similar to the X-linked genes and can be visualized in Author response image 3.

      Author response image 3.

      Single cell RNAseq reveals that Topbp1B5/B5 spermatocytes initiate MSCI but fail to promote full silencing of Y chromosome-linked genes. Violin plot displaying the ratio of the average expression of Y chromosome genes by the average expression of chromosome 9 genes at different stages of spermatogenesis for Topbp1+/+ and Topbp1B5/B5 cells.
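
      The quantity plotted in this violin plot, the average expression of Y-linked genes divided by the average expression of chromosome 9 genes, can be sketched per cell as follows. This is a minimal sketch under assumed inputs: the function name, the dictionary-based data layout, and the gene identifiers are hypothetical and do not reflect the authors' actual pipeline.

```python
from statistics import mean


def sex_to_autosome_ratio(expression, chromosome_of, sex_chrom="Y", autosome="9"):
    """Ratio of mean sex-chromosome expression to mean autosomal expression.

    expression:    maps gene name -> normalized expression in one cell
    chromosome_of: maps gene name -> chromosome label (e.g. "Y", "9")
    """
    sex_values = [v for g, v in expression.items() if chromosome_of.get(g) == sex_chrom]
    auto_values = [v for g, v in expression.items() if chromosome_of.get(g) == autosome]
    if not sex_values or not auto_values:
        raise ValueError("need at least one gene on each chromosome")
    auto_mean = mean(auto_values)
    if auto_mean == 0:
        raise ValueError("autosomal mean expression is zero; ratio undefined")
    return mean(sex_values) / auto_mean


# Placeholder gene names and values, for illustration only.
cell = {"geneY1": 2.0, "geneY2": 4.0, "geneA1": 6.0}
chromosomes = {"geneY1": "Y", "geneY2": "Y", "geneA1": "9"}
print(sex_to_autosome_ratio(cell, chromosomes))
```

      Dividing by an autosomal reference makes cells with different overall RNA content comparable; a ratio that falls at pachytene and then rises again would be consistent with initiation of silencing followed by a failure to maintain it.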

      14) Line 425: Authors indicate that it is not known if association of TOPBP1 and BLM, 53BP1 or other proteins is disrupted in Topbp1B5/B5 spermatocytes. Could these experiments be performed in the testis, as they were in somatic cells?

      The cellular composition in Topbp1+/+ and Topbp1B5/B5 testes is very different so it would not be a fair comparison. While we have tried to isolate pachytene cells to perform these experiments, we were successful only when using Topbp1+/+ but not Topbp1B5/B5, likely due to the extremely small size of the mutant testis.

      15) Line 455 and following. I find that the discussion about the role of SETX is not completely clear. It seems that a failure of SETX function could result in defective or no transcription, as a consequence of the impossibility to resolve RNA-DNA hybrid molecules. Therefore, should impairment of SETX lead to reduced or enhanced transcription? Please clarify. On the other hand, this defect in SETX function should affect the whole genome, and not only sex chromosomes. Do authors have any clues about this broad effect?

      We thank the reviewer for the comment and have expanded the discussion in lines 470-474. We agree with the reviewer that an impairment of SETX should affect the whole genome; however, during the pachytene stage, SETX is mostly localized to the sex body. Topbp1B5/B5 shows a specific defect in X and Y silencing maintenance during the pachytene stage; we therefore hypothesize that impaired SETX localization during pachytene should especially affect the X and Y chromosomes.

      16) As a general comment to the discussion section, I think authors could extend into some specific ideas or speculations. It is shocking that sex chromosome-linked genes are able to escape silencing without dismantling the complex (almost complete) MSCI response in the Topbp1 mutant (although perhaps this is not so surprising considering the high number of escapees reported in the inactivated X chromosome in female somatic cells).

      How to explain this paradox? One possibility (which would make a real breakthrough) is that the expression of sex chromosome-linked genes represents a regulated response to meiotic defects, and not just an unfortunate consequence of a defective MSCI. Thus, MSCI might be somehow irrelevant to prevent the execution of this sex chromosome-based program to stop meiosis progression when needed. The fact that this regulated activation was never proposed is perhaps due to the fact that most of the meiosis mutants characterized so far are unable to reach the stage at which MSCI is properly established, which is the most remarkable difference with the Topbp1 mutant studied here.

      Although naïve, the critical point for the activation of this sex chromosome-based program seems to depend simply on the transcription of Zfy1 and Zfy2 (encoding for transcription factors). The signaling cascades up and downstream these genes are the real mystery, awaiting further studies.

      We thank the reviewer for raising this very interesting point. Our interpretation of the data is that X and Y silencing is a dynamic process that requires an initiation step and a maintenance step driven/controlled by the DDR machinery, and that Topbp1B5/B5 shows grossly normal initiation of X and Y silencing but fails to maintain MSCI. Moreover, expression of Zfy1 and Zfy2 has previously been shown to be sufficient to trigger cell death (Royo et al., 2010; Vernet et al., 2016), and Topbp1B5/B5 cells show increased expression of these genes. However, we do not exclude the very interesting possibility, raised by the reviewer, that the expression of XY-linked genes represents a regulated response to meiotic defects that stops meiosis progression, leading to the cell death observed in Topbp1B5/B5. This would make Topbp1B5/B5 a unique model for such studies, as most previous meiosis mutants are unable to reach the stage at which MSCI is properly established. We have added discussion of this exciting point in lines 513-522.

      17) Scale bars are impossible to read in Figures 1I and J, and are missing in all the other image figures. Please, correct.

      We have addressed this in the new Figure 1. For figures displaying meiotic spreads, adding a scale bar is not a common practice in the field as these cells are swollen while being prepared.

      18) Line 828. Since Paula Cohen is an author of the manuscript, it seems weird to acknowledge herself in this section.

      Corrected.

      References

      Adams SR, Maezawa S, Alavattam KG, Abe H, Sakashita A, Shroder M, Broering TJ, Sroga Rios J, Thomas MA, Lin X, Price CM, Barski A, Andreassen PR, Namekawa SH. 2018. RNF8 and SCML2 cooperate to regulate ubiquitination and H3K27 acetylation for escape gene activation on the sex chromosomes. PLoS Genet 14. doi:10.1371/journal.pgen.1007233

      Bigot N, Day M, Baldock RA, Watts FZ, Oliver AW, Pearl LH. 2019. Phosphorylation-mediated interactions with topbp1 couple 53bp1 and 9-1-1 to control the g1 DNA damage checkpoint. Elife 8:1–28.

      Cescutti R, Negrini S, Kohzaki M, Halazonetis TD. 2010. TopBP1 functions with 53BP1 in the G1 DNA damage checkpoint. EMBO J 29:3723–3732.

      Chen Y, Zheng Y, Gao Y, Lin Z, Yang S, Wang T, Wang Q, Xie N, Hua R, Liu M, Sha J, Griswold MD, Li J, Tang F, Tong M-H. 2018. Single-cell RNA-seq uncovers dynamic processes and critical regulators in mouse spermatogenesis. Cell Res 28:879–896.

      Hirota T, Blakeley P, Sangrithi MN, Mahadevaiah SK, Encheva V, Snijders AP, ElInati E, Ojarikre OA, de Rooij DG, Niakan KK, Turner JMA. 2018. SETDB1 Links the Meiotic DNA Damage Response to Sex Chromosome Silencing in Mice. Dev Cell 47:645-659.e6.

      Ichijima Y, Ichijima M, Lou Z, Nussenzweig A, Daniel Camerini-Otero R, Chen J, Andreassen PR, Namekawa SH. 2011. MDC1 directs chromosome-wide silencing of the sex chromosomes in male germ cells. Genes and Development 25:959–971.

      Lau X, Munusamy P, Ng MJ, Sangrithi M. 2020. Single-Cell RNA Sequencing of the Cynomolgus Macaque Testis Reveals Conserved Transcriptional Profiles during Mammalian Spermatogenesis. Dev Cell 54:548-566.e7.

      Liu Y, Cussiol JR, Dibitetto D, Sims JR, Twayana S, Weiss RS, Freire R, Marini F, Pellicioli A, Smolka MB. 2017. TOPBP1Dpb11 plays a conserved role in homologous recombination DNA repair through the coordinated recruitment of 53BP1Rad9. J Cell Biol 216:623–639.

      Modzelewski AJ, Holmes RJ, Hilz S, Grimson A, Cohen PE. 2012. AGO4 regulates entry into meiosis and influences silencing of sex chromosomes in the male mouse germline. Dev Cell 23:251–264.

      Royo H, Polikiewicz G, Mahadevaiah SK, Prosser H, Mitchell M, Bradley A, De Rooij DG, Burgoyne PS, Turner JMA. 2010. Evidence that meiotic sex chromosome inactivation is essential for male fertility. Curr Biol 20:2117–2123.

      Sims JR, Faça VM, Pereira C, Ascenção C, Comstock W, Badar J, Arroyo-Martinez GA, Freire R, Cohen PE, Weiss RS, Smolka MB. 2022. Phosphoproteomics of ATR signaling in mouse testes. Elife 11. doi:10.7554/eLife.68648

      Vernet N, Mahadevaiah SK, de Rooij DG, Burgoyne PS, Ellis PJI. 2016. Zfy genes are required for efficient meiotic sex chromosome inactivation (MSCI) in spermatocytes. Hum Mol Genet 25:5300–5310.

      Ward IM, Minn K, van Deursen J, Chen J. 2003. p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Mol Cell Biol 23:2556–2563.

      Yeo AJ, Becherel OJ, Luff JE, Graham ME, Richard D, Lavin MF. 2015. Senataxin controls meiotic silencing through ATR activation and chromatin remodeling. Cell Discovery 1. doi:10.1038/celldisc.2015.25

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their work, and the very useful comments.

      Public reviews:

      Reviewer #2

      1) The authors discussed possible reasons for the different results of the RRP sizes between this study and Alten et al., 2021. One of them is how the hypertonic solution is applied. The authors thought that the long application of hypertonic solution in Alten et al., 2021 caused an overlapping release of RRP and upstream vesicle pools because Alten et al., 2021 measured 10-fold larger RRP size than what was measured in this study. However, Alten et al., 2021 measured RRP from IPSCs and a single inhibitory vesicle fusion causes larger charge transfer than an excitatory vesicle. The authors need to take this into consideration and 10-fold is likely an overestimate.

      Answer: Thank you for pointing out this important difference. We have modified the text in the Discussion accordingly and we no longer refer to the 10-fold difference.

      2) Statistical tests should be performed for protein expression levels (Fig 2A and Fig 10A) and in vitro fusion assays (Fig 8D,E and Fig 9 B,C).

      Answer: We inserted new panels B and C in Fig. 2 and Fig. 10 showing all the Western Blot data and performed statistical tests (none were significant). For the in vitro fusion assays, we have inserted statistical tests in panels 8E and 9C. The quantities in those panels (subdivided into “Pre Ca2+”, “post Ca2+” and “end fusion”) are based on the data in Figure 8D and 9B. We have therefore not inserted separate statistical tests in Figures 8D and 9B.

      Reviewer #1 (Recommendations For The Authors):

      It would be quite interesting for future studies to address how these three mutations in SNAP-25 behave in the Syt1 null background in their electrophysiological experiments. Does the I67N allele block the enhanced spontaneous release in the Syt1 null? Do the V48F and D166Y alleles synergize with Syt1 to enhance spontaneous release to even higher levels? By examining how different components interact to shape the energy landscape for priming and fusion, these types of approaches should be quite revealing.

      Answer: We agree with the reviewer that these future studies would be interesting. Unfortunately, they are beyond our current capacities.

      Reviewer #2 (Recommendations For The Authors):

      1) In the introduction, when discussing that haploinsufficiency of Munc18-1 causes a decrease in release, additional references should be included, for example, the studies in flies (Wu et al., 1998, EMBO), human neurons (Patzke et al., 2015 JCI), and mouse neurons (Toonen et al., 2006 PNAS; Chen et al., 2020 eLife).

      Answer: Thank you for the suggestion. We have rewritten the text and added additional references.

      2) The authors may consider introducing additional motivations and significance of this study. For example, the evoked EPSCs cannot be properly measured in the cultures of Alten et al., 2021, but was properly studied here.

      Answer: We agree and have added additional motivations in the Introduction.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.
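
      The heuristic mentioned at the end of this paragraph (at least 50% precision among the top 50 predicted contacts) can be made concrete with a short Python sketch. The function names and the dictionary-based input format are assumptions chosen for illustration, not the PLMGraph-Inter interface; precision is computed over the pairs actually ranked, so a complex with fewer than k candidate pairs is scored on what is available.

```python
def top_k_precision(scores, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores:        maps (i, j) inter-protein residue pairs -> predicted probability
    true_contacts: set of (i, j) pairs in contact in the reference complex
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


def is_acceptable(scores, true_contacts, k=50, threshold=0.5):
    """Apply the 'at least 50% precision in the top 50' heuristic."""
    return top_k_precision(scores, true_contacts, k) >= threshold


# Toy example with four candidate pairs and k=2, for illustration only.
toy_scores = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.7}
toy_truth = {(0, 0), (1, 1)}
print(top_k_precision(toy_scores, toy_truth, k=2))
```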

      We thank the reviewer for recognizing the strengths of our work!

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with the predicted monomers and analyzed the results in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately reduced the performance of AF2 by using reduced MSAs (see the 2nd paragraph in the “Impact of the monomeric structure quality on contact prediction” section). Some of these results are currently in the supplementary material of the manuscript (Table S2). We will move them to the main text in the revision to emphasize the performance of PLMGraph-Inter with the predicted monomers.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes! The performance of PLMGraph-Inter drops when the predicted monomers are used in the prediction. However, it is difficult to say which is a fairer comparison, Figure 6 or Figure S2, since AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native templates. We will provide the AFM confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to note that AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native template.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM has not released the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” only includes 17 heterodimer proteins, and the number would decrease further after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the date cutoff of the AFM training set, although for these targets we cannot completely remove the redundancy between the training and test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. Our test set would overlap more with the training set of AFM V3, which is one reason we think AFM V2 is more appropriate for the comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the construction of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model. We will further clarify this in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform the analysis in the revision. In the current version of the manuscript, 40% sequence identity is used as the cutoff because many previous studies used this cutoff (e.g. the Recent-PDB-Multimers set used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using a relatively higher threshold in PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table 1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures (AlphaFold2 predicted structures).

      Method:

      To remove redundancy, we clustered 11096 sequences from the training set and test sets (HomoPDB, HeteroPDB) using MMseqs2 with different sequence identity thresholds (40%, 30%, 20%, 10%) (the lowest cutoff for CD-HIT is 40%, so we switched to MMseqs2). Each sequence is then uniquely labeled by the cluster (e.g. cluster 0, cluster 1, …) to which it belongs, from which each PPI can be marked with a pair of clusters (e.g. cluster 0-cluster 1). The PPIs belonging to the same cluster pair (note: cluster n-cluster m and cluster m-cluster n were considered as the same pair) were considered as redundant. For each PPI in the test set, if the cluster pair it belongs to contains a PPI belonging to the training set, we remove that PPI from the test set.
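
      The cluster-pair filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `cluster_of` mapping and the toy identifiers are all hypothetical, and the cluster labels are assumed to have been produced already by a clustering tool at the chosen identity threshold.

```python
from typing import Dict, List, Tuple

PPI = Tuple[str, str]  # a pair of sequence identifiers (hypothetical naming)


def cluster_pair(ppi: PPI, cluster_of: Dict[str, int]) -> Tuple[int, int]:
    """Return the order-independent cluster pair of a PPI.

    Sorting makes (n, m) and (m, n) map to the same key, which is the
    symmetry stated in the method description.
    """
    a, b = ppi
    return tuple(sorted((cluster_of[a], cluster_of[b])))


def filter_test_set(train: List[PPI], test: List[PPI],
                    cluster_of: Dict[str, int]) -> List[PPI]:
    """Drop every test PPI whose cluster pair also occurs in the training set."""
    train_pairs = {cluster_pair(p, cluster_of) for p in train}
    return [p for p in test if cluster_pair(p, cluster_of) not in train_pairs]


# Toy data (assumed, for illustration only): sequence id -> cluster label.
cluster_of = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 2, "F": 3}
train = [("A", "B")]                          # cluster pair (0, 1)
test = [("C", "D"), ("E", "F"), ("A", "E")]   # pairs (0, 1), (2, 3), (0, 2)

filtered = filter_test_set(train, test, cluster_of)
```

      In this toy case the test PPI C-D is removed because its cluster pair matches that of the training PPI A-B, while A-E is kept: sharing a single cluster with a training PPI is not sufficient for removal under this rule.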

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author Response

      We are grateful for the constructive comments of the reviewers. Here is a provisional response to major questions.

      To Question 1: we appreciate the reviewers pointing out that, with pan-neuronal knockout of PDFR by unmodified Cas9 (Fig 2H-2I in the previous manuscript), morning anticipation still exists at some level (Fig a), and that the decrease in the morning anticipation index (Fig b) and the advance of evening activity were not as pronounced as those observed in han5304 (Fig 3C, Hyun et al., 2005). Our response is that the difference between the pan-neuronal knockout of PDFR by unmodified Cas9 and han5304 might be caused by the limited efficiency of unmodified Cas9 in our conditional system. We will adjust the relevant conclusions in the revised version; these findings underscore the necessity of enhancing the efficiency of the original Cas9.

      Author response image 1.

      To Question 2, that some expression profiles of clock neurons are not consistent with previous reports, such as Dh31 and ChAT in s-LNvs, our response is that the differences can be attributed to the variation in expression patterns between 3’ terminal KI-LexA (used in this gene expression dissection) and KO-GAL4, KI-GAL4, or transgenic GAL4. We have indeed observed differences when identical sites were inserted in frame with Gal4 or LexA.

      To Question 3, that our description of advanced morning anticipation versus no morning anticipation with the term "opposite" is not accurate enough, our response is that we will modify that. Mutants of CNMa or CNMaR exhibit advanced morning activity, suggesting an inhibitory role of CNMa/CNMaR. Mutants of Pdf/Pdfr, on the other hand, showed no morning anticipation, indicating a promoting role in morning anticipation.

      To Question 4, whether we have generated transgenic UAS-sgRNA flies for all CCT genes or only a subset, our response is that we have indeed generated UAS-sgRNA flies for all CCT genes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project.

      We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all premotor neurons are NotchON neurons while all postsensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the cosubmitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist L1-expressed Dl that affect newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs are thought to be homogenous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single-cell RNAseq on LPCs to look for molecular heterogeneities. Nevertheless, whether L4 is generated by E(spl)-mɣ-GFP+ (NotchON) LPCs does not affect our conclusion that Notch signaling and the primary HDTF Bsh are integrated to specify L4 fate over L5.

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in newborn L1, we need to express Dl-RNAi before the onset of Dl expression in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4, which is the one that we used.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper, and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      We disagree that the use of 27G05-Gal4 is problematic when performing Notch-KD because our conclusion from Notch-KD is that Bsh without Notch signaling activates Pdm3 and specifies L5 fate. However, 27G05-Gal4 does not have any effect on Pdm3 expression. To make this clearer, we will quantify the percentage of Pdm3+ L5 neurons in Bsh+ lamina neurons for Notch-KD experiment. We are sorry this wasn't clearer.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with transient Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is the newborn neuron stage.

      However, as suggested by the reviewer, we did the experiment (see figure below). We used temperature-sensitive Gal80 (Gal80ts, which inhibits Gal4 activity at 18°C) to temporally control Bsh-Gal4 activity for expressing N-ICD (the active form of Notch) in L5 neurons. We found that tub-Gal80ts, Bsh-Gal4>UAS-N-ICD is unable to induce ectopic L4 neurons when we shift the temperature from 18°C to 30°C to inactivate Gal80 at 15 hours after pupal formation, which is close to the end of lamina neurogenesis. However, it is unknown how many hours it takes to inactivate Gal80 and activate Bsh-Gal4, and thus we decided not to include this data in our manuscript.

      Author response image 1.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show an L4-to-L3 conversion in the absence of Bsh and the presence of Notch activity, but an L5-to-L1 conversion in the absence of both Bsh and Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, we currently have only Hey as an available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g.: Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. The reviewer posits: “An alternative hypothesis: different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.” Yes, it is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF (e.g., Bsh). We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in the larval ventral nerve cord, all premotor neurons are NotchON neurons while all post-sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree and think in L5 neurons, the secondary HDTF Pdm3 also contributes to L5-specific gene transcription during the synaptogenesis window, in addition to Bsh. We will include this in the text.

    1. Author Response

      The following is the authors’ response to the latest reviews.

      A revised version of the manuscript models "slope-based" excitability changes in addition to "threshold-based" changes. This serves to address the above concern that as constructed here changes in excitability threshold are not distinguishable from changes in input. However, it remains unclear what the model would do should only a subset of neurons receive a given, fixed input. In that case, are excitability changes sufficient to induce drift? This remains an important question that is not addressed by the paper in its current form.

      Thank you for this important point. In the simulation of two memories (Fig. S6), we stimulated half of the neural population for each of the two memories. We therefore also showed that drift happens when only a subset of neurons is stimulated.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Current experimental work reveals that brain areas implicated in episodic and spatial memory have a dynamic code, in which activity representing familiar events/locations changes over time. This paper shows that such reconfiguration is consistent with underlying changes in the excitability of cells in the population, which ties these observations to a physiological mechanism.

      Delamare et al. use a recurrent network model to consider the hypothesis that slow fluctuations in intrinsic excitability, together with spontaneous reactivations of ensembles, may cause the structure of the ensemble to change, consistent with the phenomenon of representational drift. The paper focuses on three main findings from their model: (1) fluctuations in intrinsic excitability lead to drift, (2) this drift has a temporal structure, and (3) a readout neuron can track the drift and continue to decode the memory. This paper is relevant and timely, and the work addresses questions of both a potential mechanism (fluctuations in intrinsic excitability) and purpose (time-stamping memories) of drift.

      The model used in this study consists of a pool of 50 all-to-all recurrently connected excitatory neurons with weights changing according to a Hebbian rule. All neurons receive the same input during stimulation, as well as global inhibition. The population has heterogeneous excitability, and each neuron's excitability is constant over time apart from a transient increase on a single day. The neurons are divided into ensembles of 10 neurons each, and on each day, a different ensemble receives a transient increase in the excitability of each of its neurons, with each neuron experiencing the same amplitude of increase. Each day for four days, repetitions of a binary stimulus pulse are applied to every neuron.
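
      The setup summarized above can be sketched in a few lines. This is only an illustrative toy, not the authors' equations: the saturating rate dynamics, the form of the Hebbian update, and every parameter value other than the pool size, ensemble size and number of days are assumptions made here for the sake of a runnable example.

```python
import numpy as np

N_NEURONS = 50      # pool size stated in the review
ENSEMBLE_SIZE = 10  # neurons per ensemble stated in the review
N_DAYS = 4          # days of stimulation stated in the review


def excitability_schedule(baseline, amplitude,
                          n_days=N_DAYS, ensemble_size=ENSEMBLE_SIZE):
    """Per-day excitability: on day d, ensemble d gets baseline + amplitude."""
    baseline = np.asarray(baseline, dtype=float)
    schedule = np.tile(baseline, (n_days, 1))
    for day in range(n_days):
        start = day * ensemble_size
        schedule[day, start:start + ensemble_size] += amplitude
    return schedule


def simulate_day(weights, excitability, stimulus=1.0, inhibition=0.05,
                 n_steps=200, dt=0.1, lr=0.01, w_max=1.0):
    """One day of stimulation with Hebbian plasticity (assumed toy dynamics)."""
    w = weights.copy()
    rates = np.zeros(len(excitability))
    for _ in range(n_steps):
        # Same input to every neuron, plus an excitability bias and global inhibition.
        drive = w @ rates + stimulus + excitability - inhibition * rates.sum()
        # Leaky rate update with a saturating nonlinearity keeps rates in [0, 1].
        rates = rates + dt * (-rates + np.clip(drive, 0.0, 1.0))
        # Hebbian growth, no self-connections, weights bounded in [0, w_max].
        w += lr * dt * np.outer(rates, rates)
        np.fill_diagonal(w, 0.0)
        np.clip(w, 0.0, w_max, out=w)
    return rates, w


rng = np.random.default_rng(0)
baseline = rng.uniform(0.0, 0.5, size=N_NEURONS)  # heterogeneous excitability
schedule = excitability_schedule(baseline, amplitude=0.5)

weights = np.zeros((N_NEURONS, N_NEURONS))
daily_rates = []
for day in range(N_DAYS):
    rates, weights = simulate_day(weights, schedule[day])
    daily_rates.append(rates)
daily_rates = np.array(daily_rates)
```

      The sketch makes the reviewer's point about simplicity concrete: the only quantity that differs between days is one row of `schedule`, so any change in which neurons dominate `daily_rates` across days has to come from that excitability offset and the weights it shaped on earlier days.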

      The modeling choices focus on the parameter of interest (the excitability), and other details are generally kept as straightforward as possible. That said, I wonder if certain aspects may be overly simple. The extent of the work already performed, however, does serve the intended purpose, and so I think it would be sufficient for the authors to comment on these choices rather than to take more space in this paper to actually implement these choices. What might happen were more complex modeling choices made? What is the justification for the choices that are made in the present work?

      The two specific modeling choices I question are (1) the excitability dynamics and (2) the input stimulus. The ensemble-wide synchronous and constant-amplitude excitability increase, followed by a return to baseline, seems to be a very simplified picture of the dynamics of intrinsic excitability. At the very least, justification for this simplified picture would benefit the reader, and I would be interested in the authors' speculation about how a more complex and biologically realistic dynamics model might impact the drift in their network model. Similarly, the input stimulus being binary means that, on the single-neuron level, the only type of drift that can occur is a sort of drop-in/drop-out drift; this choice excludes the possibility of a neuron maintaining significant tuning to a stimulus but changing its preferred value. How would the use of a continuous input variable influence the results?

      (1) In our model, neurons tend to compete for allocation to the memory ensemble: neurons with higher excitability tend to be preferentially allocated and neurons with lower excitability do not respond to the stimulus. Because relative, but not absolute excitability biases this competition, we suggest that the exact distribution of excitability would not impact the results qualitatively. On the other hand, the results might vary if excitability was considered dependent on the activity of the neurons as previously reported experimentally (Cai 2016, Rachid 2016, Pignatelli 2019). An increase in excitability following neural activity might induce higher correlation among ensembles on consecutive days, decreasing the drift.

      (2) We thank the reviewer for this very good point. Indeed, two recent studies (Geva 2023, Khatib 2023) have highlighted distinct mechanisms for a drift of the mean firing rate and the tuning curve. We extended the last part of the discussion to include this point: “Finally, we intended to model drift in the firing rates, as opposed to a drift in the tuning curve of the neurons. Recent studies suggest that drifts in the mean firing rate and tuning curve arise from two different mechanisms [33, 34]. Experience drives a drift in neurons’ tuning curve while the passage of time drives a drift in neurons’ firing rate. In this sense, our study is consistent with these findings by providing a possible mechanism for a drift in the mean firing rates of the neurons driven by dynamical excitability. Our work suggests that drift can depend on any experience having an impact on excitability dynamics such as exercise as previously shown experimentally [9, 35] but also neurogenesis [9, 31, 36], sleep [37] or increase in dopamine level [38].”

      Result (1): Fluctuations in intrinsic excitability induce drift

      The two choices highlighted above appear to lead to representations that never recruit the neurons in the population with the lowest baseline excitability (Figure 1b: it appears that only 10 neurons ever show high firing rates) and produce networks with very strong bidirectional coupling between this subset of neurons and weak coupling elsewhere (Figure 1d). This low recruitment rate may not necessarily be problematic, but it stands out as a point that should at least be commented on. The fact that only 10 neurons (20% of the population) are ever recruited in a representation also raises the question of what would happen if the model were scaled up to include more neurons.

      This is a very good point. To test how the model depends on the network size, we plotted the drift index against the size of the ensemble. With this current implementation, we did not observe a significant correlation between the drift rate and size of the initial ensemble (Figure S2).

      Author response image 1.

      The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.

      Result (2): The observed drift has a temporal structure

      The authors then demonstrate that the drift has a temporal structure (i.e., that activity is informative about the day on which it occurs), with methods inspired by Rubin et al. (2015). Rubin et al. (2015) compare single-trial activity patterns on a given session with full-session activity patterns from each session. In contrast, Delamare et al. here compare full-session patterns with baseline excitability (E = 0) patterns. This point of difference should be motivated. What does a comparison to this baseline excitability activity pattern tell us? The ordinal decoder, which decodes the session order, gives very interesting results: that an intermediate amplitude E of excitability increase maximizes this decoder's performance. This point is also discussed well by the authors. As a potential point of further exploration, the use of baseline excitability patterns in the day decoder had me wondering how the ordinal decoder would perform with these baseline patterns.

      This is a good point. Here, we aimed at dissociating the role of excitability from the one of the recurrent currents. We introduced a time decoder that compares the pattern with baseline excitability (E = 0), in order to test whether the temporal information was encoded in the ensemble i.e. in the recurrent weights. By contrast, because the neural activity is by construction biased towards excitability, a time decoder performed on the full session would work in a trivial way.

      Result (3): A readout neuron can track drift

      The authors conclude their work by connecting a readout neuron to the population with plastic weights evolving via a Hebbian rule. They show that this neuron can track the drifting ensemble by adjusting its weights. These results are shown very neatly and effectively and corroborate existing work that they cite very clearly.

      Overall, this paper is well-organized, offers a straightforward model of dynamic intrinsic excitability, and provides relevant results with appropriate interpretations. The methods could benefit from more justification of certain modeling choices, and/or an exploration (either speculative or via implementation) of what would happen with more complex choices. This modeling work paves the way for further explorations of how intrinsic excitability fluctuations influence drifting representations.

      Reviewer #2 (Public Review):

      In this computational study, Delamare et al identify slow neuronal excitability as one mechanism underlying representational drift in recurrent neuronal networks and that the drift is informative about the temporal structure of the memory and when it has been formed. The manuscript is very well written and addresses a timely as well as important topic in current neuroscience namely the mechanisms that may underlie representational drift.

      The study is based on an all-to-all recurrent neuronal network with synapses following Hebbian plasticity rules. On the first day, a cue-related representation is formed in that network and on the next 3 days it is recalled spontaneously or due to a memory-related cue. One major observation is that representational drift emerges day-by-day based on intrinsic excitability with the most excitable cells showing highest probability to replace previously active members of the assembly. By using a day decoder, the authors state that they can infer the order at which the reactivation of cell assemblies happened but only if the excitability state was not too high. By applying a read-out neuron, the authors observed that this cell can track the drifting ensemble which is based on changes of the synaptic weights across time. The only few questions which emerged and could be addressed either theoretically or in the discussion are as follows:

      1. Would similar results be obtained if, instead of all-to-all recurrent connections, more realistic connectivity profiles, such as those estimated for CA1 and CA3, had been modeled?

      This is a very interesting point. We performed further simulations to show that the results are not dependent on the exact structure of the network. In particular, we show that all-to-all connectivity is not required to observe a drift of the ensemble. We found similar results when the recurrent weights matrix was made sparse (Fig. S4a-c, Methods). Similarly to all-to-all connectivity, we found that the ensemble is informative about its temporal history (Fig. S4d) and that an output neuron can decode the ensemble continuously (Fig. S4e).
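
      One simple way to obtain such a sparse recurrent matrix is to draw a fixed binary mask and apply it to the weights. The sketch below is an assumption-laden illustration rather than the procedure in the Methods: the function name is hypothetical, and removing exactly half of the off-diagonal synapses (instead of, say, dropping each one independently with probability 0.5) is a choice made here so that the result is easy to check.

```python
import numpy as np


def sparse_mask(n_neurons, sparsity, rng):
    """Boolean (n, n) mask keeping a fraction (1 - sparsity) of off-diagonal synapses."""
    # Flat indices of all off-diagonal entries (self-connections stay absent).
    off_diag = np.flatnonzero(~np.eye(n_neurons, dtype=bool))
    n_keep = int(round((1.0 - sparsity) * off_diag.size))
    keep = rng.choice(off_diag, size=n_keep, replace=False)
    mask = np.zeros(n_neurons * n_neurons, dtype=bool)
    mask[keep] = True
    return mask.reshape(n_neurons, n_neurons)


rng = np.random.default_rng(1)
mask = sparse_mask(50, sparsity=0.5, rng=rng)

# Example: apply the mask to an all-to-all weight matrix (values are arbitrary).
dense_weights = np.full((50, 50), 0.2)
sparse_weights = dense_weights * mask
```

      In a plastic network the same mask would presumably be reapplied after every weight update, so that synapses absent from the mask cannot be created by Hebbian growth.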

      Author response image 2.

      Sparse recurrent connectivity shows similar drifting behavior as all-to-all connectivity. The same simulation protocol as Fig. 1 was used while the recurrent weights matrix was made 50% sparse (Methods). a) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely those that have a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. b) Recurrent weights matrices after each of the four stimuli show the drifting assembly. c) Correlation of the patterns of activity between the first day and every other day. d) Student's t-test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. e) Center of mass of the distribution of the output weights (Methods) across days. c-e) Data are shown as mean ± s.e.m. for n = 10 simulations.

      1. How does the number of excited cells that could potentially contribute to an engram influence the representational drift and the decoding quality?

      This is indeed a very good question. We did not observe a significant correlation between the drift rate and size of the initial ensemble (Fig. S2).

      Author response image 3.

      The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.

      1. How does the rate of the drift influence the quality of the read-out from the read-out neuron?

      We thank the reviewer for this interesting question. We introduced a measure of the “read-out quality” and plotted this value against the rate of the drift. We found a small correlation between the two quantities. Indeed, the read-out quality decreases with the rate of the drift.

      Author response image 4.

      The quality of the read-out decreases with the rate of the drift. Read-out quality computed on the firing rate of the output neuron against the rate of the drift (Methods). Each dot shows one simulation. n = 100 simulations.

      Reviewer #3 (Public Review):

      The authors explore an important question concerning the underlying mechanism of representational drift, which despite intense recent interest remains obscure. The paper explores the intriguing hypothesis that drift may reflect changes in the intrinsic excitability of neurons. The authors set out to provide theoretical insight into this potential mechanism.

      They construct a rate model with all-to-all recurrent connectivity, in which recurrent synapses are governed by a standard Hebbian plasticity rule. This network receives a global input, constant across all neurons, which can be varied with time. Each neuron also is driven by an "intrinsic excitability" bias term, which does vary across cells. The authors study how activity in the network evolves as this intrinsic excitability term is changed.

      They find that after initial stimulation of the network, those neurons where the excitability term is set high become more strongly connected and are in turn more responsive to the input. Each day the subset of neurons with high intrinsic excitability is changed, and the network's recurrent synaptic connectivity and responsiveness gradually shift, such that the new high intrinsic excitability subset becomes both more strongly activated by the global input and also more strongly recurrently connected. These changes result in drift, reflected by a gradual decrease across time in the correlation of the neuronal population vector response to the stimulus.

      The authors are able to build a classifier that decodes the "day" (i.e. which subset of neurons had high intrinsic excitability) with perfect accuracy. This is despite the fact that the excitability bias during decoding is set to 0 for all neurons, and so the decoder is really detecting those neurons with strong recurrent connectivity, and in turn strong responses to the input. The authors show that it is also possible to decode the order in which different subsets of neurons were given high intrinsic excitability on previous "days". This second result depends on the extent by which intrinsic excitability was increased: if the increase in intrinsic excitability was either too high or too low, it was not possible to read out any information about past ordering of excitability changes.

      Finally, using another Hebbian learning rule, the authors show that an output neuron, whose activity is a weighted sum of the activity of all neurons in the network, is able to read out the activity of the network. Specifically, this means that although the set of neurons most active in the network changes, the output neuron always maintains a higher firing rate than a neuron with randomly shuffled synaptic weights, because the output neuron continuously updates its weights to sample from the highly active population at any given moment. Thus, the output neuron can read out a stable memory despite drift.
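
      The tracking behavior summarized here can be illustrated with a toy example. The sketch below does not reproduce the learning rule of the paper: as a stand-in, each output weight simply relaxes toward its presynaptic rate, the ensemble shifts by a fixed number of neurons per day, and a fixed rotation of the weights replaces random shuffling so that the example is deterministic. All names and parameter values are assumptions chosen for illustration.

```python
def readout(weights, rates):
    """Output activity as a weighted sum of the network rates."""
    return sum(w * r for w, r in zip(weights, rates))

def update(weights, rates, eta=0.5):
    """Stand-in plasticity rule (assumption): weights relax toward presynaptic rates."""
    return [w + eta * (r - w) for w, r in zip(weights, rates)]

n, size, shift = 20, 5, 2   # neurons, ensemble size, drift per day (illustrative)
weights = [0.5] * n
tracked, control = [], []

for day in range(4):
    # The active ensemble drifts by `shift` neurons each day.
    active = range(day * shift, day * shift + size)
    rates = [1.0 if i in active else 0.1 for i in range(n)]

    # Several plasticity steps during the day's stimulation.
    for _ in range(10):
        weights = update(weights, rates)

    # Deterministic stand-in for shuffled weights: rotate by half the network.
    rotated = weights[n // 2:] + weights[:n // 2]

    tracked.append(readout(weights, rates))
    control.append(readout(rotated, rates))
```

      In this toy setting the updated weights always align with the currently active neurons, so the tracked output exceeds the control on every day even though the ensemble membership changes.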

      Strengths:

      The authors are clear in their description of the network they construct and in their results. They convincingly show that when they change their "intrinsic excitability term", upon stimulation, the Hebbian synapses in their network gradually evolve, and the combined synaptic connectivity and altered excitability result in drifting patterns of activity in response to an unchanging input (Fig. 1, Fig. 2a). Furthermore, their classification analyses (Fig. 2) show that information is preserved in the network, and their readout neuron successfully tracks the active cells (Fig. 3). Finally, the observation that only a specific range of excitability bias values permits decoding of the temporal structure of the history of intrinsic excitability (Fig. 2f and Figure S1) is interesting, and as the authors point out, not trivial.

      Weaknesses:

      1. The way the network is constructed, there is no formal difference between what the authors call "input", Δ(t), and what they call "intrinsic excitability" Ɛ_i(t) (see Equation 3). These are two separate terms that are summed (Eq. 3) to define the rate dynamics of the network. The authors could have switched the names of these terms: Δ(t) could have been considered a global "intrinsic excitability term" that varied with time and Ɛ_i(t) could have been the external input received by each neuron i in the network. In that case, the paper would have considered the consequence of "slow fluctuations of external input" rather than "slow fluctuations of intrinsic excitability", but the results would have been the same. The difference is therefore semantic. The consequence is that this paper is not necessarily about "intrinsic excitability", rather it considers how a Hebbian network responds to changes in excitatory drive, regardless of whether those drives are labeled "input" or "intrinsic excitability".

      This is a very good point. We performed further simulations to model “slope-based”, instead of “threshold-based”, changes in excitability (Fig. S5a, Methods). In this new definition of excitability, we changed the slope of the activation function, which is initially sampled from a random distribution. With this varying excitability, we found results very similar to those obtained when excitability was varied as the threshold of the activation function (Fig. S5b-d). We likewise found that the ensemble is informative about its temporal history (Fig. S5e) and that an output neuron can decode the ensemble continuously (Fig. S5f).

      Author response image 5.

      Modeling the change of excitability as a variable slope of the input-output function yields drifting behavior similar to that obtained with a change in the threshold. The same simulation protocol as in Fig. 1 was used, while the excitability changes were modeled as a change in the slope of the activation function (Methods). a) Schema showing two different ways of defining excitability, as a threshold (top) or slope (bottom) of the activation function. Each line shows one neuron and darker lines correspond to neurons with increased excitability. b) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely those that have a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weight matrices after each of the four stimuli show the drifting assembly. d) Correlation of the patterns of activity between the first day and every other day. e) Student's t-test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. f) Center of mass of the distribution of the output weights (Methods) across days. d-f) Data are shown as mean ± s.e.m. for n = 10 simulations.
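
      The distinction between the two definitions of excitability can be made concrete with a short sketch. The sigmoid form, the parameter values, and the function names below are assumptions for illustration only; the activation function actually used is specified in the Methods, not here.

```python
import math

def rate_threshold(x, excitability, theta=1.0, beta=4.0):
    """Threshold-based excitability (sketch).

    Excitability lowers the effective threshold, which is equivalent to
    adding a constant bias to the input x.
    """
    return 1.0 / (1.0 + math.exp(-beta * (x - (theta - excitability))))

def rate_slope(x, excitability, theta=1.0, beta=4.0):
    """Slope-based excitability (sketch).

    Excitability steepens the activation function around a fixed threshold,
    so it scales the input rather than shifting it.
    """
    return 1.0 / (1.0 + math.exp(-(beta + excitability) * (x - theta)))

# Above threshold, both definitions increase the firing rate.
above_thr = rate_threshold(1.2, 0.5) - rate_threshold(1.2, 0.0)
above_slope = rate_slope(1.2, 2.0) - rate_slope(1.2, 0.0)

# Below threshold, a steeper slope reduces the rate instead.
below_slope = rate_slope(0.8, 2.0) - rate_slope(0.8, 0.0)
```

      Under these assumptions, a threshold change can be absorbed into an additive input term, whereas a slope change acts multiplicatively on the input and cannot be relabeled as external drive in the same way.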

      1. Given how the learning rule that defines input to the readout neuron is constructed, it is trivial that this unit responds to the most active neurons in the network, more so than a neuron assigned random weights. What would happen if the network included more than one "memory"? Would it be possible to construct a readout neuron that could classify two distinct patterns? Along these lines, what if there were multiple, distinct stimuli used to drive this network, rather than the global input the authors employ here? Does the system, as constructed, have the capacity to provide two distinct patterns of activity in response to two distinct inputs?

      This is an interesting point. In order to model multiple memories, we introduced non-uniform feedforward inputs, defining different “contexts” (Methods). We adapted our model so that two contexts target two random sub-populations in the network. We also introduced a second output neuron to decode the second memory. The simulation protocol was adapted so that each of the two contexts is stimulated every day (Fig. S6a). We found that the network is able to store two ensembles that drift independently (Fig. S6 and S7a). We were also able to decode temporal information from the patterns of activity of both ensembles (Fig. S7b). Finally, both memories could be decoded independently using two output neurons (Fig. S7c and d).

      Author response image 6.

      Two distinct ensembles can be encoded and drift independently. a) and b) Firing rates of the neurons across time. The red traces in panel b) correspond to neurons belonging to the first assembly and the green traces to the second assembly on the first day. They correspond to neurons having a firing rate higher than the active threshold after the first stimulation of each assembly. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weights matrices after each of the eight stimuli showing the drifting of the first (top) and second (bottom) assembly.

      Author response image 7.

      The two ensembles are informative about their temporal history and can be decoded using two output neurons. a) Correlation of the patterns of activity between the first day and every other day, for the first assembly (red) and the second assembly (green). b) Student's t-test t-value of the ordinal time decoder, for the first (red, left) and second ensemble (green, right) for different amplitudes of excitability E. Shuffled data are shown in orange. c) Center of mass of the distribution of the output weights (Methods) across days for the first (w_1^out, red) and second (w_2^out, green) ensemble. a-c) Data are shown as mean ± s.e.m. for n = 10 simulations. d) Firing rate of the output neurons across time for the first ensemble (y_1, top) and the second ensemble (y_2, bottom). The red and green traces correspond to the real output. The dark blue, light blue and yellow traces correspond to the cases where the output weights were randomly shuffled for every time point after presentation of the first, second and third stimulus, respectively.

      Impact:

      Defining the potential role of changes in intrinsic excitability in drift is fundamental. Thus, this paper represents a potentially important contribution. Unfortunately, given the way the network employed here is constructed, it is difficult to tease apart the specific contribution of changing excitability from changing input. This limits the interpretability and applicability of the results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weinberger et al. use different fate-mapping models, the FIRE model and PLX-diet to follow and target different macrophage populations and combine them with single-cell data to understand their contribution to heart regeneration after I/R injury. This question has already been addressed by other groups in the field using different models. However, the major strength of this manuscript is the usage of the FIRE mouse model that, for the first time, allows specific targeting of only fetal-derived macrophages. The data show that the absence of resident macrophages is not influencing infarct size but instead is altering the immune cell crosstalk in response to injury, which is in line with the current idea in the field that macrophages of different origins have distinct functions in tissues, especially after an injury. To fully support the claims of the study, specific targeting of monocyte-derived macrophages or the inhibition of their influx at different stages after injury would be of high interest. In summary, the study is well done and important for the field of cardiac injury. But it also provides a novel model (FIRE mice + RANK-Cre fate-mapping) for other tissues to study the function of fetal-derived macrophages while monocyte-derived macrophages remain intact.

      Response from the authors: We thank the reviewer for the thorough review and the positive feedback, and we agree that the Csf1r-FIRE mice represent an interesting model for studying the role of resident embryo-derived macrophages in different tissues and pathologies.

      Recent work from the Cochain lab demonstrated, by combined CITE-seq analysis and CCR2 antibody treatment, that monocyte depletion does not affect levels of resident tissue macrophages after myocardial infarction (REF Rizzo et al PMID: 35950218), supporting the concept of specifically investigating the roles of resident and recruited macrophages. While previous work has addressed the effects of broad CCR2-mediated monocyte depletion, information on the different macrophage subsets derived from blood monocytes has been lacking. We agree with the reviewer that targeting subsets of monocyte-derived macrophages, such as Ly6Chi monocytes, MHCII+Il1b+ macrophages, and Isg15hi populations (REF Rizzo et al PMID: 35950218), or interfering with their recruitment at different time points after myocardial infarction would be of interest and could help to decipher their functions in the different stages of cardiac healing. However, these studies would go beyond the scope of the current analysis and will be addressed in a separate project.

      Reviewer #2 (Public Review):

      In this study, Weinberger et al. investigated cardiac macrophage subsets after ischemia/reperfusion (I/R) injury in mice. The authors studied a ∆FIRE mouse model (deletion of a regulatory element in the Csf1r locus), in which only tissue-resident macrophages might be ablated. The authors showed a reduction of resident macrophages in ∆FIRE mice and characterized their macrophage populations via scRNAseq at baseline and after I/R injury. Two days after the I/R protocol, ∆FIRE mice showed an enhanced pro-inflammatory phenotype in the RNAseq data, and differential effects on echocardiographic function were seen 6 and 30 days after I/R injury. Via flow cytometry and histology, the authors confirmed existing evidence of increased bone marrow-derived macrophage infiltration into the heart, specifically into the ischemic myocardium. Macrophage populations in ∆FIRE mice after I/R injury were changed only in the remote zone. Further RNAseq data on resident or recruited macrophages showed transcriptional differences between the two cell types in terms of homeostasis-related genes and inflammation. Depleting all macrophages using a Csf1r inhibitor resulted in reduced cardiac function and increased fibrosis.

      Strengths

      1) The authors utilized robust methodology encompassing state of the art immunological methods, different genetic mouse models and transcriptomics.

      2) The topic of this work is important given the emerging role of tissue resident macrophages in cardiac homeostasis and disease.

      Response from the authors: We thank the reviewer for pointing out the strengths of our study, and putting the findings in context of the current view of the role of resident macrophages.

      Weaknesses:

      1) Specificity of ∆FIRE mouse model for ablating resident macrophages.

      The study builds on the assumption that only resident macrophages are ablated in ∆FIRE mice, while bone marrow-derived macrophages are unaffected. While the effects of the ∆FIRE model are nicely shown for resident macrophages, the authors did not directly assess bone marrow-derived macrophages. Moreover, in the immunohistological images in Fig. 1D nearly all macrophages appear to be absent. It would be helpful to further address the question of whether recruited macrophages are influenced in ∆FIRE mice. Evaluation of YFP-positive heart and blood cells in ∆FIRE mice crossed with Flt3CreRosa26eYFP mice could clarify whether bone marrow-derived cardiac macrophages are influenced in ∆FIRE mice. This would be even more relevant in the I/R model, where recruitment of bone marrow-derived macrophages is increased. A more direct assessment of recruited macrophages in ∆FIRE mice could also help to discuss potential similarities or discrepancies to the study of Bajpai et al, Circ Res 2018, which showed distinct effects of resident versus recruited macrophages after myocardial infarction. Providing the quantification of flow cytometry data (Fig. 1E-F) would be supportive.

      Response from the authors: We thank the reviewer for these comments. The reviewer addresses the specificity of the ∆FIRE mouse model for ablating resident macrophages and its potential effects on bone marrow-derived macrophages. Our single-cell sequencing data support the specificity of the ∆FIRE model regarding embryo-derived resident macrophages in two ways. First, the ∆FIRE mice are characterized by the specific reduction of embryo-derived macrophage clusters (e.g. homeostatic macrophages as well as antigen-presenting macrophages) in baseline conditions, while the abundance of recruited macrophages (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) is not altered (Fig. 2B-D). Second, transcriptomic analysis of bone marrow-derived macrophage clusters (e.g. Ccr2hiLy6chi macrophages, Cx3Cr1hi macrophages) and of monocytes revealed no differences in ∆FIRE compared to control mice. On the other hand, we found substantial transcriptome differences in clusters that were mainly of embryonic origin (e.g. homeostatic macrophages as well as antigen-presenting macrophages) (Fig. 2 and Fig. S4). These findings indicate that the ∆FIRE model mainly induces changes in embryo-derived macrophages.

      We agree with this reviewer that crossbreeding of ∆FIRE mice with Flt3CreRosa26eYFP mice would be of interest, and we have been working hard to establish this line. However, our breeding efforts have thus far been in vain, which is probably due to the necessity to keep a CBA/Ca background for the FIRE model (as reported by JAX: https://www.jax.org/strain/032783) and requires further backcrossing of Flt3CreRosa26eYFP mice with the respective CBA strain. In future work, we plan to carry out this experiment and also to specifically target monocyte-derived macrophages.

      The reviewer further asks about the modality to quantify cardiac macrophages, and suggests flow cytometry to quantify their number and not only use immunohistology. The quantification of cardiac immune cells shown in Fig. 1D (formerly 1C) was in fact performed by flow cytometry. We apologize for the lack of clarity. We rearranged the figure and added this information to the figure legend. We also added quantification by immunohistology, which is now shown in Fig. 1G.

      2) Limited adverse cardiac remodeling in ∆FIRE mice after I/R.

      The authors suggested adverse cardiac remodeling in ∆FIRE mice. However, the relevance of a <5% reduction in ejection fraction/stroke volume within an overall normal range in ∆FIRE mice is questionable. Moreover, 6 days after I/R injury ∆FIRE mice were protected from the impairment in ejection fraction and had a smaller viability defect. Based on the data, a few questions arise: Why was ablation of resident macrophages beneficial at earlier time points? Are recruited macrophages affected in ∆FIRE mice (see above)? Overall, the manuscript would benefit if the claim of adverse remodeling in ∆FIRE mice were discussed more carefully.

      Underlying mechanisms:

      The study did not functionally evaluate targets from transcriptomics to provide further mechanistic insights. It would be helpful if the authors discussed potential mechanisms of the differential effects of macrophages after ischemia in more detail.

      Response from the authors: The reviewer raises the question why the ablation of resident macrophages trends towards a beneficial effect at earlier time points after I/R injury. Further, the reviewer questions the relevance of a <5% reduction in ejection fraction/stroke volume over time in the light of an otherwise modestly reduced ejection fraction.

      In this study we used the experimental mouse model of ischemia-reperfusion injury with transient (1h) coronary artery occlusion. The potential disadvantage of this model is the smaller infarct size and smaller effects on cardiac function. However, it better represents the clinical picture and pathology of myocardial infarction in human patients with timely reperfusion by percutaneous coronary intervention. Infarct size after I/R was approx. 25% in control animals, indicating relevant cardiac injury. Further, infarct size was reduced to approx. 16% in ∆FIRE mice 6 days after infarction; however, the difference did not reach statistical significance. In line with this, the ejection fraction was numerically reduced on d6 after infarction in the control group, although without statistical significance. In the chronic phase after infarction, the ejection fraction improved over time in the control group by approx. 5% and decreased in ∆FIRE mice by 4%, which resulted in a difference (delta) of 9% in the change of ejection fraction. This indicated adverse remodeling in ∆FIRE mice.
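
      Written out, the 9% figure is the difference between the two changes in ejection fraction during the chronic phase. The symbols below are introduced only for illustration, and the values are the approximate ones quoted above:

```latex
\Delta\mathrm{EF}_{\mathrm{control}} - \Delta\mathrm{EF}_{\Delta\mathrm{FIRE}}
  \approx (+5\%) - (-4\%) = 9\ \text{percentage points}
```

      The comparison is therefore between trajectories over time (improvement versus decline), not between absolute ejection fractions at a single time point.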

      We agree that the different impact of the absence of resident cardiac macrophages during the course of myocardial healing after injury is of great interest to the field. We discuss potential mechanisms of the differential effects of resident macrophage ablation in lines 290-314 in the revised manuscript. However, to decipher the influence of embryo-derived macrophages at different time points after infarction, an inducible model for specific depletion of this macrophage population would be necessary, which to our knowledge does not exist.

      In the revised manuscript, we now discuss the effects on cardiac healing in ∆FIRE and also the limitations more thoroughly.

      Other:

      • It is unclear why the authors performed RNAseq experiments 2 days after I/R (fig. 5/6), while the proposed functional phenotype occurred later.

      • A sample size of 2 animals per group appears very limited for RNAseq in ∆FIRE mice (fig. 6).

      Response from the authors: We chose a time point in the “late early phase” of myocardial infarction (= day 2 post I/R) as we were also interested in the effect of resident macrophage depletion on other immune cell subsets (e.g. neutrophils) which could only be captured in this time period.

      We aimed to analyse 10000 cells per condition. The applied sample size allowed us to analyse 13452 CD45+cells from ∆FIRE mice and 9152 cells from control mice in infarct condition.

      Lines 299-324 "Ablation of resident macrophages altered macrophage crosstalk to non-macrophage immune cells, especially lymphocytes and neutrophils. This was characterized by a proinflammatory gene signature, such as neutrophil expression of inflammasome-related genes and a reduction in anti-inflammatory genes like Chil3 and Lcn2. Interestingly, inflammatory polarization of neutrophils has also been associated with poor outcome after ischemic brain injury (Cuartero et al, 2013). Clinical trials in myocardial infarction patients showed a correlation of inflammatory markers with the extent of myocardial damage (Sanchez, 2006) and with short- and long-term mortality (Mueller, 2002).

      Our study provides evidence that the absence of resident macrophages negatively influences cardiac remodeling in the late postinfarction phase in ∆FIRE mice indicating their biological role in myocardial healing. In the early phase after I/R injury, absence of resident macrophages had no significant effect on infarct size or LV function. These observations potentially indicate a protective role in the chronic phase after myocardial infarction by modulating the inflammatory response, including adjacent immune cells like neutrophils or lymphocytes.

      Deciphering in detail the specific functions of resident macrophages is of considerable interest but requires both cell-specific and temporally-controlled depletion of respective immune cells in injury, which to our knowledge is not available at present. These experiments could be important to tailor immune-targeted treatments of myocardial inflammation and postinfarct remodelling."

      Reviewer #1 (Recommendations For The Authors):

      1) Fetal-derived macrophages are often involved in organ development and function during steady-state. The authors should show heart morphology/function before I/R injury to make sure that the cause for a worsened outcome in FIRE mice is not due to a developmental/functional defect.

      Response from the author: We conducted a gross analysis of cardiac morphology by histology and did not detect differences from littermate controls. However, we have not conducted a detailed investigation of cardiac development since this was not within the scope of this study. Further, our study mainly shows differences in cardiac healing between d6 and d30, which are unlikely to be influenced by developmental defects.

      2) Line 164: The authors state that they have analysed macrophages via flow cytometry, but Figure 4a only shows IF. Quantification of different macrophage subsets via flow cytometry should be included in this model.

      Response from the author: The sentence “To gain a deeper understanding of the inflammatory processes taking place in the infarcted heart, we quantified macrophage distribution by immunofluorescence and flow cytometry analysis of ischemic and remote areas after I/R.” beginning line 164 describes the entire figure 4 and not only 4a. Here we show IF as well as flow cytometry to describe numbers but also different subpopulations of macrophages (BM-derived vs. resident).

      3) Lines 254-255 (now starting 267): it is not entirely true that the heart does not harbor BM-derived macrophages under steady state. Of course, there are many more after I/R injury, but the authors should take also their own data into account (Figure 1c, e showing a clear reduction but not complete absence of macrophages) and not claim a "scarce" population. See also Dick et al (PMID: 30538339), where both, the Ccr2-Tim4- and Ccr2+ populations are (slowly) replaced by BM monocytes.

      Response from the author: We thank the reviewer for this comment. We changed “scarce population” to “small population”.

      4) Lines 269-273 (now starting line 283): The point that DT-mediated depletion of cells causes inflammation that may have an impact on macrophages is compelling. However, the approach of combining and correlating data from PLX diet and FIRE mice is not proof that the significant increase in infarct size and deterioration of left ventricular function after I/R injury is driven by monocyte-derived macrophages. The authors could use Ccr2KO mice or injection of Ly6C antibody to show the specific functions of recruited macrophages.

      Response from the author: In this study we combine a specific genetic depletion of resident macrophages (FIRE) with a pharmacological depletion of all macrophage populations (Csf1r inhibition with PLX5622). We did not aim to specifically deplete monocyte-derived macrophages, which has been addressed previously by Bajpai et al. (PMID: 30582448) using the CCR2-DTR mouse line. Addressing the functions of recruited macrophages would go beyond the scope of the manuscript.

      Along these lines: the authors discuss that neutrophils may have been targeted in the Ccr2-DTR model. However, the egress of neutrophils in the CCR2 KO model is not affected and should be a good model to look at the impact of monocyte-derived macrophages after I/R injury in the heart.

      Response from the author: We agree with the reviewer that CCR2 under steady state conditions might not be important for the egress of neutrophils. However, after ischemic injury CCR2-inhibition has been shown to impair neutrophil egress as well as neutrophil recruitment to ischemic tissue in an ischemia-reperfusion injury model (PMID: 28670376).

      5) Line 299 (now line 332): Reference is missing for Ccr2-DTR mice study

      Response from the author: We added the respective reference.

      6) Can the authors also take the timing of treatment/cell depletion into account in their discussion? Incoming monocytes may be required in the first days after injury to promote the regeneration process, so that targeting them before the onset of the injury may be detrimental while targeting them during the chronic phase may be beneficial.

      Response from the author: We thank the reviewer for this comment. We added the following sentence to the manuscript (Lines 343-346):

      “An explanation of this controversy might be the timing and duration of macrophage depletion. Bajpai et al. depleted recruited macrophages only in the initial phase of myocardial infarction which improved cardiac healing (Bajpai et al., 2019), while depletion of macrophages over a longer period of time, as shown in our study, is detrimental for cardiac repair.”

      7) Figure 6E, F: Why are the outgoing signals pooled? The data have the strength of distinguishing between distinct populations. These data should be exploited to work out the distinct pathways of distinct macrophage populations in more detail. From the representation, it remains unclear which pathways are active and distinct between Ctrl and FIRE mice besides the few chosen ones (inflammasome). Also, legends are missing (what is red/blue?)

      Response from the author: We thank the reviewer for this comment. The aim of this analysis was to evaluate the effect of the FIRE ko on communication of immune cells in infarct conditions. To address changes in all populations which are affected by the FIRE ko we pooled the respective clusters (e.g. homeostatic, antigen-presenting and Ccr2loLy6clo Mø clusters). We provided the detailed analysis of the individual clusters in the new Supplemental Figure 9. Further, we added the respective legend to the Figure.

      8) The methods part mentioned CD169-DTR mice, however, there are no experiments shown in the manuscript. Further, how did the authors breed the FIRE mice? It is known in the field that they have big developmental issues and behavioural deficits if kept on a B6 background, which was likely the case in the study, at least for the fate-mapping approach.

      Response from the author: We removed the CD169-DTR reference from the methods part. FIRE mice were kept on a CBA/Ca background. As mentioned by the reviewer, this was not the case for the experiment where reporter mice were bred with FIRE mice (Csf1rΔFIRE/+RankCreRosa26eYFP), as these mice are on a C57Bl6 background. All experiments evaluating cardiac function and outcome after infarction in FIRE mice were performed on mice kept on a CBA/Ca background.

      Reviewer #2 (Recommendations For The Authors):

      • Please provide the sample size for Fig. 5.

      We described the sample size in the methods part (lines 448-450: “Cell sorting was performed on a MoFlo Astrios (Beckman Coulter) to obtain cardiac macrophages from CD45.2; Mx1CreMybflox/flox after BM-transplantation of CD45.1 BM (n=3 for 2 days after I/R injury) for bulk sequencing,..”). We also added the sample size to the figure legend.

      • Please state in the methods how the normality of data was tested.

      We added the respective normality test to the methods part: “The Shapiro-Wilk test was used to test normality.”

      • How did the authors ensure a standardized infarct size?

      The authors ensured a standardized infarct size in mice following myocardial infarction through a carefully controlled experimental protocol. We employed the well-established I/R procedure for inducing myocardial infarction in mice by ligation of the LAD for 1h to mimic the transient blockage of blood flow to the anterior wall of the heart. Success of the ligation of the LAD and the induction of ischemia was confirmed by the pale color of the myocardium after ligation and the success of reperfusion by the return of color after removing the suture. The surgical technique was consistently performed by the same highly trained veterinarian in a blinded fashion to minimize variability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      To the reviewers.

      We appreciate the detailed and deep review of our manuscript. Below are our comments and responses. Many of the requested data are present in the Supplementary figures of the manuscript. There seem to be two main concerns: one regarding the evidence of TLR2 expression in HFSCs; and a second regarding CEP/TLR2. As detailed below, we utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. The source (the data are in Supplementary Fig. 5A, B and in references below) and nature of CEP (it is not a protein, but a metabolic product of the oxidation of the polyunsaturated fatty acid DHA by MPO amongst other ROS sources) are also explained below.

      1) “The expression analysis of TLR2 is questionable. Many of the conclusions about the level of target genes are based on quantifying fluorescence intensity in microscopy images (e.g., TLR2 level in young or aged mice, BMP7 levels in mice with/without TLR2 KO). This could be strengthened by using qPCR to measure gene expression levels in FACS-sorted HFSCs, which would provide more accurate quantification. Additionally, the authors should test if the TLR2 antibody used is valid.”

      In most instances we used the TLR2 reporter mouse, which presents an advantage over immunostaining. Fig.2 (A-H) shows expression of the TLR2 reporter, not staining with TLR2 abs. For selected experiments we utilized immunostaining with an anti-TLR2 (Santa Cruz Biotechnology, sc-21759) antibody, which was validated in our previous publication (see Michael G. McCoy et al. Endothelial TLR2 promotes proangiogenic immune cell recruitment and tumor angiogenesis. // Sci Signal. 2021 Jan 19; 14(666): eabc5371; doi: 10.1126/scisignal.abc5371). In Fig.S2E of that manuscript we validated these abs using a knockout of TLR2. In the current paper, we further validate the anti-TLR2 abs by showing their co-localization with the TLR2-GFP reporter (Fig. S1A).

      We then confirmed reporter and immunostaining data by qPCR showing Tlr2 expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J), in mouse epidermal cells and FACS-purified HFSCs (Fig.2K), and FACS-purified HFSCs isolated from Control and TLR2HFSC-KO mice (Fig.4E).

      The mechanistic link between TLR2 and BMP signaling was first identified using RNAseq on FACS-purified HFSCs (Supplementary Fig.4), then verified using qPCR (Fig.4E shows Bmp7, Bmp2, Bmpr1a), and only then was immunohistochemistry staining for BMP7 and phosphoSMAD1/5/9 used (Fig.4A-D, F-H). Note that a large body of the requested evidence is presented in the Supplementary data. Other mechanistic links shown using qPCR include Nfkb2, Il1b, Il6, and Bmp7 in FACS-purified mouse HFSCs treated with BSA control or CEP (Fig.6Q,6R).

      “As the reviewers note, it is not clear whether the TLR2+ signal is located at the basal side of bulge stem cells, basement membrane underlying bulge stem cells, or dermal sheath cells encapsulating bulge structure. Co-staining with basement membrane markers such as collagen and laminin or HFSC basal side membrane markers such as Itga6, Itgb1, and Itgb4 will clarify this. In addition, showing the expression pattern of TLR2 in full skin including epidermis and dermis would be helpful. As TLR2 is highly expressed in immune cells or blood endothelial cells, if the antibody staining is valid, strong positive signals should present in the cells. Moreover, testing the TLR2 antibody in Tlr2 knock-out mouse tissues would be an appropriate control experiment.”

      Once again, in most instances we used not staining for TLR2 but the TLR2 reporter mouse (Fig.2 legend). Anti-TLR2 abs have been verified in TLR2 KO as described above. Fig.2K shows a comparison of Tlr2 mRNA expression in mouse epidermal cells to FACS-purified HFSCs by qPCR.

      TLR2 signal is detected in several cell types within the hair follicle as well as in dermal cells surrounding the hair follicles, such as lymphocytes, resident tissue macrophages, fibroblasts, and fibroblast precursors, etc. (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). In Author response image 1 below, white arrows point to the TLR2-positive cells around the hair follicle. In our paper, we focus on HFSC TLR2 and use the respective inducible tissue-specific TLR2 KO. The contribution of TLR2 on other cell types can be assessed by comparing the phenotypes of global TLR2 KO, TLR2 KO-WT bone marrow chimeras and HFSC-specific TLR2 KO. The results are presented in both main and supplementary figures (Fig.5D-I and SFig.5I-K show global TLR2 KO, Fig.6H-I, SFig.5G-h show bone marrow chimeras and Figs.3,4, 5 (J-M), Fig.5 (J-N) show the main focus, HFSC-TLR2 KO). Overall, the phenotype (delay of hair regeneration after wounding) seems to be the strongest in global TLR2 KO, whereas the bone marrow chimera and HFSC phenotypes are comparable. Thus, TLR2 on bone marrow-derived cells complements the main role for TLR2 on HFSCs.

      Author response image 1.

      Staining for TLR2 (white), DAPI (blue) and Keratin 17 (purple) is shown.

      “The increase in expression of TLR2 during the hair follicle stem cell activation should be documented by FACS and/or qPCR. This is important because as noted by one of the reviewers.”

      While the original observation was made using both a TLR2 reporter mouse and immunostaining, the data were confirmed by qPCR showing Tlr2 mRNA expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J).

      “In Fig 1D, the authors mentioned that they re-analyzed published RNA-seq data (Greco et al., 2009) to show the increase of Tlr2 and Tlr6 expression in late telogen compared to early telogen. However, there is no RNA-seq data in that paper, but only microarray data of bulge vs HG comparison and dermal papillae cells (DP) in early, mid, late Telo. If the authors used DP data to show the increase of Tlr2 transcripts in late Telo, the analysis is completely wrong and has to be corrected. The problem is compounded by the fact that in other published HFSC RNA-seq datasets (Yang et al., Cell, 2017, Adam et al., Nature Cell Biology, 2020), the expression levels of Tlr2 and Tlr6 are very low (below 5 TPM). In Fig 1G, the authors also re-analyzed Morinaga et al., 2021 data to show the reduction of Tlr2 expression in HFSCs in high-fat diet mice. However, in the raw data of Morinaga et al., 2021 (GSE169173), Tlr2 expression FPKM values are below 1 in both normal diet and high-fat diet samples, which are too low to perform comparative analysis and are not statistically meaningful. Like Tlr2, the expressions of Tlr1 and Tlr6, which form heterodimer with TLR2, are almost 0. Thus, the authors should revisit the dataset and revise their analysis and conclusion.”

      “To document the existence of Tlr2 and Tlr6 expression in HFSCs, the authors should perform RNA-seq-based gene expression analysis by themselves. Otherwise, the authors' TLR2 expression analyses in Fig 1 are not convincing. These are serious issues that the authors will want to rectify so that eLife readers will not discount their findings and importance.”

      This is correct: we analyzed published microarray data, not RNAseq data (Greco et al., 2009), using the GEO2R tool, which allowed us to compare mRNA expression levels between early, middle, and late telogen in bulge CD34-positive cells. We changed “RNA-seq” (the term was used incorrectly) to “RNA microarray” in the main text.

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Fig.2K shows comparison of Tlr2 mRNA expression in mouse epidermal cells to FACS-purified HFSCs by qPCR to document increased TLR2 expression on HFSCs. Likewise, Fig.2J shows qPCR for TLR2 on HFSC during various phases of hair growth.

      “In Fig 2, to support the expression of Tlr2 in HFSCs, the authors utilized TLR2-GFP mice and showed the strong GFP expression in HFSCs, hair bulb, and ORS. However, as the expression data in Fig 1 are questionable, the GFP reporter data should be carefully analyzed with proper control experiments. For example, although TLRs are highly expressed in immune cells and endothelial cells, which are abundantly present in skin, Fig 2 data did not show the GFP expression in these cells. Instead, the GFP signals looked very specific to epithelial compartments, which is odd. Again, to convince readers, the authors should provide more comprehensive analyses of expression patterns of TLR2-GFP mice in skin. Also, if the TLR2-GFP signals faithfully reflect the actual expression of Tlr2 mRNA, the GFP signals should increase in late telogen compared to early telogen. The authors should check whether TLR2-GFP expression follows this pattern.”

      The specificity of the TLR reporter was characterized in Price et al., 2018. A Map of Toll-like Receptor Expression in the Intestinal Epithelium Reveals Distinct Spatial, Cell Type-Specific, and Temporal Patterns. Immunity, 49. Thus, the TLR2 reporter mouse is well characterized (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152941/) and represents one of the best available tools to show TLR2 expression.

      Expression of TLR2 on endothelial cells and validation of anti-TLR2 abs were reported in McCoy et al., Science Signaling, as mentioned above. Also, as discussed above, we show a strong correlation between TLR2-GFP reporter expression and TLR2 expression using co-immunostaining with GFP and TLR2 antibodies, with appropriate isotype-matched non-immune antibodies as negative controls.

      There is no doubt that TLR2 is expressed on immune, endothelial and epithelial cells. According to the Human Protein Atlas, TLR2 expression is identified in skin fibroblasts, keratinocytes, melanocytes, etc., so our findings are well supported by the literature (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). Indeed, we detected TLR2 in cells surrounding the hair follicle (see the pictures above). TLR2 signal was detected in nearly all niches of hair follicles including the CD34-positive cells.

      In Fig.S1 we demonstrated an increased level of TLR2 in the late (competent) telogen compared to the early (refractory) telogen using immunostaining for TLR2-GFP. The results mirrored published RNA-array data in Fig.1D. Again, reporter and immunostaining results have been validated by qPCR for TLR2.

      The levels of TLR2 might be heavily influenced by the environment, i.e., pathogen availability. In this regard, note that mice for this study were kept in normal, not pathogen-free, conditions.

      “Overall, the existence of Tlr2 expression in HFSCs is still questionable. Without resolving these, genetic deletion of Tlr2 in HFSCs cannot be rationalized.”

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Besides these data, we show the functional responses to canonical TLR2 ligand, PAM3CSK4, and previously characterized endogenous ligand, CEP, using proliferation, western blotting and many other approaches. In numerous immunostainings we show co-localization of TLR2 and CD34 (Fig.2) using IMARIS surface rendering and colocalization tools. Our conclusions are further supported by published results as discussed above.

      2) “The central conclusion of this study is that the activation of TLR2 can suppress BMP signaling; however, the molecular link between TLR2 and BMP signaling is still missing. Given the importance of this finding, it would be intriguing to further investigate how TLR2 activation suppresses BMP signaling. A better characterization of the molecular-level interaction between TLR2 and BMP signaling can further enhance the impact of this study.

      -The published dataset should be re-analyzed, as some images and their quantification do not appear to be matched. Representative images should be used.” “In Fig 4, the authors propose that the activation of TLR2 pathway inhibits the BMP signaling pathway, which makes HFSCs quiescent. In TLR2-HFSC-KO, the authors showed that BMP7 is increased and pSMAD1/5/9 is sustained. The increase in BMP7 expression and SMAD activation should be demonstrated by additional assays. Are SMAD target genes activated in the cKO mice?”

      This mechanistic link between TLR2 and BMP was originally identified by RNAseq, confirmed by qPCR and then by immunostaining for both BMP7 and BMP pathway activation based on phosphoSMAD1/5/9 levels. The connection to the BMP pathway was also shown by western blotting (S.Fig.4B,C). The rescue experiments were performed using Noggin injections. According to our data, numerous SMAD target genes are upregulated in TLR2-HFSC-KO, such as Kank2, Ptk2b, Scarf2, Camk1, Dpysl2, as well as BMP2 and BMP7, and these changes were confirmed by qPCR analysis in Fig.4E. Additional evidence is shown in Fig.6, which demonstrates that the endogenous TLR2 ligand CEP (carboxyethylpyrrole) acts by a similar, BMP-dependent pathway. Also, Supplemental Fig.4 adds more details to this link. SFig.4B,C shows that TLR2 activation by the canonical ligand PAM3CSK4 inhibits pSMAD levels induced by BMP (western blot is shown). At the same time, as anticipated, PAM3CSK4 upregulated NFkB; however, little or no effect of BMP stimulation on NFkB is observed. To summarize: TLR2 affects both BMP7 production and BMP-induced downstream signaling as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs; however, BMP signaling does not seem to substantially change TLR2-dependent NFkB. We plan to delve into the intersection of these important pathways in the future.

      “Functionally, downregulation of BMP signaling by injecting Noggin, a BMP antagonist, in TLR2HFSC-KO mice induces HFSC proliferation. These functional data are solid. However, it is still curious how TLR2 signaling interact with BMP pathway molecularly. Is it transcriptional regulation or translational regulation? Perhaps, RNA-seq analysis of TLR2HFSC-KO could give some hints to answer this question. Furthermore, checking out other signaling pathways such as WNT/LEF1 and pCREB, which are important for hair cycle activation and NFkB, a downstream effector of TLR signaling would be helpful to interrogate mechanistic insights.”

      As discussed above, TLR2 affects both BMP7 production and BMP-induced downstream signaling as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs; however, BMP signaling does not seem to substantially change TLR2-dependent NFkB.

      Indeed, in addition to BMP signaling, Wnt signaling and β-catenin stabilization within HFSCs are known to trigger their activation (Deschene et al., 2014). However, this axis remained unchanged upon TLR2HFSC-KO (as shown in Supplementary Fig. 4J). There were several published reports on the crosstalk between TLR and BMP signaling, such as (doi: 10.1089/scd.2013.0345. Epub 2013 Nov 7), showing that activation of TLR4 inhibits BMP-induced pSMAD1/5/8 and that this connection requires NFkB. We probed NFkB activation; please see the responses above.

      However, we were not able to detect a substantial effect of NFkB inhibition on BMP signaling in hair follicles (not shown).

      3) “The function of CEP, a proposed endogenous ligand of TLR2, is still not clear. The authors imply that the decreased CEP level in aged mice could lead to deficient TLR2 signaling, which could further cause aging-associated hair regeneration defects. But this has not been demonstrated. What are the BMPs and pSmad1/5 levels in aged skin? Another important experiment to confirm the importance of this link during aging would be to inject CEP into the aged skin and examine whether this could restore hair regeneration in aged mice. Does CEP activate hair cycling during the endogenous pathway? What might be the source of CEP? Does CEP treatment activate BMP7 signaling? The authors should clarify these issues. The authors suggested that CEP is an endogenous ligand of TLR2, and administration of CEP induces hair cycle entry in a TLR2-dependent manner. How potent is CEP in terms of HFSC activation? In Fig 6Q, CEP increases the expression of Nfkb2, Il1b, and Il6, but the fold changes are marginal. Also, if CEP is a critical ligand, the loss of CEP by a genetic deletion or a pharmacological inhibition should result in the delay of hair cycle entry. Furthermore, the source of CEP expression is curious. Is it expressed by HFSCs or dermal fibroblast or immune cells? Finally, comparing the effect of CEP to the effect of other bacterial origin Tlr2 ligands such as heat killed bacteria, purified microbial cell-wall components, and synthetic agonists (Pam3CSK4) would be helpful. It is curious if HFSC directly senses the bacterial materials and triggers hair follicle regeneration or are indirectly directed by immune cells and endothelial cells, which could be primary sensor.”

      CEP is not a protein; it is an oxidative stress-generated metabolite of the polyunsaturated fatty acid DHA (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360178/). Thus, it is impossible to generate a knockout of this molecule. As demonstrated in previous publications (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990914/, https://pubmed.ncbi.nlm.nih.gov/34871763/), CEP serves as a critical endogenous ligand supporting TLR2 signaling in the absence of pathogens. While other TLR2 endogenous ligands, such as HMGBs or HSPs, exist (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373479/), CEP binds to TLR2 directly, and its generation is aided by MPO (myeloperoxidase) amongst other peroxidases and sources of reactive oxygen/nitrogen species. MPO (produced by immune cells amongst others) serves as an innate immunity response against pathogens, but it also generates CEP adducts (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034644/) in both protein and lipid form. The knockout of MPO diminishes CEP generation in skin (PMC6034644), thereby demonstrating the causative relationship between MPO and CEP.

      Author response image 2.

      Additional immunostaining of mouse skin for Keratin 17 (purple), CEP (green) and MPO (red). Similar staining is in S.Fig.5A and quantification is in S.Fig.5B.

      Also, the above-mentioned manuscripts show that CEP effects are milder than but overall comparable with those of the canonical TLR2 agonist PAM3CSK4. As we mention in the present manuscript, normal young mice’s tissues are devoid of CEP (which is generated in response to inflammation), with the exception of hair follicles. This is likely attributable to the secretion of MPO by hair follicles (PMID: 36402231), especially in conditions of inflammation (PMID: 32893875). Supplementary Fig.5A,B show that MPO is present at a high level in the sebaceous gland (as a part of an anti-microbial mechanism). Again, MPO is a secreted enzyme and it is likely to be a source of continuous DHA oxidation into CEP in hair follicles. We also document that both TLR2 and CEP levels in hair follicles (but not in other tissues, an important point for CEP) are reduced in aging. Likewise, SFig.5A,B shows that MPO secretion in the hair follicle is reduced by more than 60% in aging mice. Thus, it is likely that reduced MPO levels in the aging hair follicle produce less CEP. Together with reduced TLR2 levels, the lack of CEP might contribute to hair loss in aging.

      We show that similar to TLR2, CEP in hair follicles operates via a BMP-7 dependent pathway (see Fig.6). We also provide results using canonical bacterial ligand for TLR2, PAM3CSK4 whose effect on HFSCs proliferation is similar to CEP in a TLR2-dependent manner. TLR2 blocking approaches were used (Supp. Fig.4B, C, D, E, Supp. Fig.5D-5F). It remains to be seen whether CEP is required for the normal hair cycling and whether its administration might improve hair loss in aging subjects.

      “The impacts of CEP/TLR2 on proliferation of keratinocytes is still weak. How much of this effect is a result of NFkB activation, and how much is simply due to inhibiting BMP signaling?

      The impact of TLR2 on proliferation was demonstrated using a variety of mouse models, from global TLR2 KO to bone marrow chimeras to HFSC-specific TLR2 KO, again using multiple approaches. The same applies to the effects of CEP as well as of the canonical TLR2 ligand, PAM3CSK4, which were demonstrated both in vivo and in culture to be TLR2-dependent (Fig.6M-O and Supplementary Fig.4E-D). As for the NFkB connection, see our responses above. It seems that the connection between TLR2 and the BMP pathway occurs independently of NFkB activation.

      4) The links between TLR2 pathway and aging and obesity are only correlative. Although the authors suggest that the reduction of TLR2 expression in aging and obesity may diminish hair growth (Fig 1), there is no direct functional evidence that supports this possibility. If the authors wish to make this claim, they should test the roles of TLR2 and CEP in aging and obesity conditions.”

      We show that both TLR2 and CEP are reduced in aging, and that this pathway contributes to hair cycling and regeneration upon wounding. We do not wish to claim more.

      5) More minor points:

      “Fig.4: The Noggin treatment in TLR2 KO mice is an important experiment. However, it is unclear why Noggin only enhances proliferation (Ki67 level) in HG but not in the bulge. This discrepancy should be addressed.”

      As we showed in Fig. 3B-3F, TLR2 HFSC-KO mice have a prolonged first telogen. Noggin treatment at the first postnatal telogen promotes the telogen-to-anagen transition in TLR2HFSC-KO mice, which is characterized by the activation of HG cells prior to the bulge cells. According to the literature, the bulge cells remain silent during late telogen, whereas HG cells become Ki67-positive, and the proliferation of HG cells contributes to the telogen-to-anagen transition.

      (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2668200/

      https://www.sciencedirect.com/science/article/pii/S0022202X15404518?via%3Dihub

      https://journals.biologists.com/jcs/article/114/19/3419/34892/Hair-follicle-predetermination).

      “Fig.5: Does TLR2 cKO slow down wound healing, in addition to affecting pigmentation and the number of hair follicles?”

      In our previous publication, we demonstrated that deletion of TLR2 in HFSCs does not affect the wound healing process. Instead, endothelial TLR2 promotes wound vascularization and healing.

      (see Xiong et al. Timely Wound Healing Is Dependent on Endothelial but Not on Hair Follicle Stem Cell Toll-Like Receptor 2 Signaling. // Journal of Investigative Dermatology, Volume 142, Issue 11, November 2022, Pages 3082-3092.e1).

      “There is no panel B in Fig.4. There is no image in Fig 4D. Please correct this properly.”

      We corrected Fig.4.

      “Discussion: The constant production of CEP in homeostatic skin and in the absence of inflammation should be further discussed. Additionally, the possible causes of reducing CEP levels during aging should also be further discussed.”

      We explained the sources of CEP generation, such as MPO as one of the key enzymes, above. The data on MPO levels in hair follicles of young and old mice are presented in Supplementary Fig.5A,B. Since we have previously shown that MPO produces CEP from DHA (PMC6034644), the reduction in MPO in aging is likely to contribute to reduced CEP levels.

    1. Author Response

      We are grateful to the three reviewers and the editors who have provided comments about our manuscript, “Formation of malignant, metastatic small cell lung cancers through overproduction of cMYC protein in TP53 and RB1 depleted pulmonary neuroendocrine cells derived from human embryonic stem cells.”

      We are pleased that the reviewers recognized the importance of the problem we have addressed – namely, the need for better models of small cell lung cancer, a relatively common and refractory cancer. We also appreciate their acknowledgement of the significance of our major finding: that addition of an efficiently expressed CMYC transgene to neuroendocrine cells derived from human embryonic stem cells in which the RB1 and TP53 genes have been suppressed serves to drive aggressive growth and metastatic spread, rendering this system an appealing one for future studies of this recalcitrant cancer. Further, we acknowledge that more work needs to be done to more fully characterize and better understand the mechanistic features of this model system and to exploit it for therapeutic purposes.

      More specifically, we agree with the reviewers that this manuscript would be stronger if it included: (i) tests of other oncogenes, especially other members of the MYC gene family, to serve as drivers of tumor growth and metastasis and tests of orthotopic implantation of cells into the lung; (ii) descriptions of how such tumors with various genotypes respond to therapeutic approaches, both established and novel; and (iii) a more complete assessment of the contribution of abundant MYC proteins to physiological changes in tumor cells, such as growth, apoptosis, and invasion.

      While we wish we could provide such information, it is unrealistic to believe that it will be generated by the current constellation of authors in the foreseeable future. Data in the present manuscript has been generated over nearly five years, mostly in the early phases of that interval. Since then, some of us have moved from one institution to another, and some have shifted the focus of our studies. Further delays in publishing the main messages in this paper will only delay the pursuit of further studies, most likely by others. Indeed, one of the strongest justifications for the novel publication policies at eLife is to return control of the time for dissemination of results to the hands of the authors. Our situation illustrates the wisdom of that approach.

      We also note that the reviewers have raised a few issues that we aim to clarify by revisions of the current manuscript, thereby creating an improved Version of Record, within the next few weeks. We acknowledge here the significance of those issues and the ambiguities noted by the reviewers.

      The issues include the following point noted by more than one reviewer: our claim that expression of the CMYC oncogene increases the neuroendocrine character of the tumors. We recognize that this observation may be influenced by the nature of the analysis (single cell or bulk RNA sequencing), the choice of lineage markers (e.g., NEUROD1 or ASCL1 or others), and the statistical evaluation of the data. We will review these aspects of the problem and make appropriate changes in the text to be submitted as the Version of Record.

      Reviewer 1 also makes a good point about the possible effects of CMYC on the differentiation of hESC-derived lung progenitors (LPs). In this paper, we examine this issue only in LPs in which the tumor suppressor genes, RB1 and TP53, have been suppressed. Further studies of the effect of CMYC on differentiation of LPs with various combinations of functional tumor suppressor genes might well prove valuable in exploring the origins of SCLC.

      Finally, we wish to note that a topic discussed by Reviewer 1 (and by us) about the still poorly understood relationship between cancer genotypes and cell lineages has been partially addressed in a paper from our group that has been accepted for publication in Science.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      1) A single biomarker seems very unlikely to be of much help in the detection of glaucoma due to the etiological heterogeneity of the disease, the existence of different subtypes, and the genetic variability among patients. Rather, a panel of biomarkers may provide more useful information for clinical prediction, including better sensitivity and specificity. The inclusion of additional metabolites already identified in the study, in combination, may provide more reliable and correct assignment results.

      The authors’ answer: Thank you for your comment. We recognize the constraints of using single biomarkers for diagnosis. In upcoming research, we aim to incorporate multiple biomarkers to improve diagnostic accuracy and will consider adding more metabolites as suggested.
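
      For readers less familiar with the two metrics named in the comment above, the following is a minimal sketch of how sensitivity and specificity are computed for one marker and for a simple two-marker panel. All values are hypothetical and were invented for illustration; the marker names, the "positive if either marker is positive" rule, and the labels are assumptions, not data or methods from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = disease, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: four patients followed by four controls.
y_true   = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical calls from two markers (1 = above an assumed cut-off).
marker_a = [1, 1, 0, 0, 0, 0, 0, 1]
marker_b = [0, 1, 1, 1, 0, 1, 0, 0]

# Assumed panel rule: call a sample positive if either marker is positive.
panel = [int(a or b) for a, b in zip(marker_a, marker_b)]

single_result = sensitivity_specificity(y_true, marker_a)
panel_result = sensitivity_specificity(y_true, panel)
```

      In this toy example the "either marker" rule raises sensitivity at the cost of specificity, which is one reason a panel's combination rule would need to be validated rather than assumed.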

      2) The number of samples in the supplementary phase is low, larger sample sizes are mandatory to confirm the diagnostic accuracy.

      The authors’ answer: Thank you for your comment. Collecting aqueous humor is invasive, making samples scarce. We acknowledge the small sample size limitation. In future studies, we plan to use larger samples to verify the biomarker's diagnostic accuracy. Your feedback emphasizes the need for thorough validation in our next research.

      3) Cohorts from different populations are needed to verify the applicability of this candidate biomarker.

      The authors’ answer: Thank you for the suggestion. We agree on the need to test the biomarker's relevance across varied populations. Reports from other groups will help confirm and broaden our results.

      4) Sex hormones seem to be associated also with other types of glaucoma, such as primary open-angle glaucoma (POAG), although the molecular mechanisms are unclear (see doi:10.1167/iovs.17-22708). The inclusion of patients diagnosed with other subtypes of glaucoma, like POAG, may contribute to determining the sensitivity and specificity of the proposed biomarker. Androstenedione levels should be determined in POAG, NTG, or PEXG patients.

      The authors’ answer: I agree with your comment and thank you for your suggestion. PACG is a major cause of irreversible blindness in Asians. While this study centers on PACG, the link between sex hormones and other glaucoma subtypes, like POAG, merits investigation. Future studies will include POAG and other subtypes to further assess androstenedione's diagnostic relevance.

      5) In addition, the levels of androstenedione were found significantly altered during other diseases as described by the authors or by conditions like polycystic ovary syndrome, limiting the utility of the proposed biomarker.

      The authors’ answer: Thank you for your advice. Androstenedione levels also change in conditions like polycystic ovary syndrome, which could affect the biomarker's specificity. We plan to further study androstenedione's unique changes in glaucoma versus other conditions to clarify its diagnostic value.

      6) Uncertainty of the androstenedione levels compromises its usefulness in clinical practice.

      The authors’ answer: The uncertainty surrounding androstenedione levels and its impact on clinical applicability is a valid concern. We plan to delve deeper into understanding the variability and determinants of androstenedione levels to better assess its clinical relevance.

      Reviewer #2 (Public Review):

      The "predict" part is on much less solid ground. The visual field progression and association with serum androstenedione within the current experimental design alludes to a correlation. It truly cannot be stated as predictive. To predict, one needs to introduce the substance where none is present and demonstrate that the desired endpoint is reached. Conversely, the substance (androstenedione) can be removed to show that the condition regresses. None of these are possible without model system experiments, which have not been done. The authors could put some additional details in the methods, such as: 1) how much sample was collected, 2) whether equal serum volume for analysis had equal serum proteins (or cells). They have used an LC-MS/MS and a Chemiluminescence method, but another independent method such as GC-MS/MS or NMR to detect androstenedione for a subset of patients with different stages of visual field defect would be desirable.

      The authors’ answer: We acknowledge your constructive critique concerning our use of the term "predict". In the present study, we elucidated a discernible correlation between visual field progression and serum androstenedione concentrations. We are cognizant of the critical distinction between correlation and causation, and we concur that our application of the term “predict” may have been overly assertive in this context.

      Your emphasis on the imperative of employing model system experiments to unequivocally ascertain causative relationships is well-received. The experimental approach of modulating the substance, androstenedione in this case, to empirically observe its consequential impact on the condition, is a pivotal direction that warrants exploration in subsequent research endeavors. With regard to the variability of serum protein concentrations across participants, we adopted a methodological standardization by ensuring that the analyzed serum volume remained consistent across samples. This was implemented to enhance the reliability and generalizability of our findings.

      Your recommendation to consider alternative detection methodologies, specifically GC-MS/MS or NMR, is duly noted. Although our choice of LC-MS/MS and Chemiluminescence was predicated on available resources, we recognize the scientific merit in leveraging multiple analytical techniques. In future investigations, we endeavor to incorporate a broader spectrum of detection methodologies for androstenedione, particularly when assessing patients with varied visual field defect stages, thereby bolstering the robustness and validity of our conclusions.

      Reviewer #1 (Recommendations for The Authors):

      1) POAG is the leading cause of irreversible blindness worldwide (see reference #4). The prevalence of PACG is highest in Asia, but the major form of glaucoma is still POAG. The authors should modify the abstract and background sections accordingly (see line 30 and lines 61-62).

      The authors’ answer: Thank you for your suggestion, and we apologize for this mistake. The sentence “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness worldwide” has been changed to “Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness in Asia”. (Page 2, line 33; Page 3, lines 62-64)

      2) Line 69, please change the sentence "the He et al. taught us..." to the following "the He et al. study taught us.".

      The authors’ answer: Thank you for your comment. The sentence "the He et al. taught us..." has been changed to "the He et al. study taught us.". (Page 3, line 72)

      3) I suggest including the name of the identified candidate biomarker in the title of the manuscript. The title must be straightforward.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The sentence “Metabolomics Identifies and Validates Serum Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression” has been changed to “Metabolomics Identifies and Validates Serum Androstenedione as Novel Biomarker for Diagnosing Primary Angle Closure Glaucoma and Predicting the Visual Field Progression”. (Page 1, lines 1)

      4) Line 88, please change "normal subjects" to "control individuals".

      The authors’ answer: Thank you for your comment. We have changed "normal subjects" to "control individuals". (Page 4, line 91)

      5) Line 95 and so on along the manuscript, avoid the term "normal controls" or "normal" and use only the term "controls".

      The authors’ answer: Thank you for your advice. "normal subjects" has been changed to "controls". (Page 4, line 113; Page 5, lines 118, 120, 124, 128, 133)

      6) In the participants section, indicate the ocular treatments of PACG patients. For example, on line 141, which "treatment" are you referring to?

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Treatment included medical treatment and surgical treatment. We have revised it in the manuscript. (Page 5, line 142)

      7) The entire section 2.4 is confusing. According to Figure S2, untargeted metabolomics was conducted with a mixed sample containing "all" serum extracts in order to obtain an in-house database with molecular features present in serum by LCHRMS. Then, this database was used for targeted metabolomics in individual serum samples using LCQQQ. However, as it is described in the manuscript, it seems that first, an untargeted metabolomics analysis was carried out to identify altered metabolites, then targeted metabolomics was carried out to validate the untargeted analysis and finally, a profiling analysis was carried out to construct the database. The workflow must be clearly discussed and amended to be understandable.

      The authors’ answer: Thank you for your comment. We have revised the description of the experimental method section 2.4. (Page 7, lines 195-198)

      8) Please, briefly explain what widely-targeted metabolomics is and how it works in this study (see section 2.4).

      The authors’ answer: Thank you for your comment. For widely targeted metabolome detection, a local database was first established from a standards database, and ion-pair information was obtained by scanning the mixed (QC) samples with Q-TOF. A wide range of metabolites was identified by comparison with the local self-built database, and the metabolites in each sample were then identified and quantified in MRM scanning mode on a triple quadrupole (QQQ) instrument. This project combines the database constructed from the untargeted scan against public databases with the widely targeted local database to build a new database; the samples of this project were then scanned with Q-TOF against this database, and the metabolites of each sample were detected qualitatively and quantitatively in MRM mode. (Figure S2)

      9) On Table 1, indicate the number of patients and controls with cataracts.

      The authors’ answer: For the glaucoma group and the control group, we have excluded people with cataracts. This section is described in the inclusion and exclusion criteria for supplementary materials. (Inclusion and exclusion criteria)

      10) On "Sample processing" section, lines 152 and 153: Have you used cold methanol to ensure metabolic quenching? If not, how metabolite quenching was carried out?

      The authors’ answer: Thank you for your comment. We used cold methanol to extract metabolites, and the blood samples had been stored in a -80°C freezer beforehand, so that the whole process was kept at low temperature to ensure metabolic quenching. (Page 6, line 196)

      11) On the same "Sample processing" section, have you used internal standards during metabolite extraction? If yes, which ones? If not, why?

      The authors’ answer: Thank you for your comment. During metabolite extraction, the same internal standard was added to each sample, and the same serum volume (50 μL) was extracted from each sample. The specific internal standard name has been added to the "Sample processing" section. (Page 6, lines 153-155)

      12) Lines 161-163, I suggest including in the supplementary material the worklist of the entire experiment run by LC-MS, including analytical replicates and QCs.

      The authors’ answer: Thank you for your comment. The worklist for mass spectrometry can be found in supplementary Sheet1. (Page 6, line 165)

      13) The title of the section "Detection method" does not seem appropriate, please change it to "Analytical methods "or something similar.

      The authors’ answer: Thank you for your advice. "Detection method" has been changed to "Analytical methods". (Page 6, line 168)

      14) Section 2.4.1, I suggest changing "Untargeted detection conditions" to "Untargeted metabolomics analysis".

      The authors’ answer: Thank you for your comment. "Untargeted detection conditions" has been changed to "Untargeted metabolomics analysis". (Page 6, line 169)

      15) Lines 170-172, the column used is compatible with 100% water, why start with 5% acetonitrile?

      The authors’ answer: Thank you for your comment. A starting gradient of 0% acetonitrile would cause many water-soluble substances to elute and could easily clog the column, so we start with 5% organic phase.

      16) Section 2.4.1, the chromatographic conditions (mobiles phases) were the same in both positive and negative ion mode? It is desirable to change or adjust a basic pH when working in negative, so please amend and clarify it.

      The authors’ answer: Thank you for your comment. In negative ion mode, the chromatographic peak shape under the acidic system was better than under the alkaline system, so we chose the acidic system.

      17) I am not able to clearly understand what is "widely targeted conditions" (see section 2.4.2). What is the difference with the conventional targeted metabolomics analysis? In my view, widely-targeted metabolomics refers to the combination of untargeted metabolomics and targeted metabolomics. This must be clarified and simplified.

      The authors’ answer: Thank you for your suggestion. The characterization of metabolites in this study was conducted using a non-targeted database and a self-built database. Non-targeted metabolites were characterized with mixed samples, and then combined with the laboratory self-established database to form a new metabolome database for this study. In section 2.4.2, "widely targeted" refers to the use of the MWDB standard self-built database to characterize metabolites, followed by the QQQ MRM mode to quantify metabolites. In order to clearly describe the detection process, this part of the method has been modified. (Figure S2)

      18) Line 199, please, indicate the normalization carried out.

      The authors’ answer: We agree with your comment and thank you for your suggestion. The normalization description was missing from the data processing steps and has been added to the manuscript. (Page 7, line 203)

      19) How many instrumental replicates have you carried out both in untargeted and targeted metabolomics? Please, indicate it.

      The authors’ answer: Thank you for your advice. In this project, a mixture of all samples was used as the QC sample, which was injected repeatedly during the run (one QC sample was inserted between every 10 samples), and the correlation between repeated QC injections was greater than 99%, ensuring the stability of sample testing. (Sheet1)
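
      The QC scheme described in this answer can be illustrated with a minimal Python sketch. It assumes that one pooled-QC injection is placed after every 10 study samples and that reproducibility is summarized as a Pearson correlation between the feature intensities of two QC injections. The function names, the QC labels, and the choice of Pearson correlation are illustrative assumptions, not the authors' actual pipeline.

```python
import math


def build_worklist(sample_ids, qc_every=10, qc_label="QC"):
    """Interleave a pooled-QC injection after every `qc_every` study samples.

    Hypothetical helper: no QC is appended after the final sample, so QCs
    sit *between* blocks of samples.
    """
    worklist = []
    qc_count = 0
    for i, sample in enumerate(sample_ids, start=1):
        worklist.append(sample)
        if i % qc_every == 0 and i < len(sample_ids):
            qc_count += 1
            worklist.append(f"{qc_label}_{qc_count}")
    return worklist


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# 25 hypothetical study samples -> QC injections after samples 10 and 20.
samples = [f"S{i}" for i in range(1, 26)]
worklist = build_worklist(samples)
```

      With a worklist of this shape, each pair of QC injections can be compared with `pearson` and the run flagged if any pair falls below a chosen cut-off such as 0.99.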

      20) Line 267, why did you select a fold changes threshold greater than 1.15 (or lower 0.85)? In metabolomics, it would be desirable to have a minimum of 1.5-fold change considering the variability of data.

      The authors’ answer: Thank you for your comment. The lower FC threshold was selected to broaden the set of potential candidate metabolites, with the requirement that the change be reproducible across three batches. The screening threshold follows the method used in "Blood metabolomics uncovers inflammation-associated mitochondrial dysfunction as a potential mechanism underlying ACLF".
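
      To make the threshold discussion concrete, the following sketch applies both cut-offs to the androstenedione group means reported elsewhere in this response (discovery set 1: Normal 33987, PACG 42852; discovery set 2: Normal 31559, PACG 37934). It assumes that fold change is computed as the ratio of group means and that the stricter rule is symmetric (above 1.5 or below 1/1.5). Both are assumptions made for illustration, and the function names are hypothetical.

```python
def fold_change(mean_case, mean_control):
    """Fold change as a simple ratio of group means (an assumption)."""
    return mean_case / mean_control


def passes_threshold(fc, upper, lower):
    """True when the fold change lies outside the [lower, upper] band."""
    return fc > upper or fc < lower


# Group means quoted in this response (arbitrary intensity units).
discovery_sets = {
    "discovery set 1": (42852, 33987),  # (PACG, Normal)
    "discovery set 2": (37934, 31559),
}

results = {}
for name, (pacg, normal) in discovery_sets.items():
    fc = fold_change(pacg, normal)
    results[name] = {
        "fc": fc,
        # Threshold used in the manuscript: FC > 1.15 or FC < 0.85.
        "passes_1.15": passes_threshold(fc, upper=1.15, lower=0.85),
        # Stricter rule suggested by the reviewer; the lower bound 1/1.5
        # is an assumed symmetric counterpart.
        "passes_1.5": passes_threshold(fc, upper=1.5, lower=1 / 1.5),
    }
```

      Under these assumptions, both discovery sets give a ratio between 1.15 and 1.5, so androstenedione passes the threshold used in the manuscript but would not pass the 1.5-fold cut-off the reviewer suggests.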

      21) Please include the molecular formula of androstenedione somewhere in the manuscript.

      The authors’ answer: I agree with your comment and thank you for your suggestion. We have added the molecular formula of androstenedione to the supplementary material. (Page 17, lines 475)

      22) Line 290 is not Figure 4B and 4C, you may refer to Figure 3B and 3C.

      The authors’ answer: Thank you for your advice. We apologize for this mistake. Figure 4B and 4C have been changed to Figure 3B and 3C.

      23) Figure S3 was lost from Supplementary material, please include it.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. There was an error in the ordering of the supplementary figures. Figure 3 is redundant, and we have modified it in the supplementary materials.

      24) Figure 4 B, indicate in the text the average and uncertainty of androstenedione levels in both control and PACG groups.

      The authors’ answer: Thank you for your comment. In the manuscript, we have added the mean ± standard deviation of androstenedione levels in the control group and the disease group. (Page 11, lines 311-312)

      25) Section 3.6. please include the average and uncertainty of androstenedione levels in males and females in both control and PACG groups.

      The authors’ answer: Thank you for your advice. For 3.6 section, we supplemented the mean ± standard deviation of androstenedione levels in the control and disease groups. (Page 13, lines 350-356)

      26) Figure S9 seems missing.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. Figure S9 has been added to the Supplementary material.

      27) Lines 345-346, indicate the levels obtained for the metabolite in the compared groups.

      The authors’ answer: Thank you for your suggestion. The levels of androstenedione in each group are seen in “The results from both discovery set 1 (Figure S9A, Mild:32600±17011, Moderate:33215±17855, Severe:46060±21789) and discovery set 2 (Figure S9B, Mild:27866±19873, Moderate:27057±13166, Severe:43972±19234) indicated that the mean serum androstenedione levels were significantly higher in the severe PACG group compared to the moderate and mild PACG groups (P<0.001). These findings were further validated in both validation phase 1 (Figure S9C, Mild:75726±45719, Moderate:65798±30610, Severe:94348±30858) and validation phase 2 (Figure S9D, Mild:1.121±0.3143 ng/ml, Moderate:1.461±0.4391 ng/ml, Severe:2.147±0.6476 ng/ml).” and “Notably, the level of androstenedione was found to be significantly higher in PACG patients than in normal subjects in both discovery set 1 (Figure 4B, P=0.0081, Normal:33987±11113, PACG:42852±20767) and discovery set 2 (Figure 4C, P=0.0078, Normal:31559±10975, PACG:37934±18529).”

      28) Line 368, you don't need to indicate the PACG abbreviation again.

      The authors’ answer: Thank you for your comment. We apologize for this mistake. We have changed "patients with PACG" to "patients". (Page 13, line 377)

      29) Figure 6, panels A and B are not labeled (i.e., commented) in the body text of the manuscript.

      The authors’ answer: Thank you for your suggestion. We’re very sorry for this mistake. Figure 6, panels A and B have been labeled in the manuscript. (Page 13, lines 377-379)

      30) Section 3.7., when you indicate "after therapy" are you referring to surgical treatment? Please, clarify.

      The authors’ answer: Thank you for your comment. We apologize for this vague statement. Blood samples were taken before and three months after surgery. “therapy” has been changed to “surgical treatment” in the manuscript. (Page 13, line 377)

      31) Line 370, "97th patient" should be replaced by "nine patients"?

      The authors’ answer: Thank you for your advice. We apologize for this mistake. "97th patient" has been changed to "nine patients". (Page 13, lines 378-379)

      32) Lines 370-372, it is difficult to understand, please clarify why these findings indicate that severity is related to increased PACG according to Figure 6B.

      The authors’ answer: Thank you for your comment. We’re very sorry for this vague statement. The sentence “These findings showed that the levels of androstenedione that were tightly connected with PACG severity rose dramatically as PACG progressed.” has been removed.

      33) Line 447, the word "corrected" should be changed to "correlated"?

      The authors’ answer: Thank you for your comment. "corrected" has been changed to "correlated". (Page 16, lines 453,456)

      34) According to the literature, the levels found in control subjects are within the range of the "normal" values, i.e., are they comparable?

      The authors’ answer: Thank you for your advice. Androstenedione ranges from 0.4 to 2 in the normal population. The mean ± standard deviation of androstenedione in the normal population was 1.552 ± 0.4859.

      35) Lines 471-474, why "steroid hormone biosynthesis appears to be the critical node to high-match PACG pathophysiological concepts" while the high enrichment was observed in the "metabolic pathways"?

      The authors’ answer: Metabolic pathways encompass a series of chemical reactions within a cell that enable the synthesis or breakdown of molecules to maintain the cell's energy balance. Steroid hormone biosynthesis is one of these metabolic pathways, and its products, steroid hormones, participate in a wide range of physiological processes, including metabolism, immune response, and the regulation of inflammation. In a different context, a study related to fatigue during Androgen Deprivation Therapy (ADT) showed a significant difference in metabolite levels within the steroid hormone biosynthesis pathways, emphasizing the role these pathways play in metabolic alterations. The mentioned findings suggest that steroid hormone biosynthesis and metabolic pathways are intertwined. (Page 17, lines 481-488)

      36) Figure S13 and Figure S14A are the same.

      The authors’ answer: Thank you for your comment. Figure S14A has been removed.

      37) On lines 476-485, it would be interesting to discuss whether alterations of this metabolite could be a cause or consequence of PACG.

      The authors’ answer: Based on the literature found, androstenedione is a naturally occurring steroid hormone produced by the gonads and adrenal glands, and serves as an intermediate in testosterone biosynthesis (Androstenedione (a Natural Steroid and a Drug Supplement): A Comprehensive Review of Its Consumption, Metabolism, Health Effects, and Toxicity with Sex Differences). Early events in the pathobiology of glaucoma involve oxidative, metabolic, or mechanical stress acting on retinal ganglion cells (RGCs), leading to their rapid release of danger signals such as extracellular ATP, thus triggering microglial and macroglial activation as well as neuroinflammation (Immune Responses in the Glaucomatous Retina: Regulation and Dynamics). However, one might speculate that since androstenedione is a steroid hormone, it could potentially impact the inflammatory and metabolic stress observed in the pathophysiological processes of glaucoma (Adaptive responses to neurodegenerative stress in glaucoma). Metabolic and anti-inflammatory avenues might be crucial in understanding the relationship between alterations in androstenedione levels and the severity of glaucoma. Nevertheless, more research and literature analysis would be necessary to better understand the precise relationship and its underlying mechanisms between these two entities.

      38) I suggest sending the MS and MS/MS into a publicly available repository.

      The authors’ answer: Thank you for your suggestion. Further research will necessitate the utilization of the raw mass spectrometry data. We anticipate making this raw data available in a public repository upon the conclusion of subsequent experiments.

      Reviewer #2 (Recommendations for The Authors):

      The authors should aim to describe methods in greater detail.

      The authors could improve the writing to accurately describe their results and their interpretation and state what else could be done to make the result truly "predictive".

      The authors’ answer: (1) Detail Enhancement in the Methods section: We have expanded the description of methods such as sample pre-processing, mass spectrometry detection, and result analysis in the study to provide more detailed information about the procedures, equipment, and materials used. (2) Improvement in Writing Quality: We have engaged a scientific editor to review our manuscript for clarity, coherence, and consistency to ensure that the results and interpretations are accurately and clearly conveyed. Terminologies and phrases have been revised to better reflect the findings and interpretations. (3) Limitation supplement: We have included a discussion on the limitations of our study and suggested additional studies and analyses that could be conducted to enhance the predictive value of our findings. We sincerely appreciate the constructive feedback from the reviewer, which has greatly contributed to improving the quality and rigor of our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Issue 1: The relevance is somewhat unclear. High cysteine levels can be achieved in the laboratory, but, is this relevant in the life of C. elegans? Or is there physiological relevance in humans, e.g. a disease? The authors state "cells and animals fed excess cysteine and methionine", but is this more than a laboratory excess condition? SUOX nonfunctional conditions in humans don't appear to tie into this, since, in that context, the goal is to inactivate CDO or CTH to prevent sulfite production. The authors also mention cancer, but the link to cysteine levels is unclear. In that sense, then, the conditions studied here may not carry much physiological relevance.

      Response 1: We set out to answer a fundamental question: what pathways regulate the function of cysteine dioxygenase, a highly conserved enzyme in sulfur amino acid metabolism? In an unbiased genetic screen that sampled millions of EMS generated mutations across all ~20,000 C. elegans genes, we discovered loss of function/null mutations in egl-9 and rhy-1, two negative regulators of the hypoxia inducible transcription factor (hif-1). Genetic ablation of the egl-9 or rhy-1 loci is likely not relevant to the life of a C. elegans animal, i.e. this is not representative of a natural state. Yet, this extreme genetic intervention has taught us a new fundamental truth about the interaction between EGL-9/RHY-1, HIF-1, and the transcriptional activation of cdo-1. Similarly, the high cysteine levels used in our assays may or may not be representative of a state in nature; we do not know (nor do we make any claims about the environmental relevance of our choice of cysteine concentrations). It seems very plausible that pathological states exist where cysteine concentrations may rise to levels comparable to those in our experimental system. More importantly, we have started with an excess relative to physiology to elicit a clear response that we can study in the lab. Similar strategies established the cysteine-induction phenotype of CDO1 in mammalian systems. For instance, in Kwon and Stipanuk 2001, hepatocytes are cultured in media supplemented with 2 mmol/L cysteine to promote a ~4-fold increase in CDO1 mRNA.

      Issue 2: The pathway is described as important for cysteine detoxification, which is described to act via H2S (Figure 6). Much of that pathway has already been previously established by the Roth, Miller, and Horvitz labs as critical for the H2S response. While the present manuscript adds some additional insight such as the additional role of RHY-1 downstream of HIF-1 in promoting toxicity, this study therefore mainly confirms the importance of a previously described signalling pathway, essentially adding a new downstream target rhy-1 -> cysl-1 -> egl-9 -> hif-1 -> sqrd-1/cdo-1. The impact of this finding is reduced by the fact that cdo-1 itself isn't actually required for survival in high cysteine, suggesting it is merely a marker of the activity of this previously described pathway.

      Response 2: We agree that the primary impact of our manuscript is the establishment of a novel intersection between the H2S-sensing pathway (largely worked out by Roth, Miller, and Horvitz) and our gene of interest, cysteine dioxygenase. We believe that the connection between these two pathways is exciting as it suggests a logical homeostatic circuit. High cysteine yields enzymatically produced H2S. This H2S may then act as a signal promoting HIF-1 activity (via RHY-1/CYSL-1/EGL-9). High HIF-1 activity increases cdo-1 transcription and activity promoting the degradation of the high-cysteine trigger. As pointed out by the reviewer, cdo-1(-) loss of function alone does not cause cysteine sensitivity at the concentrations tested. Given that cysl-1(-) and hif-1(-) mutants are exquisitely sensitive to high levels of cysteine, we propose that HIF-1 activates the transcription of additional genes that are required for high cysteine tolerance. However, our genetic data show that cdo-1 is more than simply a marker of HIF-1 transcription. Our genetic data in Table 1 demonstrate that HIF-1 activation (caused by egl-9(-)) is sufficient to cause severe sickness in a suox-1 hypomorphic mutant which cannot detoxify sulfites, a critical product of cysteine catabolism. This severe sickness can be reversed by inactivating hif-1, cth-2, or cdo-1. These data demonstrate a functional intersection between the established H2S-sensing pathway and cysteine catabolism governed by cdo-1.

      Reviewer #2 (Public Review):

      Issue 3: First, the authors show that the supplementation of exogenous cysteine activates cdo-1p::GFP. Rather than showing data for one dose, the author may consider presenting dose-dependency results and whether cysteine activation of cdo-1 also requires HIF-1 or CYSL-1, which would be important data given the focus and major novelty of the paper in cysteine homeostasis, not the cdo-1 regulatory gene pathway.

      Response 3: We agree with the reviewer and have performed the suggested dose-response curve for expression of Pcdo-1::GFP in wild-type C. elegans. We observe substantial activation of the Pcdo-1::GFP transcriptional reporter beginning at 100µM supplemental cysteine (Figure 3C). Higher doses of cysteine do not elicit a substantially stronger induction of the Pcdo-1::GFP reporter. Thus, we find that 100µM supplemental cysteine strikes the right balance between strongly inducing the Pcdo-1::GFP reporter while not inducing any toxicity or lethality in wild-type animals (Figure 3E).

      We further agree that testing for induction of the Pcdo-1::GFP reporter in a hif-1(-) or cysl-1(-) mutant background is a critical experiment. However, we have not been able to identify a cysteine concentration that induces Pcdo-1::GFP and is not 100% lethal for hif-1(-) or cysl-1(-) mutant C. elegans. The remarkable sensitivity of hif-1(-) or cysl-1(-) mutant C. elegans to supplemental cysteine demonstrates the critical role of these genes in promoting cysteine homeostasis. Because of this lethality, we could not assay the Pcdo-1::GFP reporter in the hif-1(-) or cysl-1(-) mutant animals, but the lethality caused by excess cysteine demonstrates that this cysteine response is salient. To get at how cysteine might be interacting with the HIF-1-signaling pathway, we performed new additivity experiments by supplementing 100µM cysteine to wild type, egl-9(-), and rhy-1(-) mutant C. elegans expressing the Pcdo-1::GFP reporter. Surprisingly, we found that cysteine had no significant impact on Pcdo-1::GFP expression in an egl-9(-) mutant background but significantly increased the Pcdo-1::GFP expression in a rhy-1(-) background (Figure 3A,B). These data suggest that cysteine acts in a pathway with egl-9 and in parallel to rhy-1. These data have been incorporated into Figure 3A,B and are included in the Results section of the manuscript.

      Issue 4: While the genetic manipulation of cdo-1 regulators yields much more striking results, the effect size of exogenous cysteine is rather small. Does this reflect a lack of extensive condition optimization or robust buffering of exogenous/dietary cysteine? Would genetic manipulation to alter intracellular cysteine or its precursors yield similar or stronger effect sizes?

      Response 4: We agree that the induction of the Pcdo-1::GFP reporter by supplemental cysteine is not as dramatic as the induction caused by the egl-9 or rhy-1 null alleles. We believe our Response 3 and new Figure 3C demonstrate that this phenomenon is not due to lack of condition optimization, but likely reflects some biology. As pointed out by the reviewer, C. elegans likely buffers exogenous cysteine and this (perhaps) prevents the impressive Pcdo-1::GFP induction observed in the egl-9(-) and rhy-1(-) mutant animals. We have now mentioned this possible interpretation in the Results section. Furthermore, we like the idea of using genetic tricks to promote cysteine accumulation within C. elegans cells and tissues and will consider these approaches in future studies.

      Issue 5: Second, there remain several major questions regarding the interpretation of the cysteine homeostasis pathway. How much specificity is involved for the RHY-1/CYSL-1/EGL-9/HIF-1 pathway to control cysteine homeostasis? Is the pathway able to sense cysteine directly or indirectly through its metabolites or redox status in general? Given the very low and high physiological concentrations of intracellular cysteine and glutathione (GSH, a major reserve for cysteine), respectively, there is a surprising lack of mention and testing of GSH metabolism.

      Response 5: Future studies are required to determine the specificity of the RHY-1/CYSL-1/EGL-9/HIF-1 pathway for the control of cysteine homeostasis. Our proposed mechanism, that H2S activates the HIF-1 pathway is based largely on the work of the Horvitz lab (Ma et al. 2012). They demonstrate that H2S promotes a direct inhibitory interaction between CYSL-1 and EGL-9, leading to activation of HIF-1. These findings align nicely with our genetic and pharmacological data. However, our work does not provide direct evidence as to the cysteine-derived metabolite that activates HIF-1. We propose H2S as a likely candidate.

      We have added a note to the introduction regarding the role of GSH as a reservoir of excess cysteine and agree that future studies might find interesting links between CDO-1, GSH metabolism, and HIF-1.

      Issue 6: In addition, what are the major similarities and differences of cysteine homeostasis pathways between C. elegans and other systems (HIF dependency, transcription vs post-transcriptional control)? These questions could be better discussed and noted with novel findings of the current study that are likely C. elegans specific or broadly conserved.

      Response 6: We have included a new section in the Discussion highlighting the nature of mammalian CDO1 regulation. We propose the hypothesis that a homologous pathway to the C. elegans RHY-1/CYSL-1/EGL-9/HIF-1 pathway might operate in mammalian cells to sense high cysteine and induce CDO1 transcription. Importantly, all proteins in the C. elegans pathway have homologous counterparts in mammals. However, this hypothesis remains to be tested in mammalian systems.

      Reviewer #3 (Public Review):

      Major weaknesses of the paper include:

      Issue 7: the over-reliance on genetic approaches.

      Response 7: This is a fair critique. Our expertise is genetics. Our philosophy, which the reviewers may not share, is that there is no such thing as too much genetics!

      Issue 8: the lack of novelty regarding prolyl hydroxylase-independent activities of EGL-9.

      Response 8: We believe the primary novelty of our work is establishing the intersection between the H2S-sensing HIF-1 pathway and cysteine catabolism governed by cysteine dioxygenase. Our demonstration that cdo-1 regulation operates largely independent of VHL-1 and EGL-9 prolyl hydroxylation is a mechanistic detail of this regulation and not the critical new finding, although we believe it does suggest where pathway analyses should be directed in the future. We also believe that our homeostatic feedback model for the regulation of HIF-1 (and cdo-1) by cysteine-derived H2S is new and exciting and provides insight into the logic of why HIF-1 might respond to H2S and promote the activity of cdo-1. Our work suggests that one reason for this intersection of hif-1 and cdo-1 is to sense and maintain cysteine homeostasis when cysteine is in excess.

      Issue 9: the lack of biochemical approaches to probe the underlying mechanism of the prolyl hydroxylase-independent activity of EGL-9.

      Response 9: While not the primary focus of our current manuscript, we agree that this is an exciting area of future research. To uncover the prolyl hydroxylase-independent activity of EGL-9, we agree that a combination of approaches will be required, including biochemical, structure-function, and genetic approaches.

      Major Issues We Feel the Authors Should Address:

      Issue 10: One particularly glaring concern is that the authors really do not know the extent to which the prolyl hydroxylase activity is (or is not) impacted by the H487A mutation in egl-9(rae276). If there is a fair amount of enzymatic activity left in this mutant, then it complicates interpretation. The paper would be strengthened if the authors could show that the egl-9(rae276) eliminates most if not all prolyl hydroxylase activity. In addition, the authors may want to consider doing RNAi for egl-9 in the egl-9(rae276) mutant as a control, as this would support the claim that whatever non-hydroxylase activity EGL-9 may have is indeed the causative agent for the elevation of CDO-1::GFP. Without such experiments, readers are left with the nagging concern that this allele is simply a hypomorph for the single biochemical activity of EGL-9 (i.e., the prolyl hydroxylase activity) rather than the more interesting, hypothesized scenario that EGL-9 has multiple biochemical activities, only one of which is the prolyl hydroxylase activity.

      Response 10: We have two lines of evidence that suggest the egl-9(rae276)-encoded H487A variant eliminates prolyl hydroxylase activity. First, Pan et al. 2007 (reference 57) demonstrate that when the equivalent histidine (H313) is mutated in the human protein, that protein lacks detectable prolyl hydroxylase activity. Second, egl-9(rae276) and the vhl-1 null allele, ok161, cause similar phenotypes. Both alleles cause nearly identical activation of the Pcdo-1::GFP reporter transgene (Fig. 5C,D), and similarly impact the growth of the suox-1(gk738847) hypomorphic mutant (Table 1). This phenotypic overlap is highly relevant as the established role of VHL-1 is to recognize the hydroxyl mark conferred by the EGL-9 prolyl hydroxylase domain and promote the degradation of HIF-1. If EGL-9[H487A] had residual prolyl hydroxylase activity, we would expect the vhl-1(-) null mutant C. elegans to display more dramatic phenotypes than their egl-9(rae276) counterparts. This is not the case.

      Issue 11: The authors observed that EGL-9 can inhibit HIF-1 and the expression of the HIF-1 target cdo-1 through a combination of activities that are (1) dependent on its prolyl hydroxylase activity (and subsequent VHL-1 activity that acts on the resulting hydroxylated prolines on HIF-1), and (2) independent of that activity. This is not a novel finding, as the authors themselves carefully note in their Discussion section, as this odd phenomenon has been observed for many HIF-1 target genes in multiple publications. While this manuscript adds to the description of this phenomenon, it does not really probe the underlying mechanism or shed light on how EGL-9 has these dual activities. This limits the overall impact and novelty of the paper.

      Response 11: See response to Issue #8.

      Issue 12: Cysteine dioxygenases like CDO-1 operate in an oxygen-dependent manner to generate sulfites from cysteine. CDO-1 activity is dependent upon availability of molecular oxygen; this is an unexpected characteristic of a HIF-1 target, as its very activation is dependent on low molecular oxygen. Authors neither address this in the text nor experimentally, and it seems a glaring omission.

      Response 12: We agree this is an important point to raise within our manuscript. However, despite its induction by HIF-1, there is no evidence that cdo-1 transcription is induced by hypoxia. In fact, in a genome-wide transcriptomic study, cdo-1 was not found to be induced by hypoxia in C. elegans (Shen et al. 2005, reference 71).

      We have newly commented on the use of molecular oxygen as a substrate by both EGL-9 and CDO-1 in our Discussion section. The mammalian oxygen-sensing prolyl hydroxylase (EGLN1) has been demonstrated to have a high Km value for O2 (high µM range). This likely allows EGLN1 to be poised to respond to small decreases in cellular oxygen from normal oxygen tensions. Clearly, CDO-1 also requires oxygen as a substrate; however, the Km of CDO-1 for O2 is likely to be much lower, preventing sensitivity of cysteine catabolism to physiological decreases in O2 availability. To our knowledge, however, the CDO1 Km value for O2 has not been experimentally determined. We have added a new Discussion section where we address the conundrum of low oxygen inducing HIF-1 while oxygen is needed by CDO-1/CDO1.
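      The Km argument above can be illustrated with a minimal Michaelis-Menten sketch. The Km values and oxygen concentrations below are hypothetical placeholders chosen only to contrast a "high Km" enzyme with a "low Km" enzyme; they are not measured values for EGLN1 or CDO1 (the CDO1 Km for O2 has, to our knowledge, not been determined), and single-substrate Michaelis-Menten kinetics is itself an assumption.

```python
def fractional_rate(o2_um: float, km_um: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: v/Vmax = [O2] / (Km + [O2])."""
    if o2_um < 0 or km_um <= 0:
        raise ValueError("O2 must be non-negative and Km must be positive")
    return o2_um / (km_um + o2_um)


def relative_drop(o2_before_um: float, o2_after_um: float, km_um: float) -> float:
    """Fraction of enzyme activity lost when O2 falls from o2_before to o2_after."""
    if o2_before_um <= 0:
        raise ValueError("starting O2 must be positive")
    before = fractional_rate(o2_before_um, km_um)
    after = fractional_rate(o2_after_um, km_um)
    return 1.0 - after / before


# Hypothetical, illustrative values only (not measurements).
HIGH_KM_UM = 250.0    # stand-in for an oxygen sensor with a high Km
LOW_KM_UM = 5.0       # stand-in for an enzyme with a much lower Km
O2_NORMAL_UM = 200.0  # assumed starting O2 concentration
O2_REDUCED_UM = 100.0 # assumed O2 concentration after a twofold decrease

if __name__ == "__main__":
    for label, km in (("high Km", HIGH_KM_UM), ("low Km", LOW_KM_UM)):
        drop = relative_drop(O2_NORMAL_UM, O2_REDUCED_UM, km)
        print(f"{label}: activity falls by {drop:.1%} when O2 is halved")
```

      Under these assumed numbers, the high-Km enzyme loses a large share of its activity when oxygen is halved, while the low-Km enzyme stays close to saturation, which is the qualitative behavior the response relies on.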

      Issue 13: The authors determined that the hypodermis is the site of the most prominent CDO-1::GFP expression, relevant to Figure 4. This claim would be strengthened if a negative control tissue, in the animal with the knockin allele, were shown. The hypodermal specific expression is a highlight of this paper, so it would make this article even stronger if they could further substantiate this claim.

      Response 13: Our claim that the hypodermis is the critical site of cdo-1 function is based on i) our hands-on experience looking at Pcdo-1::GFP, Pcdo-1::CDO-1::GFP, and CDO-1::GFP (encoded by cdo-1(rae273)) and our reporting of these expression patterns in multiple figures throughout the manuscript and ii) the functional rescue of cdo-1(-) phenotypes by a cdo-1 rescue construct expressed by a hypodermal-specific promoter (col-10). We agree that providing negative control tissues would modestly improve the manuscript. However, we do not think that adding these controls will substantially alter the conclusions of the paper. Importantly, we acknowledge this limitation of our work with the sentence, “However, we cannot exclude the possibility that CDO-1 also acts in other cells and tissues as well.”

      Minor issues to note:

      Issue 14: Mutants for hif-1 and cysl-1 are sensitive to exogenous cysteine levels, yet loss of CDO-1 expression is not sufficient to explain this phenomenon, suggesting other targets of HIF-1 are involved. Given the findings the authors (and others) have had showing a role for RHY-1 in sulfur amino acid metabolism, shouldn't the authors consider testing rhy-1 mutants for sensitivity to exogenous cysteine?

      Response 14: To test the hypothesis that rhy-1(-) C. elegans might be sensitive to supplemental cysteine, we cultured wild-type and rhy-1(-) animals on 0, 100, and 1000µM supplemental cysteine. At 0 and 100µM supplemental cysteine, neither wild-type nor rhy-1(-) animals display any lethality, suggesting rhy-1 is not required for survival in the face of excess cysteine (Fig. 3D,E). We also cultured these same strains on 1000µM supplemental cysteine, a concentration that is highly toxic to wild-type animals (100% lethality). rhy-1(-) animals were resistant to 1000µM supplemental cysteine, with a substantial fraction of the population surviving overnight exposure to this lethal dose of cysteine. Similarly, egl-9(-) mutant C. elegans were also resistant to 1000µM supplemental cysteine. We propose that loss of egl-9 or rhy-1 activates HIF-1-mediated transcription, which primes these mutants to cope with the lethal dose of cysteine. These data are now presented in Figure 3D-F and described in the Results section.

      Issue 15: The cysteine exposure assay was performed by incubating nematodes overnight in liquid M9 media containing OP50 culture. The liquid culture approach adds two complications: (1) the worms are arguably starving or at least undernourished compared to animals grown on NGM plates, and (2) the worms are probably mildly hypoxic in the liquid cultures, which complicates the interpretation.

      Response 15: We agree that it is possible that animals growing overnight in liquid culture are undernourished and mildly hypoxic. However, we are confident in our data interpretation as all our experiments are appropriately controlled: control and experimental groups were all grown under the same liquid culture conditions. Thus, these animals would all experience the same stressors that come with liquid culture. Importantly, we never make comparisons between groups that were grown under different culture conditions (i.e. solid media vs. liquid culture).

      Issue 16: An easily addressable concern is the wording of one of the main conclusions: that cdo-1 transcription is independent of the canonical prolyl hydroxylase function of EGL-9 and is instead dependent on one of EGL-9's non-canonical, non-characterized functions. There are several points in which the wording suggests that CDO-1 toxicity is independent of EGL-9. In their defense, the authors try to avoid this by saying, "EGL-9 PHD," to indicate that it is the prolyl hydroxylase function of EGL-9 that is not required for CDO-1 toxicity. However, this becomes confusing because much of the field uses PHD and EGL-9/EGLN as interchangeable protein names. The authors need to be clear about when they are describing the prolyl hydroxylase activity of EGL-9 rather than other (hypothesized) activities of EGL-9 that are independent of the prolyl hydroxylase activity.

      Response 16: We appreciate the reviewer alerting us to this practice within the field. To avoid confusion, we have removed the “PHD” abbreviation from our manuscript and explicitly referred to the “prolyl hydroxylase domain” where relevant.

      Issue 17: The authors state in the text, "the egl-9; suox-1 double mutants are extremely sick and slow growing." We appreciate that their "health" assay, based on the exhaustion of food from the plate, is qualitative. We also appreciate that it is a functional measure of many factors that contribute to how fast a population of worms can grow, reproduce, and consume that lawn of food. However, unless they do a lifespan assay and/or measure developmental timing and specifically determine that the double mutant animals themselves are developing and/or growing more slowly, we do not think it is appropriate to use the words "slow growing" to describe the population. As they point out, the rate of consumption of food on the plate in their health assay is determined by a multitude and indeed a confluence of factors; the growth rate is one specific one that is commonly measured and has an established meaning.

      Response 17: We see how the phrase ‘slow growing’ might imply a phenotype that we have not actually assessed with this assay. Therefore, we have removed all claims about “slow growth” of the strains presented in Table 1 and have highlighted the assay more overtly in the Results section. For example: “While egl-9(-) and suox-1(gk738847) single mutant animals are healthy under standard culture conditions, the egl-9(-); suox-1(gk738847) double mutant animals are extremely sick and require significantly more days to exhaust their E. coli food source under standard culture conditions (Table 1).”

      Reviewer #1 (Recommendations For The Authors):

      Issue 18: Relevance could be addressed further in the text.

      Response 18: We have added additional context for our work in the Discussion section. Please see our response to Issues #5, 6, 12, and 24.

      Issue 19: Better appreciation and integration of the manuscript's findings with published studies would be appropriate.

      Response 19: We have added additional context for our work in the Discussion section. Please see our response to Issues #5, 6, 12, and 24.

      Issue 20: It might be perhaps relevant to test whether cdo-1 is relevant for hypoxia resistance since it appears to be a key target for hif-1.

      Response 20: We agree that this is an interesting future direction; however, given that cdo-1 mRNA is not induced by hypoxia (Shen et al. 2005), we have not prioritized these experiments for the current manuscript.

      Issue 21: "egl-9 inhibits cdo-1 transcription in a prolyl-hydroxylase and VHL-1-independent manner" should be tempered. vhl-1 mutants and egl-9 hydroxylase point mutant still have significant induction of the reporter.

      Response 21: Thank you for identifying this oversight. We have modified the Figure 5 legend title to read, “egl-9 inhibits cdo-1 transcription in a largely prolyl-hydroxylase and VHL-1-independent manner.”

      Issue 22: Please use line numbers in the future for easier tracking of comments.

      Response 22: We shall.

      Issue 23: Abstract and elsewhere, "high cysteine activates...", should be rephrased to "high levels of cysteine".

      Response 23: We have made this change throughout the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Issue 24: The authors discuss CDO1 in the context of tumorigenesis, as well as the potential regulation between cysteine and the hypoxia response pathway. Thus, I was surprised that there was no mention of the foundational Bill Kaelin paper (Briggs et al 2016) showing how the accumulation of cysteine is related to tumorigenesis, and that cysteine is a direct activator of EglN1. Puzzling that CDO1 is a tumor suppressor: you lose it, cysteine can accumulate and activate EglN1, causing HIF1 turnover. How do the authors reconcile their results with this paper? I was also surprised that there was no mention in the Discussion of the role of hydrogen sulfide, cysteine metabolism, and CTH and CBS in oxygen sensation in the carotid body given the role they play there. Seems important to discuss this issue.

      Response 24: We have added new sections to our Discussion that consider the relationship between our work and Briggs et al. 2016 and that mention the role of CTH and H2S in the mammalian carotid body.

      Issue 25: The abstract has a variety of contradictory statements. For example, the authors state that "HIF-1-mediated induction of cdo-1 functions largely independent of EGL-9," but then go on to conclude in the final sentence that cysteine stimulates H2S production, which then activates EGL-9 signaling, which then increases HIF-1-mediated transcription of cdo-1. A quick reading of the abstract leaves the reader uncertain whether EGL-9 is or is not involved in this regulation of cdo-1 expression. In addition, the conclusion sentence implies that activation of the EGL-9 pathway increases HIF-1-mediated transcription, yet it is well established that EGL-9 is an inhibitor of HIF-1. The abstract fails to deliver a clear summary of the paper's conclusions. Perhaps consider this alternative (changes in capital letters):

      The amino acid cysteine is critical for many aspects of life, yet excess cysteine is toxic. Therefore, animals require pathways to maintain cysteine homeostasis. In mammals, high cysteine activates cysteine dioxygenase, a key enzyme in cysteine catabolism. The mechanism by which cysteine dioxygenase is regulated remains largely unknown. We discovered that C. elegans cysteine dioxygenase (cdo-1) is transcriptionally activated by high cysteine and the hypoxia inducible transcription factor (hif-1). hif-1-dependent activation of cdo-1 occurs downstream of an H2S-sensing pathway that includes rhy-1, cysl-1, and egl-9. cdo-1 transcription is primarily activated in the hypodermis where it is sufficient to drive sulfur amino acid metabolism. EGL-9 and HIF-1 are core members of the cellular hypoxia response. However, we demonstrate that the mechanism of HIF-1-mediated induction of cdo-1 IS largely independent of EGL-9 prolyl hydroxylASE ACTIVITY and the von Hippel-Lindau E3 ubiquitin ligase. We propose that the REGULATION OF cdo-1 BY HIF-1 reveals a negative feedback loop for maintaining cysteine homeostasis. High cysteine stimulates the production of an H2S signal. H2S then ACTS THROUGH the rhy-1/cysl-1/egl-9 signaling pathway DISTINCTLY FROM THEIR ROLE IN HYPOXIA RESPONSE TO INCREASE HIF-1-mediated transcription of cdo-1, promoting degradation of cysteine via CDO-1.

      Response 25: We agree that the abstract could be clearer. We believe this concern stems from the fact that we did not discuss our initial screen in the abstract. Thus, we failed to establish a role for egl-9 in the regulation of cdo-1. To remedy this, we have modified the abstract as suggested by the reviewer and added additional context. We believe that these changes improve the clarity of the Abstract substantially.

      Issue 26: An easily addressable concern involves the "dark" microscopy controls showing lack of fluorescence from a nematode. In these dark negative control micrographs, the authors should draw dotted outlines around where the worms are or include a brightfield image next to the fluorescence image. On a computer screen, it is in fact possible to make out the worms. Yet, when printed out, the reader must assume there are worms in the dark images. Additionally, we realize that adjusting fluorescence so that wild-type CDO-1 expression can be seen will result in oversaturation of the egl-9 and rhy-1; cdo-1 doubles; however, this would be a useful figure to add into the supplement to both provide a normal reference of CDO-1 low-level expression and a demonstration of just how bright it is in the mutant backgrounds. It would also be useful for you to please report your exposure settings for purposes of reproducibility.

      Response 26: As suggested, we have added dotted lines around the location of the C. elegans animals in all images where GFP expression is low or basal. We have also reported the exposure times for each image in the appropriate figure legends.

      Issue 27: This title is quite generic and doesn't even mention the main players (CDO-1 and sulfite metabolism).

      Response 27: We have updated our title to call attention to cysteine dioxygenase. The improved title is: “Hypoxia-inducible factor induces cysteine dioxygenase and promotes cysteine homeostasis in Caenorhabditis elegans”

      Issue 28: The authors mention two disorders in which CDO-1 plays a pathogenic role: MoCD and ISOD. We recommend switching the order in which the authors mention these, as the remainder of the paragraph is about MoCD. Also, they should write out the number "2" in the first sentence of that paragraph.

      Response 28: We have made the suggested changes.

      Issue 29: The authors state in the main text, "...to ubiquitinate HIF-1, targeting it for degradation by the proteosome." Here, they should refer to the pathway in Figure 5a.

      Response 29: We have made the suggested change.

      Issue 30: The authors state in the main text, "Elements of the HIF-1 pathway have emerged..." which is vague and confusingly worded. Change to, "Members of the HIF-1 pathway and its targets have emerged from C. elegans genetic studies."

      Response 30: We have made the suggested change.

      Issue 31: Clarify in the figure legends that supplemental cysteine did not affect the mortality of worms that were imaged.

      Response 31: We have added this note to Figure 3A and Figure S3A.

      Issue 32: Figure 1b. "the cdo-1 promoter is shown..." Add: "as a straight line" to the end of this phrase.

      Response 32: We have made the suggested change.

      Issue 33: The authors should consider changing the red text in Figure 1 to magenta, which tends to be more readable for people who have limited color vision.

      Response 33: We have adjusted the colors in Figure 1 as suggested.

      Issue 34: Figure 2, legend title. Consider changing "hif-1" to "HIF-1," as well as rhy-1, cysl-1, and egl-9. In this case, they are talking about proteins, not mutants or genes. This will make the paper easier to follow for readers who lack a C. elegans background.

      Response 34: We have made the suggested change.

      Issue 35: Figure 5, caption text. "...indicates weak similarity." Add, "amongst species compared."

      Response 35: We have made the suggested change.

      Issue 36: It is starting to become a standard for showing the datapoints in bar graphs. Although this is done in many graphs in the paper, it should also be done for Figure S1 and Figure 4C.

      Response 36: We have made the suggested change.

      Issue 37: An extensive ChIP-seq and RNA-seq analysis of C. elegans HIF-1 was recently published (Vora et al, 2022), which the authors should reference in support of the regulation of CDO-1 transcription by HIF-1 in their description of published expression studies of the pathway (Results section, page 4). Indeed, Vora et al were key generators of the ChIP-seq data cited in Warnhoff et al but not included as authors in the ModERN/ModENCODE publication: their contributions were published separately in Vora et al and should be acknowledged equivalently.

      Response 37: We appreciate the reviewer pointing this detail out and we have added the correct citation as indicated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some suggestions:

      1) It's obviously concerning that your GWAS results are not at all robust to the approach used (Fig S3). Did you try something non-parametric, like a Kruskal-Wallis test?

      We used both GWAS and F2 crosses to validate the presence of the QTL, so the evidence does not come from GWAS alone. We did not use non-parametric tests because it would be difficult to account for population structure and relatedness with such approaches. Our GWAS approach is certainly somewhat underpowered, owing to the number of individuals we used and to the likely polygenic nature of the root growth traits. However, the F2 crosses allow us to put more weight on some of the regions we identified with GWAS.

      2) You don't explain what you do with heterozygotes, nor discuss the level of inbreeding in general.

      We are dealing with inbred lines, but these lines are indeed not completely fixed. Remaining heterozygous genotypes were randomly assigned to one or the other allele. The median heterozygosity was low, at 5.6%. We clarified this point in the Materials and Methods.

      3) The finding that over 30% of RNA-seq reads don't seem to have an annotated home should give you pause. Do they map anywhere? At least discuss what is going on. Also, note that you likely have enormous errors in SNP-calling due to cryptic structural variation - think about what this might do?

      We agree with Reviewer #1. We added a few sentences to the Results section to clarify this point: “When further analyzed, 15.15% of the unmapped reads (with no correspondence to predicted CDS) were found not to match the reference genome. These might correspond either to unsequenced regions or to genotype-specific genomic regions that are not present in the reference line. The remaining unmapped reads corresponded to either rRNA and tRNA genes (40.28% of the unmapped reads) or to non-annotated genes or non-coding RNAs (44.57% of the unmapped reads).” As we used the same reference genome for mapping the RNAseq reads, some genes might not be present in our analysis for the two lines we studied.

      4) Did you consider moving PgGRXC9 into Arabidopsis?

      This is a great suggestion. In fact, we plan to further explore how some GRXs regulate root growth and how this is conserved in plants in a follow-up project. This is, however, beyond the scope of this manuscript.

      Minor suggestions:

      1) Why not calculate H^2 simply as line variance divided by total?

      When heritability is estimated on single individuals in a population, as is generally done for humans and in animal breeding, it is indeed the line variance divided by the total phenotypic variance.

      But in plant breeding (or plant science), we generally work with genotypes replicated across different blocks or experimental repetitions, so we estimate the heritability of the mean phenotype of each genotype. There is ample literature on this (Nyquist, 1991; Holland et al. 2003; for a very nice and clearly written explanation, see the introduction of this PhD thesis: http://opus.uni-hohenheim.de/volltexte/2020/1720/pdf/20200221_PhD_Thesis_Publikationsversion.pdf). When calculating the heritability of the mean phenotype, the phenotypic variance (the denominator) should take into account the number of replicates per genotype (n): with inbred lines we do not have a single plant but several clones. The meaning of the formula is that the error term in the model reflects the n replicate plants per genotype, and so, to estimate the heritability of the genotype mean, we have to take this into account in the error variance.
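      As a rough illustration of the distinction drawn here, the sketch below contrasts single-plant heritability with a commonly used entry-mean formulation, H² = σ²G / (σ²G + σ²e / n). This is an assumed, simplified form with only a genotype and an error variance component; the exact model used in the manuscript (for example, any block or experiment terms) is not given in this response, and the function names and numbers are hypothetical.

```python
def single_plant_heritability(var_genotype: float, var_error: float) -> float:
    """Line (genotypic) variance divided by total phenotypic variance of one plant."""
    if var_genotype < 0 or var_error < 0 or var_genotype + var_error == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return var_genotype / (var_genotype + var_error)


def entry_mean_heritability(var_genotype: float, var_error: float, n_reps: int) -> float:
    """Heritability of the genotype mean over n_reps replicate plants.

    The error variance of a mean of n_reps plants is var_error / n_reps,
    so the denominator is the phenotypic variance of the genotype mean.
    """
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    if var_genotype < 0 or var_error < 0 or var_genotype + var_error == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return var_genotype / (var_genotype + var_error / n_reps)


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    var_g, var_e, n = 2.0, 6.0, 3
    print("single plant:", single_plant_heritability(var_g, var_e))
    print("entry mean  :", entry_mean_heritability(var_g, var_e, n))
```

      With a single replicate the two quantities coincide; as the number of replicates grows, the entry-mean value rises because the error contribution to the genotype mean shrinks.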

      2) While the paper overall is well-written, the captions need further proof-reading.

      We corrected all the captions.

      Reviewer #2 (Recommendations For The Authors):

      Major suggestions:

      1) The experimental support for the mutant phenotype of roxy19 needs to be further substantiated. Current methods available for CRISPR mutagenesis make it relatively easy to generate additional alleles. Alternatively, the authors could complement the mutant with a wild-type copy of the gene. These approaches represent the standard of the field and should be used here as well.

      We agree with Reviewer #2. We added some sentences to the Discussion to stress the limitations of our study in linking the QTL to PgGRXC9.

      As stated above, we would like to further explore how some GRXs regulate root growth and how this is conserved in plants. We plan to generate new single and multiple mutants in ROXY19 and its closest homologues (using CRISPR). This is, however, beyond the scope of this manuscript.

      2) The authors may want to state more clearly what the hypothesis is for how redox levels might contribute to root length differences and more clearly state what the limits of their current study are.

      We modified the discussion to try to clearly indicate the limitations of our study.

      3) Differences in root growth can be the consequence of a number of different parameters that contribute to root elongation and the authors need to more clearly define which of these are likely affected in their different genotypes.

      We agree with Reviewer #2. However, as stated before, we plan to further explore the molecular and cellular mechanisms responsible for the phenotype we observe in Arabidopsis. This will need extra work and is beyond the scope of this manuscript.

      4) Page 13, first paragraph. The authors provide an overly strong statement that suggests they have determined the molecular basis for the difference in PgGRXC9: " Altogether, our results suggest that PgGRXC9 is a positive regulator of root growth and that a polymorphism in the promoter region of PgGRXC9 associated with changes in its expression level appeared responsible for a quantitative difference in root growth between the two lines."

      While their results suggest the PgGRXC9 locus is associated with root growth variation, they have not directly tested the effect of the polymorphisms in the promoter on gene expression and this statement needs to be weakened.

      We changed the text to: “Altogether, our results suggest that PgGRXC9 is a positive regulator of root growth and that a polymorphism in the promoter region of PgGRXC9 might lead to changes in its expression level and ultimately to a quantitative difference in root growth between the two lines. However, the effect of the polymorphisms in the promoter on gene expression needs to be tested to validate this hypothesis.”

      We also changed the title of the manuscript to better reflect our results.

      Minor suggestions:

      1) Page 4: "FTSW below 0.3 was considered a stressful condition." It was not specified how this threshold was determined.

      This value corresponds to the measured FTSW value at which pearl millet genotypes subjected to a dry down generally start to reduce their transpiration rate (see Fig. 1 of Kholová et al, 2010; https://doi.org/10.1093/jxb/erp314). At FTSW values above 0.3, transpiration is not affected. At FTSW values around 0.3, the water supply from pearl millet roots cannot fully support transpiration. The plant enters a drought stress responsive phase and progressively closes its stomata to reduce water losses and decrease plant productive functions to match water supply. We have clarified this in the manuscript.

      2) Page 6: Figure 1; footnote: at the end of the description of panel A, a comma is missing between "red" and "blue."

      Thanks for pointing that out. This was corrected.

      3) The root growth data determined by X-ray imaging is not significant (Fig S4B), yet the authors describe the result in the main text without qualification. The authors should clarify this in the text.

      We added some text to clarify this.

      4) Page 9: Figure 2C; It would be better to enlarge these images and annotate them to indicate what specific anatomical features have been measured. Currently, only an expert in the field would be able to interpret these images.

      While we understand the point made by Reviewer #2, Fig. 2C was meant only to illustrate differences in the root tip of the two lines.

      5) Page 9: Figures 2D and E; the number of biological samples measured is not indicated (what is "n"?).

      Thanks again for pointing this out. This was added to the figure legend.

      6) Page 14: Figure 4B; scale bar needs to be included.

      Scale bars were added to the pictures.

      7) Page 14: Figure 4; I recommend adding confocal images or DIC of cleared root apex tissues to easily compare the RAM size and cell lengths in both WT and roxy19 mutant.

      Once again, we plan to conduct a follow-up study on the molecular and cellular mechanisms of action of ROXY19 and its closest homologues in root development. We believe a thorough analysis of the differences in phenotype would be better presented in a future manuscript.

      8) Page 18: main text; "we propose that redox regulation in the root meristem is responsible for a root growth QTL in pearl millet." This statement is ambiguous in the description of the mechanism. The authors do not clarify if the role they propose for PgGRXC9 is in the meristematic or elongation zone. Likely the authors are not able to know precisely where the gene is acting at this point, and so the presented hypothesis needs to more clearly state what limitations there are in assigning a mode of action for the PgGRXC9 and ROXY19 genes in root growth.

      We rewrote this paragraph to clarify the current gap in our understanding of the putative PgGRXC9 function.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study is an important advancement to the consideration of antimalarial drug resistance: the authors make use of both modelling results and supporting empirical evidence to demonstrate the role of malaria strain diversity in explaining biogeographic patterns of drug resistance. The theoretical methods and the corresponding results are convincing, with the novel model presented moving beyond existing models to incorporate malaria strain diversity and antigen-specific immunity. This work is likely to be interesting to malaria researchers and others working with antigenically diverse infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper is an attempt to explain a geographic paradox between infection prevalence and antimalarial resistance emergence. The authors developed a compartmental model that importantly contains antigenic strain diversity and in turn antigen-specific immunity. They find a negative correlation between parasite prevalence and the frequency of resistance emergence and validate this result using empirical data on chloroquine-resistance. Overall, the authors conclude that strain diversity is a key player in explaining observed patterns of resistance evolution across different geographic regions.

      The authors pose and address the following specific questions:

      1. Does strain diversity modulate the equilibrium resistance frequency given different transmission intensities?

      2. Does strain diversity modulate the equilibrium resistance frequency and its changes following drug withdrawal?

      3. Does the model explain biogeographic patterns of drug resistance evolution?

      Strengths:

      The model built by the authors is novel. As emphasized in the manuscript, many factors (e.g., drug usage, vectorial capacity, population immunity) have been explored in models attempting to explain resistance emergence, but strain diversity (and strain-specific immunity) has not been explicitly included and thus explored. This is an interesting oversight in previous models, given the vast antigenic diversity of Plasmodium falciparum (the most common human malaria parasite) and its potential to "drive key differences in epidemiological features".

      The model also accounts for multiple infections, which is a key feature of malarial infections, with individuals often infected with either multiple Plasmodium species or multiple strains of the same species. Accounting for multiple infections is critical when considering resistance emergence, as with multiple infections there is within-host competition which will mediate the fitness of resistant genotypes. Overall, the model is an interesting combination of a classic epidemiological model (e.g., SIR) and a population genetics model.

      In terms of major model innovations, the model also directly links selection pressure via drug administration with local transmission dynamics. This is accomplished by the interaction between strain-specific immunity, generalized immunity, and host immune response.

      R: We thank the reviewer for his/her appreciation of the work.

      Weaknesses:

      In several places, the explanation of the results (i.e., why are we seeing this result?) is underdeveloped. For example, under the section "Response to drug policy change", it is stated that (according to the model) low diversity scenarios show the least decline in resistant genotype frequency after drug withdrawal; however, it is not explained how this result emerges mechanistically. Without an explicit connection to the workings of the model, it can be difficult to gauge whether the result(s) seen are specific to the model itself or likely to be more generalizable.

      R: We acknowledge that the explanation of certain results needs to be improved. We have now added the explanation of why low diversity scenarios show the least decline in resistance frequency after drug withdrawal: “Two processes are responsible for the observed trend: first, resistant genotypes have a much higher fitness advantage in low diversity regions even with reduced drug usage because infected hosts are still highly symptomatic; second, due to low transmission potential in low diversity scenarios (i.e., longer generation intervals between transmissions), the rate of change in parasite populations is slower.” (L243-247). We also compared the drug withdrawal response to that of the generalized-immunity-only model (L268-271). The medium transmission region has the fastest reduction in resistance frequency, followed by the high and low transmission regions, which differs from the full model that incorporates strain-specific diversity.

      In addition, to provide the context of different biogeographic transmission zones, we now include a new figure (now Fig. 3) that presents the parameter space of transmission potential and strain diversity of different continents, which demonstrates that PNG and South America have less strain diversity than expected by transmission potential (L179-184 and L198-202). Therefore, these two regions have low disease prevalence and high resistance frequency.

      The authors emphasize several model limitations, including the specification of resistance by a single locus (thus not addressing the importance of recombination should resistance be specified by more than one locus); the assumption that parasites are independently and randomly distributed among hosts (contrary to empirical evidence); and the assumption of a random association between the resistant genotype and antigenic diversity. However, each of these limitations is addressed in the discussion.

      R: As pointed out by the referee, our model presents several limitations that have all been addressed in the discussion and considered for future extensions.

      Did the authors achieve their goals? Did the results support their conclusion?

      Returning to the questions posed by the authors:

      1. Does strain diversity modulate the equilibrium resistance frequency given different transmission intensities? Yes. The authors demonstrate a negative relationship between prevalence/strain diversity and resistance frequency (Figure 2).

      2. Does strain diversity modulate the equilibrium resistance frequency and its changes following drug withdrawal? Yes. The authors find that, under resistance invasion and some level of drug treatment, resistance frequency decreased with the number of strains (Figure 4). The authors also find that lower strain diversity results in a slower decline in resistant genotypes after drug withdrawal and higher equilibrium resistance frequency (Figure 6).

      3. Does the model explain biogeographic patterns of drug resistance evolution? Yes. The authors find that their full model (which includes strain-specific immunity) produces the empirically observed negative relationship between resistance and prevalence/strain diversity, while a model only incorporating generalised immunity does not (Figure 8).

      Utility of work to others and relevance within and beyond the field?

      This work is important because antimalarial drug resistance has been an ongoing issue of concern for much of the 20th century and now 21st century. Further, this resistance emergence is not equitably distributed across biogeographic regions, with South America and Southeast Asia experiencing much of the burden of this resistance emergence. Not only can widespread resistant strains be traced back to these two relatively low-transmission regions, but these strains remain at high frequency even after drug treatment ceases.

      Reviewer #2 (Public Review):

      Summary:

      The evolution of resistance to antimalarial drugs follows a seemingly counterintuitive pattern, in which resistant strains typically originate in regions where malaria prevalence is relatively low. Previous investigations have suggested that frequent exposures in high-prevalence regions produce high levels of partial immunity in the host population, leading to subclinical infections that go untreated. These subclinical infections serve as refuges for sensitive strains, maintaining them in the population. Prior investigations have supported this hypothesis; however, many of them excluded important dynamics, and the results cannot be generalized. The authors have taken a novel approach using a deterministic model that includes both general and adaptive immunity. They find that high levels of population immunity produce refuges, maintaining the sensitive strains and allowing them to outcompete resistant strains. While general population immunity contributed, adaptive immunity is key to reproducing empirical patterns. These results are robust across a range of fitness costs, treatment rates, and resistance efficacies. They demonstrate that future investigations cannot overlook adaptive immunity and antigenic diversity.

      R: We thank the reviewer for his/her appreciation of the work.

      Strengths:

      Overall, this is a very nice paper that makes a significant contribution to the field. It is well-framed within the body of literature and achieves its goal of providing a generalizable, unifying explanation for otherwise disparate investigations. As such, this work will likely serve as a foundation for future investigations. The approach is elegant and rigorous, with results that are supported across a broad range of parameters.

      Weaknesses:

      Although the title states that the authors describe resistance invasion, they do not support or even explore this claim. As they state in the discussion (line 351), this work predicts the equilibrium state and doesn't address temporal patterns. While refuges in partially immune hosts may maintain resistance in a population, they do not account for the patterns of resistance spread, such as the rapid spread of chloroquine resistance in Africa once it was introduced from Asia.

      R: We agree that resistance invasion is not the focus of our manuscript. Rather, we mainly investigate the maintenance of resistance and its decline after drug withdrawal. Therefore, we changed the title to “Antigenic strain diversity predicts different biogeographic patterns of maintenance and decline of anti-malarial drug resistance” (L1-4).

      We did, however, present a fast initial invasion phase for the introduction of resistant genotypes regardless of transmission scenarios in Fig. 5 (now Fig. 6). Even though the focus of the manuscript is to investigate long term persistence of resistant genotypes, we did emphasize that the initial invasion phase and how that changes the host immunity profile are key to the coexistence of resistant and wild-type genotypes (L228-239).

      As the authors state in the discussion, the evolution of compensatory mutations that negate the cost of resistance is possible, and in vitro experiments have found evidence of such. It appears that their results are dependent on there being a cost, but the lower range of the cost parameter space was not explored.

      R: It is true that compensatory mutations might mitigate the negative fitness consequences. We didn’t add a no-cost scenario because in general if there is no cost but only benefit (survival through drug usage), then resistant haplotypes will likely be fixed in the population. This is contingent on the assumption that these compensatory mutations are in perfect linkage with resistant alleles, which is unlikely in high-transmission scenarios. Our model does not incorporate recombination, but earlier models (Dye & Williams 1997, Hastings & D’Alessandro 2000) have demonstrated that recombination will delay the fixation of resistant alleles in high-transmission.

      As suggested, we ran our model with costs equal to 0 and 0.01 (Fig. 2C and L189-191). We found that resistant alleles almost always fix, except when diversity is extremely high and treatment/resistance efficacy is low. In these cases, the extra transmission gained by resistant alleles brings little benefit (as lower GI classes have a very small number of hosts). This finding does not contradict a wider range of coexistence between wild-type and resistant alleles when the cost is higher. We therefore added these scenarios to our updated results.

      Author response image 1.

      The use of a deterministic, compartmental model may be a structural weakness. This means that selection alone guides the fixation of new mutations on a semi-homogenous adaptive landscape. In reality, there are two severe bottlenecks in the transmission cycle of Plasmodium spp., introducing a substantial force of stochasticity via genetic drift. The well-mixed nature of this type of model is also likely to have affected the results. In reality, within-host selection is highly heterogeneous, strains are not found with equal frequency either in the population or within hosts, and there will be some linkage between the strain and a resistance mutation, at least at first. Of course, there is no recourse for that at this stage, but it is something that should be considered in future investigations.

      R: We thank the reviewer for their insightful comments on the constraints of the deterministic modeling approach. We’ve added these points to the Discussion, in the paragraph on the second limitation of the model (L359-364).

      The authors mention the observation that patterns of resistance in high-prevalence Papua New Guinea seem to be more similar to Southeast Asia, perhaps because of the low strain diversity in Papua New Guinea. However, they do not investigate that parameter space here. If they did and were able to replicate that observation, not only would that strengthen this work, it could profoundly shape research to come.

      R: We appreciate the suggestion to investigate the parameter space of Papua New Guinea. We now include a new figure (now Fig. 3) that presents the parameter space of transmission potential and strain diversity of different continents, which demonstrates that PNG and South America have less strain diversity than expected by transmission potential (L179-184 and L198-202). This translates to low infectivity for most mosquito bites, and most infections only occur in hosts with lower generalized immunity. Therefore resistant genotypes will help ensure disease transmission in these symptomatic hosts and be strongly selected to be maintained.

      Reviewer #1 (Recommendations For The Authors):

      1. I found lines 41-49 difficult to follow. Please rephrase (particularly punctuation) for clarity.

      R: We have edited the lines to improve the writing (L41-50):

      “Various relationships between transmission intensity and stable frequencies of resistance were discovered, each of which has some empirical support: 1) transmission intensity does not influence the fate of resistant genotypes [Models: Koella and Antia (2003); Masserey et al. (2022); Empirical: Diallo et al. (2007); Shah et al. (2011, 2015)]; 2) resistance first increases in frequency and slowly decreases with increasing transmission rates [Models: Klein et al. (2008, 2012)]; and 3) Valley phenomenon: resistance can be fixed at both high and low end of transmission intensity [Model: Artzy-Randrup et al. (2010); Empirical: Talisuna et al. (2002)]. Other stochastic models predict that it is harder for resistance to spread in high transmission regions, but patterns are not systematically inspected across the parameter ranges [Model: Whitlock et al. (2021); Model and examples in Ariey and Robert (2003)].”

      1. Line 65: There should be a space after "recombination" and before the citation.

      R: Thank you for catching the error. We’ve added the space (L64).

      1. I'm interested in the dependency of the results on the assumption that there is a cost to resistance via lowered transmissibility (lines 142-145). I appreciate that variation in the cost(s) of resistance in single and mixed infections is explored; however, from what I can tell the case of zero cost is not explored.

      R: As suggested, we have now added the no-cost scenario. Please see our response to the second paragraph of Reviewer #2's weaknesses in the Public Review.

      1. I felt the commentary/explanation of the response to drug policy change was a bit underdeveloped. I would have liked a walk-through of why in your model low diversity scenarios show the slowest decline in resistant genotypes after switching to different drugs.

      R: We acknowledge that the explanation of the response to drug policy change needs to be improved. We have now added an explanation of why low diversity scenarios show the least decline in resistance frequency after drug withdrawal: “Two processes are responsible for the observed trend: first, resistant genotypes have a much higher fitness advantage in low diversity regions even with reduced drug usage because infected hosts are still highly symptomatic; second, due to low transmission potential in low diversity scenarios (i.e., longer generation intervals between transmissions), the rate of change in parasite populations is slower.” (L243-247). We also compared the drug withdrawal response to that of the generalized-immunity-only model. The medium transmission region has the fastest reduction in resistance frequency, followed by the high and low transmission regions, which differs from the full model that incorporates strain-specific diversity.

      1. Line 352: persistent drug usage?

      R: Yes, we meant persistent drug usage. We’ve clarified the writing (L389-391).

      1. The organisation of the manuscript would benefit from structuring around the focal questions so that the reader can easily find the answers to the focal questions within the results and discussion sections.

      R: This is a great suggestion. We modified the subheadings of results to provide answers to focal questions (L151, L179, L203-204, and L240).

      1. Line 353: Please remove either "shown" or "demonstrated".

      R: Thank you for catching the grammatical error. We’ve retained only “shown” in the sentence (L391-392).

      Reviewer #2 (Recommendations For The Authors):

      Overall, this was very nice work and a pleasure to read.

      Major:

      1. Please provide a much more thorough explanation of how resistance invasions are modeled. It is not clear from the text and could not be replicated.

      R: We have now added a section “drug treatment and resistance invasion” in Methods and Materials to explain how resistance invasions are modeled (L488-496):

      “Given each parameter set, we ran the ODE model six times until equilibrium with the following genotypic compositions: 1) wild-type only scenario with no drug treatment; 2) wild-type only scenario with 63.2% drug treatment (0.05 daily treatment rate); 3) wild-type only scenario with 98.2% drug treatment (0.2 daily treatment rate); 4) resistant-only scenario with no drug treatment; 5) resistance invasion with 63.2% drug treatment; 6) resistance invasion with 98.2% drug treatment. Runs 1-4 start with all hosts in G0,U compartment and ten parasites. Runs 5 and 6 (resistance invasion) start from the equilibrium state of 2 and 3, with ten resistant parasites introduced. We then followed the ODE dynamics till the next equilibrium.”
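
The pairing of percentages with daily rates in the quoted protocol can be checked with a short calculation. The sketch below assumes a constant daily treatment hazard and a 20-day window; the window length is not stated in the response and is only inferred here because it reproduces both quoted percentages. The function name and constant are illustrative.

```python
import math


def cumulative_treated_fraction(daily_rate: float, window_days: float) -> float:
    """Probability of being treated at least once within window_days,
    assuming a constant per-day treatment hazard (exponential waiting time)."""
    return 1.0 - math.exp(-daily_rate * window_days)


# ASSUMPTION: a 20-day window. It is not given in the text; it is chosen
# because it matches the percentages quoted alongside the daily rates.
WINDOW_DAYS = 20.0

for rate in (0.05, 0.2):
    frac = cumulative_treated_fraction(rate, WINDOW_DAYS)
    print(f"daily rate {rate}: {100 * frac:.1f}% treated within {WINDOW_DAYS:g} days")
# daily rate 0.05: 63.2% treated within 20 days
# daily rate 0.2: 98.2% treated within 20 days
```

Under this reading, 63.2% and 98.2% are cumulative probabilities, 1 - exp(-1) and 1 - exp(-4), rather than per-day coverage.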

      1. Please make your raw data, code, and replicable examples that produce the figures in the manuscript available.

      R: We have added a data availability section, which provides the GitHub site with all the code for the model, data processing, and figures: All the ODE code, numerically simulated data, empirical data, and analysis scripts are publicly available at https://github.itap.purdue.edu/HeLab/MalariaResistance.

      1. Regarding the limitations described in the paragraph about the model in the public response, these results would be strengthened if there were separate compartments for strains which could be further divided into sensitive and resistant. Could you explore this for at least a subset of the parameter space?

      R: In our model, sensitive and resistant pathogens are always modeled as separate compartments (Fig. S1B and Appendix 1). In Results/Model structure, L135-136, we stated the setup:

      “The population sizes of resistant (PR) or sensitive (wild-type; PW) parasites are tracked separately in host compartments of different G and drug status.”

      1. To what extent do these results rely on a cost to resistance? Were lower costs explored? This would be worth demonstrating. If this cannot be maintained without cost, do you think this is because there is no linkage between strain and resistance?

      R: As suggested, we have now added the no-cost scenario (Fig. 2C and L189-191). Please see our response to the second paragraph of Reviewer #2's weaknesses in the Public Review. In sum, under a no-cost scenario, if the treatment rate is low, wild-type alleles will still be maintained in high transmission scenarios; when the treatment rate is high, resistant alleles will always be fixed.

      Minor:

      1. "Plasmodium" should be italicized throughout. Ironically, italics aren't permitted in this form.

      R: We did italicize “Plasmodium” or “P. falciparum” throughout the text. If the reviewer is referring to “falciparum malaria”, the convention is not to italicize falciparum in this case.

      1. Fig 1A: the image is reversed for the non-infected host with prior exposure to strain A. Additionally, the difference between colors for WT and resistant is not visible in monochrome.

      R: Thank you for pointing out the problem of color choice in monochrome. We have modified the figure. The image in Fig 1A is not reversed for non-infected hosts with prior exposure to strain A. We now spell out “S” to be “specific immunity”, and explain it better in the figure legend.

      1. Fig 2B: add "compare to the pattern of prevalence shown in Fig 2A" or something similar to make the comparison immediately clear.

      R: We thank the reviewer for the suggestion. We’ve added a sentence to contrast Fig 2A and B in the Figure legend: “A comparison between the prevalence pattern in (A) and resistance frequency in (B) reveals that high prevalence regions usually correspond to low resistance frequency at the end of resistance invasion dynamics.”

      1. Figs 2B & C: Please thoroughly explain how you produced this data in the methods section and briefly describe it in the results sections.

      R: We agree that the modeling strategies need to be explained better. Since we explained the rationale for the parameter ranges and the prevalence patterns we observe in the results section “Appropriate pairing of strain diversity and vectorial capacity” (now “Impact of strain diversity and transmission potential on disease prevalence”), we added sentences in this section to explain how we run models until equilibrium for wild-type-only infections with or without drug treatment (L152-178). Then, in the following section, “Drug-resistance and disease prevalence”, we explain how we obtained the resistance invasion data:

      “To investigate resistance invasion, we introduce ten resistant infections to the equilibrium states of drug treatment with wild-type only infections, and follow the ODE dynamics till the next equilibrium” (L180-181).
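
The two-stage procedure described here (equilibrate with wild-type only, introduce ten resistant infections, then integrate to the next equilibrium) can be sketched on a deliberately simple toy system. Everything in the block below is an assumption made for illustration: the two-strain logistic competition equations, the parameter values, and the function names are hypothetical and are not the authors' ODE model, which tracks generalized and strain-specific immunity classes. Because the toy has no immunity structure, it can only show exclusion or fixation, not the coexistence reported in the manuscript.

```python
def derivatives(w, r, treat, b=0.5, m=0.1, k=1e6, cost=0.4, efficacy=1.0):
    """Toy two-strain competition (NOT the authors' model).

    w, r     : wild-type and resistant parasite populations
    treat    : daily treatment rate, which clears wild-type parasites
    cost     : hypothetical transmission cost of resistance
    efficacy : hypothetical fraction of treatment that resistance evades
    """
    free = 1.0 - (w + r) / k  # shared resource left for new infections
    dw = w * (b * free - m - treat)
    dr = r * (b * (1.0 - cost) * free - m - treat * (1.0 - efficacy))
    return dw, dr


def run_to_equilibrium(w, r, treat, dt=0.1, tol=1e-6, t_max=50_000.0):
    """Forward-Euler integration until the daily change is negligible."""
    t = 0.0
    while t < t_max:
        dw, dr = derivatives(w, r, treat)
        if abs(dw) + abs(dr) < tol:
            break
        w, r = w + dt * dw, r + dt * dr
        t += dt
    return w, r


def resistance_frequency_after_invasion(treat, inoculum=10.0):
    # Stage 1: wild-type only under treatment, started from ten parasites.
    w_eq, _ = run_to_equilibrium(w=inoculum, r=0.0, treat=treat)
    # Stage 2: introduce ten resistant parasites and integrate again.
    w, r = run_to_equilibrium(w=w_eq, r=inoculum, treat=treat)
    return r / (w + r)


for treat in (0.05, 0.2):
    freq = resistance_frequency_after_invasion(treat)
    print(f"daily treatment rate {treat}: resistance frequency {freq:.3f}")
```

With these made-up parameters the resistant strain dies out at the lower treatment rate and goes to fixation at the higher one. The point of the sketch is the run order, in which the equilibrium of the wild-type-only run is the starting state of the invasion run.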

      1. Fig 3: The axis labels are not particularly clear. For the Y axis, please state in the label what it is the frequency of (either the mutation or the phenotype). In the X axis, it is better to spell that out in words, like "P. falciparum prevalence in children".

      R: Thank you for pointing this out. We’ve modified the axes labels of Fig. 3 (now Fig. 4): X-axis: “P. falciparum prevalence in children aged 2-10”; Y-axis: “Frequency of resistant genotypes (pfcrt 76T)”.

      1. Fig 4 and the rest of the figures of this nature: Showing an equilibrium-state timestep before treatment was introduced would improve the readers' understanding of the dynamics.

      R: We agree that the equilibrium state before treatment is important. In fact, those states are shown in Fig. 4 (now Fig. 5): the left panel, “Daily treatment rate 0”, shows the equilibrium-state timestep before treatment. We clarified this point in the caption.

      1. Fig 5 is very compelling, but the relationships in Fig 5 would be clearer if the Y axes were not all different. Consider using the same scale for the hosts, and the same scale for resistant parasites (both conditions) and WT parasites, 113 strains. It may be clearer to reference them if they are given as A-F instead of three figures each for A and B.

      R: We agree with the suggested changes and have modified figure 5 (now Fig. 6): we used one Y-axis scale for the hosts, and one Y-axis scale for the parasites. The wild-type one is very low for the low diversity scenario, thus we included one inset plot for that case.

      1. Fig 5 caption: High immune protection doesn't select against resistance. The higher relative fitness of the sensitive strain selects against resistance in a high-immunity environment.

      R: Thank you for pointing this out. Here we meant that a reduction in resistant population after the initial overshoot occurs in both diversity levels. We are not comparing resistant strains to sensitive ones. We’ve modified the sentence to: “The higher specific immunity reduces the infectivity of new strains, leading to a reduction of the resistant parasite population regardless of the diversity level”.

      1. Line 242: "keep" should be plural.

      R: We’ve corrected “keep” to “keeps” (L267).

      1. Line 360 and elsewhere: The strength of the results is somewhat overstated at times. This absolutely supports the importance of strain-specific immunity, but these results do not explain patterns of the origin of resistance and there are a number of factors that are not incorporated (a necessary evil of modeling to be sure).

      R: Thank you for pointing this out. We’ve modified the Discussion to remove the overstated strength of the results:

      1) Original: “The inclusion of strain diversity in the model provides a new mechanistic explanation as to why Southeast Asia has been the original source of resistance to certain antimalarial drugs, including chloroquine.”

      Modified: “The inclusion of strain diversity in the model provides a new mechanistic explanation as to why Southeast Asia has persisting resistance to certain antimalarial drugs, including chloroquine, despite a lower transmission intensity than Africa.” (L328-330)

      2) In sum, we show that strain diversity and associated strain-specific host immunity, dynamically tracked through the macroparasitic structure, can predict (previously “explain”) the complex relationship between transmission intensity and drug-resistance frequencies.

      1. The color palettes are not discernible in grayscale, especially the orange/blue/gray in Fig 2. The heatmaps appear to be in turbo, the only viridis palette that isn't grayscale-friendly. Just something to keep in mind for the accessibility of individuals with achromatopsia and most people who print out papers.

      R: Thank you for the visualization suggestions. We updated all the figures with the “viridis:magma” palette. As for the orange/blue/gray scale used in Fig 2C, it is difficult to pick nine colors that are discernible by brightness in grayscale. Currently, the four colors correspond to clonal genotype cost (i.e. green, red, grey, and blue), and the three-level brightness maps to mixed genotype cost.

    1. Author Response

      eLife assessment

      This study presents a valuable method to visualize the location of the cell types discovered through single-cell RNA sequencing. The evidence supporting the claims is solid, but the inclusion of a larger number of samples would strengthen the study. It would also be helpful to have the methods explained in more detail. The work will be of interest to those seeking to identify new cell types from scRNA-seq and snRNA-seq data.

      Response: We are surprised by the editor’s assessment of our paper as a “valuable” method. This is the first Drosophila adult spatial transcriptomics paper. Hence, we would consider this to be at least an “important” method. Spatial transcriptomics has thus far only been done in embryos, which have been easy to process for FISH for many decades. Integration with single-cell data is also new. We are further surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an “important” biological finding of this paper. We are not aware that any localized mRNAs in Drosophila muscles were known prior to our study. This shows the advantage of spatial transcriptomics over single-cell techniques.

      The work indeed does not represent a full spatial atlas of the adult fly; it is, however, a proof-of-principle study covering both the head and body, which we consider at least “important”.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas.

      Response: We thank the reviewer for their comment, but are surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an important biological finding of this paper. This might be due to the visualization problem that this reviewer was facing with a greyscale version of the PDF as mentioned in the comments below. We do not know what caused the technical problem for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells.

      Response: The size of the fly is one of the most challenging aspects of performing spatial transcriptomics. The small size of the samples led to detachment from the slides, which we solved by coating the slides with gelatin. While the resolution of Molecular Cartography is high (<200 nm), challenges remain in the brain, as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with current techniques. We agree that a full atlas will require expansion microscopy, 3D techniques, or even higher resolution.

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      1. The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      Response: We thank the reviewer for this comment but would like to add that the statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single-cell atlas: the tinman-positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single-cell FCA data tinman expression is often not detected in the cardiomyocytes. In the entire FCA, tinman is detected in 273 cells (mean expression of 1.44 UMI in positive cells), and in 71 out of 273 cardial cells (26%).

      Author response image 1.

      Density plots for body (left) and head (right) showing levels of gene expression detected in scRNA-seq (body: Fly Cell Atlas, Li et al. 2022, head: Pech et al. (2023)). Blue: all genes, red: genes used in the spatial study.

      2. Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      3. The authors show that their approach reveals interesting patterns of mRNA distribution (e.g., alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      4. Lastly, the authors compare three different approaches (their own, described in this manuscript; Tangram; and SpaGE), which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      1. Experimental setup. It is not clear how many flies were analyzed or, for some of the data, what their sex was. It appears that for the body data, only one male was analyzed. For the heads, the methods say male and female heads, but nothing is annotated in the figures. It therefore remains unclear how robust these data are, given such a limited sample from one sex, and the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just in the methods). What also adds to the confusion is the use of "n" in para 2 of the results: "... we performed coronal sections at different depths in the head (n=13) ...". Is this 13 sections in total from 1 head, or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      Response: While we agree that sex differences are indeed an interesting subject for spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology so that this level of detail can be pursued in the future. In the revised version, we will provide a more detailed description of the sections, including their sex/genotype/age. We would like to point out that we verified the specificity of our FISH method on all the body sections (Figure 2A, TpnC4 & Act88F) and not only on one. Furthermore, we would like to state that the idea of “a rosetta stone” was mentioned as a future prospect. We will rewrite the discussion to make this clearer.

      2. Probes selected: Information from the methods section should be put into the main text so that it is clear which genes were selected and why. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      Response: As stated above, while there is a slight bias towards more highly expressed genes (as expected for marker genes), we have also used genes with very low expression, such as tinman (body) or sens (head). This shows that our method is more sensitive than single-cell data: all cardiomyocytes can be identified by tinman expression, whereas only some are positive in the FCA data. In fact, the method cannot resolve very highly expressed genes, because optical crowding of the signal degrades the quantification. For this reason, ninaE was removed from the analysis (as mentioned in the section “Spatial transcriptomics allows the localization of cell types in the head and brain” and in the Methods).

      As mentioned in the Methods, the probes are designed at the gene level, targeting all isoforms but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this.

      3. Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (e.g., Figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      Response: Any high-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) uses a stitching method to combine several individual images into one large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found there to lie between 99.45% and 99.9%, matching the results we found for TpnC4 and Act88F (99.4% and 99.8%). We will add their analysis to our discussion.

      4. The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      Response: The total number of mRNAs detected per spatial transcriptomics experiment is much higher for the body samples compared to single-cell experiments (FCA data). In the head it is slightly lower, but here it is important to note that not all cell types are present in each slice in the head (while they are all present in the head scRNA experiments). A comparison on the cell-type level would be more meaningful, and we will investigate this for the revision.

      Author response image 2.

      Barplots showing the total number of mRNA molecules detected in Molecular Cartography (Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Black dots show individual experiments; counts are only shown for the chosen gene panel for each sample. Bars show the mean, with error bars representing the standard error.

      5. Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      Response: We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006) Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299.)

      Author response image 3.

      Molecular Cartography zoomed in on indirect flight muscle. Segmented nuclei are shown in white (based on DAPI); scale bars represent 100 μm.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data if the authors confirmed them using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" (Act88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, and decolorization of the probes?

      Response: We thank the reviewer for their interest in the localization patterns in muscle tissue. We could confirm localized mRNA in all the sections analyzed, in flight muscles as well as in leg muscles. We furthermore show that Act88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal is found in flight muscles only). Hence, we already test specificity in a much more quantitative way than traditional FISH, which often includes amplification.
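
      The specificity figures quoted in this response (99.4% and 99.8%) are on-target fractions: the share of decoded spots for a gene that fall inside the tissue where that gene is expected. A minimal Python sketch of such a calculation is given below. The function name, the per-spot tissue labels, and the toy counts are hypothetical and are not taken from the authors' pipeline.

```python
def on_target_fraction(spot_labels, target_tissue):
    """Return the fraction of detected spots assigned to the expected tissue.

    spot_labels: one tissue label per detected spot (hypothetical input format).
    target_tissue: the label of the tissue where the transcript is expected.
    """
    if not spot_labels:
        raise ValueError("at least one spot is required")
    hits = sum(1 for label in spot_labels if label == target_tissue)
    return hits / len(spot_labels)


# Toy data (invented for illustration): 1000 spots, 994 inside flight muscle.
labels = ["flight_muscle"] * 994 + ["other"] * 6
print(f"{on_target_fraction(labels, 'flight_muscle'):.1%}")  # prints 99.4%
```

      In practice the tissue label of each spot would come from a segmentation or a manually drawn region of interest, which is where most of the uncertainty in such a number would arise.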

      6. The authors developed an unbiased method to identify "new cell types" which relies on co-expression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      Response: The term “new cell types” only appears in the title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we have shown where unannotated clusters from scRNA-seq map, based on gene expression. Therefore, we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, of achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. The same criticism applies to the identification of "new cell types" based only on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      Response: We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from one individual we acquired 2 sections, giving a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene co-expression and expression patterns. Although obtaining more sections would be valuable, we don’t believe it to be necessary for the current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while producing results similar to those we already show. We are therefore reluctant to repeat the effort.

      The usage of the term “new cell types” is indeed ambiguous and we will tone this down in the revised version. Instead, we meant that unannotated clusters could be mapped to their location. In the text, we further specify that this means that now we only have inferred the location of the nuclei and that for neurons their function/processes are still unknown. As such, our data provides a starting point to identify new cell types since their marker genes and nuclear location are inferred. The next step to identify “new cell types” would indeed be to acquire genetic access to the cell types and characterize them in more detail. This is currently beyond our goals, and therefore we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      Response: We believe that our manuscript as it stands now is already an “important” paper that will strongly impact the Drosophila community (and, beyond it, the spatial transcriptomics community). As it stands, it provides the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. The manuscript provides experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment), and the data serve as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it provides proof-of-principle methods for integrating the FCA data with these spatial data, and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1. The most important concern that I have refers to the FDTD simulations to characterize the ZMW, as shown in Appendix 2, Figure 4. So far, the explanations given in the caption of Figure 4 are confusing and misleading: the authors should provide more detailed explanations on how the simulations were performed and the actual definition of the parameters used. In particular:

      a. lines 1330-1332: it is not clear to me how the fluorescence lifetime can be calculated from the detected signal S (z), and why they are horizontal, i.e., no z dependence? Which lifetimes are the authors referring to?

      b. lines 1333-1335: Where do these values come from? And how do they relate to panels D & E? From what I can see in these panels the lifetimes are highly dependent on z and show the expected reduction of lifetime inside the nanostructures.

      c. lines 1336-1337: Why does the quantum yield of the dyes outside the ZMW differ from the values reported in the literature? In particular, the changes of quantum yield and lifetime for Alexa 488 are very large (also mentioned in the corresponding part of Materials & Methods but not explained in any detail).

      We thank the Reviewer for his detailed questions on the FDTD simulations. We have now added the missing equation related to the computation of signal-averaged fluorescence lifetimes from the FDTD simulations. Specifically to the three points raised:

      a) The fluorescence lifetime is indeed not calculated from the detected signal S(z), but from the radiative and non-radiative rates in the presence of the ZMW as given in eq. 9-10. However, we use the detected signal S(z) to compute the average fluorescence lifetime over the whole z-profile of the simulation box, which we relate to the experimentally measured fluorescence lifetimes as given in Appendix 7, Figure 1. We have now added the equation to compute the signal-weighted fluorescence lifetimes, which we denote as <𝜏>S , in eq. 13 in the methods. To clarify this point, we have added the symbol <𝜏>S to the plots in Appendix 2, Figure 4 D-E and Appendix 7, Figure 1 C-D.

      b) The estimated lifetimes were obtained as the signal-weighted average over the lifetime profiles (<𝜏>S), as given in the new eq. 13. All plotted quantities, i.e., the detection efficiency η, quantum yield ϕ, detected signal S(z), and fluorescence lifetime, are computed from the radiative and loss rates obtained from the FDTD simulation according to eqs. 8-11. To make this clearer, we have now added the new Appendix 2 – Figure 5, which shows the z-profiles of the quantities (radiative and loss rates) used to derive the experimental observables.

      c) There are multiple reasons for the differences between the quantum yields of the two analytes used in this study and the literature values. For cyanine dyes such as Alexa647, it is well known that steric restriction (as e.g. caused by conjugation to a biomolecule) can lead to an increase of the quantum yield and fluorescence lifetime. We observe a minor increase of the fluorescence lifetime for Alexa647 from the literature value of 1.17 ns to a value of 1.37 ns when attached to Kap95, which is indicative of this effect. In the submitted manuscript, this was discussed in the methods in lines 936-938 (lines 938-945 in the revised manuscript). For the dye Alexa488, which is used to label the BSA protein, this effect is absent. Instead, we observe (as the Reviewer correctly notes) a quite drastic reduction of the fluorescence lifetime compared to the unconjugated dye, from 4 ns to 2.3 ns. In cases where a single cysteine is labeled on a protein, such a drastic reduction of the quantum yield usually indicates the presence of a quenching moiety in proximity of the labeling site, such as tryptophan, which acts via the photo-induced electron transfer mechanism. Indeed, BSA contains two tryptophans that could be responsible for the low quantum yield of the conjugated dyes. The situation is complicated by the fact that BSA contains 35 cysteines that can potentially be labeled (although 34 are involved in disulfide bridges). The labeled BSA was obtained commercially and the manufacturer lists the degree of labeling as ~6 dye molecules per protein, with a relative quantum yield of 0.2 compared to the standard fluorescein. This corresponds to an absolute quantum yield of ~0.16, which is low compared to the literature value for Alexa488 of ~0.8.

      Based on the measured fluorescence lifetime, we estimate a quantum yield of 0.46, which is higher than the photometrically obtained value of 0.16 reported by the manufacturer. Fully quenched, nonfluorescent dyes will not contribute to the lifetime measurement but are detected in the photometric quantum yield estimates. The difference between the lifetime-based and photometric quantum yield estimates thus suggests that part of the fluorophores are almost fully quenched. While it is unknown where the dyes are attached to the protein, the low quantum yield could be indicative of dye-dye interactions via π-π stacking, which can often lead to non-fluorescent dimers. This is supported by the fact that the manufacturer reports color differences between batches of labeled protein, which indicate spectral shifts of the absorption spectrum when dye-dye adducts are formed by π-π stacking. We have now added a short discussion of this effect in lines 938-941. We note that the conclusions drawn on the quenching effect of the metal nanostructure remain valid despite the drastic reduction of the quantum yield for Alexa488, which leads to a further quantum yield reduction of the partly quenched reference state.
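
      Two of the calculations in points b) and c) can be sketched numerically. The Python sketch below is an illustration under stated assumptions, not the authors' eq. 13 or their analysis code: the signal-weighted lifetime is written as a discrete weighted mean on a uniform z-grid, and the lifetime-based quantum yield assumes that the radiative rate is unchanged by conjugation, so that the quantum yield scales linearly with the lifetime. The last step, which treats the labeled dyes as a mixture of fully dark dyes and emissive dyes, is a simplification added here for illustration only. All function names are hypothetical.

```python
def signal_weighted_lifetime(signal, lifetime):
    """Weighted mean of tau(z), with the detected signal S(z) as weights.

    Both inputs are sampled on the same uniform z-grid (assumption).
    """
    if len(signal) != len(lifetime) or not signal:
        raise ValueError("signal and lifetime must be non-empty and equally long")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return sum(s * t for s, t in zip(signal, lifetime)) / total


def lifetime_based_quantum_yield(tau_ns, tau_ref_ns, qy_ref):
    """Scale a reference quantum yield by the lifetime ratio.

    Valid only if the radiative rate is the same in both states (assumption).
    """
    return qy_ref * tau_ns / tau_ref_ns


# Numbers quoted in the response: free Alexa488 has tau ~ 4 ns and QY ~ 0.8;
# on BSA the measured lifetime is 2.3 ns and the photometric QY is ~0.16.
qy_lifetime = lifetime_based_quantum_yield(2.3, 4.0, 0.8)
qy_photometric = 0.16

# Illustrative two-population model: a fraction f of dyes emits with
# qy_lifetime and the rest is fully dark, so qy_photometric = f * qy_lifetime.
emissive_fraction = qy_photometric / qy_lifetime

print(f"lifetime-based QY: {qy_lifetime:.2f}")        # prints 0.46
print(f"emissive fraction: {emissive_fraction:.2f}")  # prints 0.35
```

      Under these assumptions roughly a third of the dyes would be emissive, which is in line with the statement that part of the fluorophores are almost fully quenched; the exact fraction should not be read as a result of the study.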

      2) A second important concern refers to Figure 3: Why is there so much variability in the burst intensities reported in panels C, D? They should correspond to single molecule translocation events and thus all have comparable intensity values. In particular, the data shown for BSA in panel D are highly puzzling, since they not only reflect a reduced number of bursts (which is the main finding) but also very low intensity values, suggesting a high degree of quenching of the fluorophore being proximal to the metal on the exit side of the pore. In fact, the count rates for BSA on the uncoated pore range from 50-100 kcounts/s, while on the coated pores they barely reach 30 kcounts/s, a clear indication of quenching. Importantly, and in direct relation to this, could the authors exclude the possibility that the low event rates measured on BSA are largely due to quenching of the dye by getting entangled in the Nsp mesh just underneath the pore but in close contact to the metal?

      The Reviewer raises a valid concern, but further analysis shows that this is unproblematic. Notably, the burst intensities are in fact not reduced, in contrast to the visual impression obtained from the time traces shown in the figure. The time trace of the BSA intensity is visually dominated by high-intensity bursts which mask the low-intensity bursts in the plot. In contrast, in Figure 3 the reduced number of BSA events results in a sparser distribution of the intensity spikes, which allows low-intensity events to be seen. Unlike visual inspection, the spike-detection algorithm does not exhibit any bias in terms of the duration or the number of photons of the detected events between the different conditions for both BSA and Kap95, as shown in the new Appendix 7 – Figure 1. We used FCS analysis to test whether the event duration varies between the different conditions shown in Figure 3 C-D; this did not show a significant difference in the estimated diffusion time for BSA (Appendix 7 – Figure 1 C,D). Contrary to the suggestion of the Reviewer, we also do not observe any indication of quenching by the metal between uncoated and Nsp1-coated pores for BSA. Such quenching should result in differences of the fluorescence lifetimes, which, however, are not evident in our experimental data (Appendix 7 – Figure 1 F).

      3) Line 91: I suggest the authors remove the word "multiplexed" detection since it is misleading. Essentially the authors report on a two-color excitation/detection scheme which is far from being really multiplexing.

      We have changed the word to “simultaneous” now and hope this avoids further confusion.

      4) Line 121: why are the ZMW fabricated with palladium? Aluminum is the gold-standard to reduce light transmissivity. An explanation for the choice of this material would be appreciated by the community.

      In a previous study (Klughammer and Dekker, Nanotechnology, 2021), we established that palladium can have distinct advantages compared to other ZMW metals such as aluminum and gold, most prominently an increased chemical stability and reduced photoluminescence. For this study, we chose palladium over aluminum as it allowed the use of simple thiol chemistry for surface modification. In the beginning of the project, we experimented with aluminum pores as well. We consistently found that the pores closed after measuring their ionic conductance in chloride-containing solutions such as KCl or PBS. This problem was avoided by choosing palladium.

      5) Lines 281-282: This statement is somewhat misleading, since it reads such that the molecules stay longer inside the pore. However, if I understand correctly, these results suggest that Kap95 stays closer to the metal on the exit side. This is because measurements are being performed on the exit side of the pore as the excitation field inside the pore is quite negligible.

      We thank the Reviewer for this comment and have clarified the text in lines 290-292 as suggested to: “(…) this indicates that, on the exit side, Kap95 diffuses closer to the pore walls compared to BSA due to interactions with the Nsp1 mesh”.

      6) Lines 319-320: Although the MD simulations agree with the statement being written here, the variability could also be due to the fact that the proteins could interact in a rather heterogeneous manner with the Nsp mesh on the exit side of the pore, transiently trapping molecules that then would stay longer and/or closer to the metal, altering the emission rate of the fluorophores. Could the authors comment on this?

      The variation mentioned in the text refers to a pore-to-pore variation and thus needs to be due to a structural difference between individual pores. This effect would also need to be stable for the full course of an experiment, typically hours. We did not find any systematic changes in the fluorescence lifetimes measured on individual pores such as suggested by the Reviewer. We think that the suggested mechanism would show up as distinct clusters in Appendix 7 – Figure 1 E,F, where we found no trace of such a change. If we understand correctly, the Reviewer suggests a mechanism, not based on changes in the Nup layer density, that would lead to a varying amount of trapping of proteins close to the surface. Such a behavior should show up in the diffusion time of each pore (Appendix 7 – Figure 1 C,D), where, however, we find no trace of such an effect.

      7) Lines 493-498: These claims are actually not supported by the experimental data shown in this contribution: a) No direct comparison in terms of signal-to-noise ratio between fluorescence-based and conductance-based readouts has been provided in the ms. b) I would replace the word "multiplexed" with "simultaneous" since it is highly misleading. c) The results shown are performed sequentially and are thus low throughput. d) Finally, the use of unlabeled components is dubious since the detection scheme relies on fluorescence and thus requires labeling.

      We thank the Reviewer for pointing this out.

      a) We have now added a section in Appendix 3 that discusses the signal-to-noise ratios. In brief, there are three observations that led us to conclude that ZMWs provide beneficial capabilities to resolve individual events from the background:

      1. The signal-to-background ratio was determined to be 67±53 for our ZMW data of Kap95, which is an order of magnitude higher than the value of ~5.6 for a conductance-based readout.

      2. The detection efficiency for ZMWs is independent of the Kap95 occupancy within the pore. This is different from conductance-based approaches, which have a reduced capability to resolve individual Kap95 translocations at high concentrations.

      3. The fraction of detected translocations is much higher for ZMWs than for conductance-based data (where many translocations go undetected) and more closely matches the theoretical predictions.

      b) We have changed the wording accordingly.

      c) We agree with the Reviewer that our method is still low throughput. However, the throughput is markedly increased compared to previous conductance-based nanopore measurements. This is because we can test many (here up to 8, but potentially many more) pores per chip in one experiment, whereas conductance-based readouts are limited to a single pore. We have now changed the wording to “increased throughput” in line 507 to avoid confusion.

      d) We agree that only labeled components can be studied directly with our methods. However, the effect of unlabeled analytes can be assessed indirectly without any perturbation of the detection scheme due to the specificity of the fluorescent labeling. This is distinct from previous nanopore approaches using a conductance-based readout, which lack specificity. In our study, we have for example used this advantage of our approach to access event rates at high concentrations (1000 nM Kap95, 500 nM BSA) and large pore diameters by reducing the fraction of labeled analyte in the sample. Finally, the dependence of the BSA leakage rate on the concentration of Kap95 (Figure 6) relies on a specific readout of BSA events in the presence of large amounts of Kap95, which would be impossible in conductance-based experiments.

      8) Line 769: specify the NA of the objective. Using a very long working distance would also affect the detection efficiency. Have the authors considered the NA of the objective on the simulations of the detection efficiency? This information should be included and it is important as the authors are detecting single molecule events.

      We used an NA of 1.1 for the simulation of the Gaussian excitation field in the FDTD simulations, corresponding to the NA of the objective lens used in the experiments and as specified in the methods. The Reviewer is correct that the NA also affects the absolute detection efficiency of the fluorescence signal due to the finite opening angle of the collection cone of ~56˚. In our evaluation of the simulations, we have neglected this effect for simplicity, because the finite collection efficiency of the objective lens represents only an additional constant factor that does not depend on the parameters of the simulated system, such as the pore diameter. Instead, we focused solely on the effect of the ZMW and defined the detection efficiency purely based on the fraction of the signal that is emitted towards the detection side and can potentially be detected in the experiment. This has the added benefit that the discussed numbers are independent of the experimental setup used.

      We have now clarified this in the Methods text on lines 917-920.

      9) Line 831: I guess that 1160ps is a mistake, right?

      This is not a mistake. We performed a tail fit of the fluorescence decay curves, meaning that the initial rise of the decay was excluded from the fit. The initial part of the fluorescence decay is dominated by the instrument response function (IRF) of the system, with an approximate width of ~500 ps. To minimize the influence of the IRF on the tail fit, we excluded the first ~1 ns of the fluorescence decay.
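      The idea of a tail fit can be sketched as follows. This is a minimal illustration and not the fitting routine used for the manuscript: it assumes a mono-exponential decay, fits it by linear regression on the logarithm of the counts, and takes the 1160 ps from the Reviewer's comment as the start of the fit window. The function name, the synthetic data, and the 2.0 ns lifetime are hypothetical.

```python
import math


def tail_fit_lifetime(times_ns, counts, t_start_ns=1.16):
    """Estimate a mono-exponential lifetime from the tail of a decay curve.

    Only bins with t >= t_start_ns are used, so the early part of the curve
    (dominated by the instrument response) does not influence the result.
    The fit is an ordinary least-squares line through log(counts) versus time.
    """
    pts = [(t, math.log(c)) for t, c in zip(times_ns, counts)
           if t >= t_start_ns and c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx          # slope of log(counts) is -1 / lifetime
    return -1.0 / slope


# Synthetic decay (arbitrary numbers): 2.0 ns lifetime with a distorted
# rising edge during the first 0.5 ns that mimics the instrument response.
TRUE_TAU_NS = 2.0
times = [0.02 * i for i in range(500)]
counts = [1000.0 * math.exp(-t / TRUE_TAU_NS) * min(1.0, t / 0.5) for t in times]

fitted_tau_ns = tail_fit_lifetime(times, counts)
```

      Because the fit window starts after the distorted rising edge, the fitted lifetime of this synthetic curve is not biased by the early bins.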

      10) Lines 913-917: Why are the quantum yield of Alexa 488 and lifetime so much reduced as compared to the published values in literature?

      See answer to point 1. We have added a short discussion at lines 938-941 where we speculate that the reduced quantum yield is most likely caused by dye-dye interactions due to the high degree of labeling of ~6 dyes per protein.

      11) Lines 1503-1509: The predicted lifetimes with the Nsp-1 coating have not been shown in Appendix 2 - Figure 4. How have they been estimated?

      We have not performed predictions of fluorescence lifetimes in the presence of an Nsp1 coating. Predictions of the fluorescence lifetime in the absence of the Nsp1 coating were obtained by assuming a uniform occupancy of the molecules over the simulation box. A prediction of the fluorescence lifetimes in the presence of the Nsp1 coating would require a precise knowledge of the spatial distribution of analytes, which depends, among other factors, on the extension of the Nsp1 brushes and the interaction strengths with the FG repeats. While simulations provide some insights on this, we consider a quantitative comparison of predicted and measured fluorescence lifetimes in the presence of the Nsp1 coating beyond the scope of the present study.

      12) Lines 1534-1539: I disagree with this comment, since the measurements reported here have been performed outside the nano-holes, and thus the argument of Kap95 translocating along the edges of the pore and being responsible for the reduced lifetime does not make sense to me.

      In accordance with our answer to point 5 above, we have now changed the interpretation to the proximity of Kap95 to the metal surface on the exit side, rather than speculating on the path that the protein takes through the pore (lines 1662-1664), as follows:

      “This indicates that, in the presence of Nsp1, Kap95 molecules diffuse closer to or spend more time in proximity of the metal nanoaperture on the exit side.”

      Reviewer #2:

      (Numbers indicate the line number.)

      48: should cite more recent work: Timney et al. 2016 Popken et al 2015

      59: should cite Zilman et al 2007, Zilman et al 2010

      62: should cite Zilman et al 2010

      We thank the Reviewer for the suggestions and have added them to the manuscript now.

      65: one should be careful in making statements that the "slow" phase is immobile, as it likely rapidly exchanging NTRs with the "fast" phase.

      We have removed this description and replaced it by “This 'slow phase' exhibits a reduced mobility due to the high affinity of NTRs to the FG-Nup mesh.” to avoid misunderstanding.

      67: Schleicher 2014 does not provide evidence of dedicated channels

      We agree with the Reviewer and therefore moved the reference to an earlier position in the sentence.

      74-75: must cite work by Lusk & Lin et al on origami nanochannels

      We thank the Reviewer for this suggestion. We have now added a reference to the nanotraps of Shen et al. 2021, JACS, in line 75. In addition, we now also refer to Shen et al. 2023, NSMB, in the discussion where viral transport is discussed.

      77: Probably Jovanovic- Talisman (2009)?

      We thank the Reviewer for pointing out this typo.

      93: should cite Auger&Montel et al, PRL 2014

      We thank the Reviewer for pointing out this reference. To give proper credit to previous ZMW work, we have now incorporated a sentence in lines 100-102 citing this reference.

      111-112: there appears to be some internal inconsistency between this interpretation and the BSA transport mostly taking place through the "central hole" (as seems to be implied by Equation (3)). Probably it should be specified explicitly that the "central hole" in large channels is a "void".

      We thank the Reviewer for this suggestion and have added a clarifying sentence.

      115-177: This competition was studied in Jovanovic-Talisman 2009 and theoretically analysed in Zilman et al Plos Comp Biol 2010. The differences in the results and the interpretation should be discussed.

      We agree. This is discussed in the Discussion section (around line 594), and we have now added the reference to Zilman et al.

      Figure 2 Caption: "A constant flow..." - is it clear that this flow does not generate hydrodynamic flow through the pore?

      The Reviewer raises an important point. Indeed, the pressure difference over the membrane generates a hydrodynamic flow through the pore that leads to a reduction of the event rate compared to when no pressure is applied. However, as all experiments were performed under identical pressures, one can expect a proportional reduction of the absolute event rates due to the hydrodynamic flow against the concentration gradient. In other words, this will not affect the conclusions drawn on the selectivity, as it is defined as a ratio of event rates.

      We have now added additional data on the influence of the hydrodynamic flow on the translocation rate in Appendix 3 – Figure 2, where we have measured the signal of free fluorophores at high concentration on the exit side of the pore as a function of the applied pressure. The data show a linear dependence of the signal reduction on the applied pressure. At the pressure of 50 mbar used for the experiments, we see a ~5% reduction compared to the absence of pressure, implying that the reported absolute event rates are underestimated by only ~5%. Additionally, we have added such data for Kap95 translocations, which show a similar, although less consistent, effect. Measuring the event rate at zero flow is difficult, since this leads to an accumulation of fluorophores on the detection side.
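      The reasoning above can be made concrete with a short numerical sketch. It assumes that the fractional reduction of the event rate is linear in the applied pressure and reaches ~5% at 50 mbar, and that the same factor applies to both analytes. The function names and the example rates are hypothetical and serve only to show that a common proportional factor cancels in a ratio of event rates.

```python
# Assumption: fractional rate reduction is linear in pressure, ~5% at 50 mbar.
REDUCTION_PER_MBAR = 0.05 / 50.0


def flow_factor(pressure_mbar):
    """Fraction of the zero-pressure event rate that remains at this pressure."""
    return 1.0 - REDUCTION_PER_MBAR * pressure_mbar


def measured_rate(zero_pressure_rate, pressure_mbar):
    """Event rate observed when the hydrodynamic flow opposes diffusion."""
    return zero_pressure_rate * flow_factor(pressure_mbar)


def rate_ratio(rate_a, rate_b):
    """Ratio of two event rates, as used for a selectivity-type quantity."""
    return rate_a / rate_b


# Hypothetical zero-pressure event rates in Hz (illustrative values only).
kap95_rate, bsa_rate = 12.0, 3.0

ratio_without_flow = rate_ratio(kap95_rate, bsa_rate)
ratio_at_50_mbar = rate_ratio(measured_rate(kap95_rate, 50.0),
                              measured_rate(bsa_rate, 50.0))
```

      Both ratios agree, while each absolute rate at 50 mbar is 95% of its zero-pressure value under the stated assumption.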

      Figure 3: it would help to add how long is each translocation, and what is the lower detection limit. A short explanation of why the method detects actual translocations would be good

      With our method, unfortunately, we cannot assess the duration of a translocation event since we only see the particle as it exits the pore. Instead, the measured event duration is determined by the time it takes for the particle to diffuse out of the laser focus. This is confirmed by FCS analysis of translocation events that show the same order of magnitude of diffusion times as for free diffusion (Appendix 7 – Figure 1 C,D) in contrast to a massively reduced diffusion time within a nanopore. In Figure 2D we show the detection efficiency at different locations around the ZMW as obtained from FDTD simulations and discuss the light blocking. This clearly shows that the vast majority of the fluorescence signal comes from the laser-illuminated side and therefore only particles that translocated through the ZMW are detected, as presented between lines 170-190. In Yang et al. 2023, bioRxiv (https://doi.org/10.1101/2023.06.26.546504) a more detailed discussion about the optical properties of Pd nanopores is given.

      This point also explains why we see actual translocations: since the light is blocked by the ZMW, fluorophores can only be detected after they have translocated. On parts of the membrane without pores and upstream, the number of spikes found in a time trace was negligibly small. Additionally, if a significant part of the signal were contributed by fluorescence leaking from the dark top side, there should be no difference in the BSA event rate between small open pores and Nsp1 pores, which is not what we observed.

      With respect to the lower detection limit for events: In the burst search algorithm we require a false positive rate of lower than 1 event in 100. Additionally, as described in Klughammer and Dekker, Nanotechnology (2021), we apply an empirical filtering to remove events with a low signal-to-noise ratio that contain fewer than 5 detected photons per event or a too low event rate. The event detection algorithm itself sets no lower limit on the duration of an event. Such a limit is then set by the instrument and the maximum frequency at which it can detect photons. This time is below 1μs. In practice, we don’t find events shorter than 10μs, as can be seen in the distribution of events where also the detection limits can be estimated (Appendix 7 – figure 1 A and B.)
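      The photon-count criterion described above can be sketched as a simple post-filter on detected bursts. This is an illustration only and not the published burst search code: the dictionary layout, the function name, and the example bursts are hypothetical, and the additional rate criterion is left out because its threshold is not stated here.

```python
MIN_PHOTONS = 5  # events with fewer detected photons are discarded


def filter_bursts(bursts, min_photons=MIN_PHOTONS):
    """Keep only bursts that contain at least `min_photons` detected photons.

    No lower bound is placed on the burst duration, mirroring the statement
    that the detection algorithm itself sets no duration limit.
    """
    return [b for b in bursts if b["n_photons"] >= min_photons]


# Hypothetical bursts: start/stop times in microseconds and photon counts.
example_bursts = [
    {"start_us": 0.0, "stop_us": 12.0, "n_photons": 3},
    {"start_us": 50.0, "stop_us": 400.0, "n_photons": 42},
    {"start_us": 900.0, "stop_us": 915.0, "n_photons": 5},
]

kept_bursts = filter_bursts(example_bursts)
```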

      Equation (1): this is true only for passive diffusion without interactions (see eg Hoogenboom et al Physics Reports 2021 for review). Using it for pores with interactions would predict, for instance, that the inhibition of the BSA translocation comes from the decrease in D which is not correct.

      We agree with the Reviewer that this equation would not reproduce the measured data in a numerically correct way. We included it to justify why we subsequently fit a quadratic function to the data. As we write in line 260 we only used the quadratic equation “as a guide to the eye and for numerical comparison” and specifically don’t claim that this fully describes the translocation process. In this quadratic function, we introduced a scaling factor α that can be fitted to the data and thus incorporates deviations from the model. In appendix 5 we added a more elaborate way to fit the data including a confinement-based reduction of the diffusion coefficient (although not incorporating interactions). Given the variations of the measured translocation rates, the data is equally well described by both the simple and the more complex model function.

      Equation (1): This is not entirely exact, because the concentration at the entrance to the pore is lower than the bulk concentration, which might introduce corrections

      We agree with the Reviewer and have added that the concentration difference Δc is measured at the pore entrance and exit, and this may be lower than the bulk concentration. As described in our reaction to the Reviewer’s previous comment, equation (1) only serves as a justification to use the quadratic dependence and any deviations in Δc are absorbed into the prefactor α in equation (2).

      Equation (3): I don't understand how this is consistent with the further discussion of BSA translocation. Clearly BSA can translocate through the pore even if the crossection is covered by the FG nups (through the "voids" presumably?).

      The Reviewer raises an important point here. Equation 3 can only be used for a pore radius r > rprot + b. b was determined to be 11.5 nm and rprot is 3.4 nm for BSA, thus it needs to be that r > 15 nm. We would like to stress, however, that b does not directly give a height of a rigid Nsp1 ring but is related to the configuration of the Nsp1 inside the pore. Equation (3) (and equation (2)) were chosen because even these simple equations could fit the experimentally measured translocation rates well, and not because they would accurately model the setup in the pore. As we found from the simulations, the BSA translocations at low pore diameters presumably happen through transient openings of the mesh. The dynamics leading to the stochastic opening of voids on average leads to the observed translocation rate.
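      The validity range quoted above follows directly from the two fitted lengths. The short sketch below only reproduces that arithmetic; the function names are hypothetical, and the form of equation (3) itself is not reproduced here.

```python
R_PROT_BSA_NM = 3.4   # BSA radius quoted in the response
B_NM = 11.5           # fitted parameter b quoted in the response


def minimum_pore_radius_nm(r_prot_nm=R_PROT_BSA_NM, b_nm=B_NM):
    """Smallest pore radius above which the condition r > r_prot + b holds."""
    return r_prot_nm + b_nm


def condition_holds(pore_radius_nm, r_prot_nm=R_PROT_BSA_NM, b_nm=B_NM):
    """True if the pore radius satisfies r > r_prot + b."""
    return pore_radius_nm > r_prot_nm + b_nm


threshold_nm = minimum_pore_radius_nm()   # 3.4 nm + 11.5 nm, about 15 nm
```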

      296-297: is it also consistent with the simulations?

      We compare the experimental and simulated b values in lines 387-388 and obtained b=9.9 ± 0.1 nm from the simulations (as obtained from fitting the translocation rates and not from measuring the extension of the Nsp1 molecules) and 11.5 ± 0.4 nm from the experiments, which we find to be in good agreement.

      331: has it been established that the FG nups equilibrate on the microsecond scale?

      As an example, we have analyzed the simulation trajectory of the most dense nanopore (diameter = 40 nm, grafting = 1/200 nm2). In Author response image 1 we show for each of the Nsp1-proteins how the radius of gyration (Rg) changes in time over the full trajectory (2 μs + 5 μs). As expected, the Rg values reached the average equilibrium values very well within 2 μs simulation time, showing that the FG-Nups indeed equilibrate on the (sub)microsecond scale.

      Author response image 1.

      334-347: the details of the method should be explained explicitly in the supplementary (how exactly voids distributions are estimated and the PMF are calculated etc)

      The void analysis was performed with the software obtained from the paper of Winogradoff et al. In our Methods we provide an overview of how this software calculates the void probability maps and how these are converted into PMFs. For a more detailed description of how exactly the analysis algorithm is implemented in the software, we refer the reader to the original work. The analysis codes with the input files that were used in this manuscript have been made public ( https://doi.org/10.4121/22059227.v1 ) along with the manuscript.

      Equation (4) is only an approximation (which works fine for high barriers but not the low ones). Please provide citations/derivation.

      To our knowledge, the Arrhenius relation is a valid approximation for our nanopore simulations. We are not aware that it fails for low barriers and cannot find mention of this in the literature. It would be helpful if the Reviewer could point us to relevant literature.

      Figure 4: how was transport rate for Kaps calculated?

      As mentioned in lines 388-391, we assumed that the Kap95 translocation rate through Nsp1-coated pores is equal to that for open pores, as we did not observe any significant hindrance of Kap95 translocation by the Nsp1 mesh in the experiment (Figure 4 A,C).

      378: It's a bit strange to present the selectivity ratio as prediction of the model when only BSA translocation rate was simulated (indirectly).

      We agree with the Reviewer that ideally we should also simulate the Kap95 translocation rate to obtain an accurate selectivity measure of the simulated nanopores. However, as the experiments showed very similar Kap95 translocation rates for open pores and Nsp1-coated pores, we believe it is reasonable to take the Kap95 rates for open and Nsp1-pores to be equal.

      Figure 5C and lines 397: I am a bit confused how is this consistent with Figure 4D?

      Figure 5C and figure 4D both display the same experimental data, where 4D only focuses on a low diameter regime. In relation to line 397 (now 407), the Nsp1 mesh within the 60-nm pore dynamically switches between closed configurations and configurations with an open channel. When taking the temporal average of these configurations, we find that the translocation rate is higher than for a closed pore but lower than for a fully open pore. The stochastic opening and closing of the Nup mesh results in the continuous increase of the translocation rates with increasing diameter, which is in contrast to a step-wise increase that would be expected from an instantaneous collapse of the Nsp1 mesh at a certain pore diameter.

      428-439: Please discuss the differences from Jovanovic-Talisman 2009.

      How our results for a Kap95 induced change of the BSA translocation rate are related to previous literature is discussed extensively in the lines 598-620.

      440: How many Kaps are in the pore at different concentrations?

      This is a very interesting question that we were, unfortunately, not able to answer within the scope of this project. With our fluorescence-based methods we could not determine this number because the excitation light does not reach well into the nanopore.

      In our previous work on Nsp1-coated SiN nanopores using conductance measurements, we quantified the drop in conductance at increasing concentrations of Kap95 (Fragasso et al., 2023, NanoResearch, http://dx.doi.org/10.1007/s12274-022-4647-1). From this, we estimated that on average ~20 Kap95 molecules are present in a pore with a diameter of 55 nm at a bulk concentration of 2 µM. In these experiments, however, the height of the pore was only ~20 nm, which is much lower than the 100 nm long channel used here, and the grafting density of 1 per 21 nm2 was high compared to the grafting density here of 1 per 300 nm2. Assuming that the Kap95 occupancy scales linearly with the number of binding sites (FG repeats) in the vicinity of the pore, and hence the number of Nsp1 molecules bound to the pore, we would expect approximately 7 Kap95 molecules in a pore of similar diameter under saturating (> 1 µM) concentrations.

      On the other hand, the simulations showed that the density of Nsp1 within the pore is equal to the density within the 20-nm thick SiN pores (line 380). For the longer channel and lower grafting density used here, Nsp1 was also more constrained to the pore compared to thinner pores used in previous studies (Fragasso et al., 2023, NanoResearch), where the grafted protein spilled out from the nanopores. Thus assuming that the Kap95 occupancy depends on the protein density in the pore volume rather than the total protein amount grafted to the pore walls, we would estimate a number of 100 Kap95 molecules per pore.

      These varying numbers already show that we cannot accurately provide an estimate of the Kap95 occupancy within the pore from our data due to limitations of the ZMW approach.
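      The two estimates above can be reproduced with a few lines of arithmetic. The sketch assumes pores of equal diameter, so that the grafted wall area and the pore volume both scale with the channel height alone; the variable and function names are hypothetical.

```python
# Reference values from the conductance-based measurements quoted above.
N_KAP95_REF = 20.0            # Kap95 molecules per pore (55 nm diameter, 2 µM)
HEIGHT_REF_NM = 20.0          # pore height in the reference experiments
AREA_PER_NSP1_REF_NM2 = 21.0  # one Nsp1 per 21 nm^2

# Values for the pores used in this study.
HEIGHT_NM = 100.0             # channel length
AREA_PER_NSP1_NM2 = 300.0     # one Nsp1 per 300 nm^2


def occupancy_scaled_by_grafted_nsp1():
    """Occupancy proportional to the number of Nsp1 molecules on the pore wall.

    At equal diameter the wall area scales with the height, and the number of
    Nsp1 molecules per unit area scales with the inverse area per molecule.
    """
    height_ratio = HEIGHT_NM / HEIGHT_REF_NM
    grafting_ratio = AREA_PER_NSP1_REF_NM2 / AREA_PER_NSP1_NM2
    return N_KAP95_REF * height_ratio * grafting_ratio


def occupancy_scaled_by_pore_volume():
    """Occupancy proportional to the pore volume at equal Nsp1 density."""
    return N_KAP95_REF * HEIGHT_NM / HEIGHT_REF_NM


estimate_from_grafting = occupancy_scaled_by_grafted_nsp1()  # about 7
estimate_from_volume = occupancy_scaled_by_pore_volume()     # 100
```

      The factor of roughly 14 between the two results comes entirely from the ratio of the grafting densities (300/21), which enters the first estimate but not the second.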

      445: how is this related to the BSA translocation increase?

      For the calculation of the selectivity ratio, we assumed the normalized Kap95 translocation rate to be independent of the Kap95 concentration. Hence, the observed trends of the selectivity ratios at different concentrations of Kap95, as shown in Figure 6 D, are solely due to a change in the BSA translocation rate at different concentrations of Kap95, as given in Figure 6 B,C.

      462-481: it's a bit confusing how this interfaces with the "void" analysis ( see my previous comments)

      We agree that the phenomenological descriptions in terms of transient openings (small, dynamic voids) that for larger pores become a constantly opened channel (a single large, static void) might cause some confusion to the reader. In the last part of the results, we aimed to relate the loss of the BSA rate to a change of the Nsp1 mesh. We acknowledge that the model of a rim of Nsp1 and an open center described in Figure 5F is highly simplified. We now explain this in the revised paper at lines 483-486 by referring to an effective layer thickness, which holds true under the simplifying assumption of a central transport channel.

      Figure 6D: I think the illustration of the effect of kaps on the brush is somewhat misleading: at low pore diameters, it is possible that the opposite happens: the kaps concentrate the polymers towards the center of the pore. It should be also made clear that there are no kaps in simulations (if I understand correctly?)

      Indeed, at small pore diameters we think it would be possible to observe what the Reviewer describes. The illustration should only indicate what we think is happening for large pore diameters where we observed the opening of a central channel. To avoid confusion, we now shifted the sketches to panel G where the effective layer thickness is discussed.

      Indeed, as stated in lines 331-340 no Kap95 or BSA molecules were present in the simulations. We have now clarified this point in lines 872-876.

      518: Please provide more explanation on the role of hydrodynamics pressure.

      We have now performed additional experiments and quantified the effect of the pressure to be a ~5% reduction of the event rates, as described in the answer to a previous question above.  

      Reviewer #3 (Recommendations For The Authors):

      No experiments have been performed with the Ran-Mix regeneration system. It would be beneficial to add Ran-Mix to the trans compartment and see how this would affect Kap95 translocation events frequency and passive cargo diffusion. As the authors note in their outlook, this setup offers an advantage in using Ran-Mix and thus could also be considered here or in a future follow-up study.

      We thank the Reviewer for this suggestion. We think, however, that it is beyond the scope of this paper and an interesting subject for a follow-up study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study represents a comprehensive computational analysis of Plasmodium falciparum gene expression, with a focus on var gene expression, in parasites isolated from patients; it assesses changes that occur as the parasites adapt to short-term in vitro culture conditions. The work provides technical advances to update a previously developed computational pipeline. Although the findings of the shifts in the expression of particular var genes have theoretical or practical implications beyond a single subfield, the results are incomplete and the main claims are only partially supported.

      The authors would like to thank the reviewers and editors for their insightful and constructive assessment. We particularly appreciate the statement that our work provides a technical advance of our computational pipeline given that this was one of our main aims. To address the editorial criticisms, we have rephrased and restructured the manuscript to ensure clarity of results and to support our main claims. For the same reason, we removed the var transcript differential expression analysis, as this led to confusion.

      Public Reviews:

      Reviewer #1:

      The authors took advantage of a large dataset of transcriptomic information obtained from parasites recovered from 35 patients. In addition, parasites from 13 of these patients were reared for 1 generation in vivo, 10 for 2 generations, and 1 for a third generation. This provided the authors with a remarkable resource for monitoring how parasites initially adapt to the environmental change of being grown in culture. They focused initially on var gene expression due to the importance of this gene family for parasite virulence, then subsequently assessed changes in the entire transcriptome. Their goal was to develop a more accurate and informative computational pipeline for assessing var gene expression and secondly, to document the adaptation process at the whole transcriptome level.

      Overall, the authors were largely successful in their aims. They provide convincing evidence that their new computational pipeline is better able to assemble var transcripts and assess the structure of the encoded PfEMP1s. They can also assess var gene switching as a tool for examining antigenic variation. They also documented potentially important changes in the overall transcriptome that will be important for researchers who employ ex vivo samples for assessing things like drug sensitivity profiles or metabolic states. These are likely to be important tools and insights for researchers working on field samples.

      One concern is that the abstract highlights "Unpredictable var gene switching..." and states that "Our results cast doubt on the validity of the common practice of using short-term cultured parasites...". This seems somewhat overly pessimistic with regard to var gene expression profiling and does not reflect the data described in the paper. In contrast, the main text of the paper repeatedly refers to "modest changes in var gene expression repertoire upon culture" or "relatively small changes in var expression from ex vivo to culture", and many additional similar assessments. On balance, it seems that transition to culture conditions causes relatively minor changes in var gene expression, at least in the initial generations. The authors do highlight that a few individuals in their analysis showed more pronounced and unpredictable changes, which certainly warrants caution for future studies but should not obscure the interesting observation that var gene expression remained relatively stable during transition to culture.

      Thank you for this comment. We were happy to modify the wording in the abstract to have consistency with the results presented by highlighting that modest but unpredictable var gene switching was observed while substantial changes were found in the core transcriptome. Moreover, any differences observed in core transcriptome between ex vivo samples from naïve and pre-exposed patients are diminished after one cycle of cultivation making inferences about parasite biology in vivo impossible.

      Therefore, in our opinion, the statement in the last sentence is well supported by the data presented.

      Line 43–47: “Modest but unpredictable var gene switching and convergence towards var2csa were observed in culture, along with differential expression of 19% of the core transcriptome between paired ex vivo and generation 1 samples. Our results cast doubt on the validity of the common practice of using short-term cultured parasites to make inferences about in vivo phenotype and behaviour.” Nevertheless, we would like to note that this study was in a unique position to assess changes at the individual patient level as we had successive parasite generations. This comparison is not done in most cross-sectional studies and therefore these small, unpredictable changes in the var transcriptome are missed.

      Reviewer #2:

      In this study, the authors describe a pipeline to sequence expressed var genes from RNA sequencing that improves on a previous one that they had developed. Importantly, they use this approach to determine how var gene expression changes with short-term culture. Their finding of shifts in the expression of particular var genes is compelling and casts some doubt on the comparability of gene expression in short-term culture versus var expression at the time of participant sampling. The authors appear to overstate the novelty of their pipeline, which should be better situated within the context of existing pipelines described in the literature.

      Other studies have relied on short-term culture to understand var gene expression in clinical malaria studies. This study indicates the need for caution in over-interpreting findings from these studies.

      The novel method of var gene assembly described by the authors needs to be appropriately situated within the context of previous studies. They neglect to mention several recent studies that present transcript-level novel assembly of var genes from clinical samples. It is important for them to situate their work within this context and compare and contrast it accordingly. A table comparing all existing methods in terms of pros and cons would be helpful to evaluate their method.

      We are grateful for this suggestion and agree that a table comparing the pros and cons of all existing methods would be helpful for the general reader and also highlight the key advantages of our new approach. A table comparing previous methods for var gene and transcript characterisation has been added to the manuscript and is referenced in the introduction (line 107).

      Author response table 1.

      Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.

      Reviewer #3:

      This work focuses on the important problem of how to access the highly polymorphic var gene family using short-read sequence data. The approach that was most successful, and utilized for all subsequent analyses, employed a different assembler from their prior pipeline, and impressively, more than doubles the N50 metric.

      The authors then endeavor to utilize these improved assemblies to assess differential RNA expression of ex vivo and short-term cultured samples, and conclude that their results "cast doubt on the validity" of using short-term cultured parasites to infer in vivo characteristics. Readers should be aware that the various approaches to assess differential expression lack statistical clarity and appear to be contradictory. Unfortunately, there is no attempt to describe the rationale for the different approaches and how they might inform one another.

      It is unclear whether adjusting for life-cycle stage as reported is appropriate for the var-only expression models. The methods do not appear to describe what type of correction variable (continuous/categorical) was used in each model, and there is no discussion of the impact on var vs. core transcriptome results.

      We agree with the reviewer that the different methods and results of the var transcriptome analysis can be difficult to reconcile. To address this, we have included a summary table with a brief description of the rationale and results of each approach in our analysis pipeline.

      Author response table 2.

      Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression, their rationale, method, results, and interpretation.

      Additionally, the var transcript differential expression analysis was removed from the manuscript, because this study was in a unique position to perform a more focused analysis of var transcriptional changes across paired samples, meaning the per-patient approach was more suitable. This allowed for changes in the var transcriptome to be identified that would have gone unnoticed in the traditional differential expression analysis.

      We thank the reviewer for their highly important comment about adjusting for life cycle stage. Var gene expression is highly stage-dependent, so any quantitative comparison between samples does need adjustment for developmental stage. All life cycle stage adjustments were done using the mixture model proportions to be consistent with the original paper, as described in the results and methods sections:

      • Line 219–221: “Due to the potential confounding effect of differences in stage distribution on gene expression, we adjusted for developmental stage determined by the mixture model in all subsequent analyses.”

      • Line 722–725: “Var gene expression is highly stage dependent, so any quantitative comparison between samples needs adjustment for developmental stage. The life cycle stage proportions determined from the mixture model approach were used for adjustment.“

      The rank-expression analysis did not have adjustment for life cycle stage as the values were determined as a percentage contribution to the total var transcriptome. The var group level and the global var gene expression analyses were adjusted for life cycle stages, by including them as an independent variable, as described in the results and methods sections.

      Var group expression:

      • Line 321–326: “Due to these results, the expression of group A var genes vs. group B and C var genes was investigated using a paired analysis on all the DBLα (DBLα1 vs DBLα0 and DBLα2) and NTS (NTSA vs NTSB) sequences assembled from ex vivo samples and across multiple generations in culture. A linear model was created with group A expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. The same was performed using group B and C expression levels.“

      • Line 784–787: “DESeq2 normalisation was performed, with patient identity and life cycle stage proportions included as covariates and differences in the amounts of var transcripts of group A compared with groups B and C assessed (Love et al., 2014). A similar approach was repeated for NTS domains.”

      Global var gene expression:

      • Line 342–347: “A linear model was created (using only paired samples from ex vivo and generation 1) (Supplementary file 1) with proportion of total gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. This model showed no significant differences between generations, suggesting that differences observed in the raw data may be a consequence of small changes in developmental stage distribution in culture.”

      • Line 804–806: “Significant differences in total var gene expression were tested by constructing a linear model with the proportion of gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as an independent variables and the patient identity included as a random effect.“

      The analysis of the conserved var gene expression was adjusted for life cycle stage:

      • Line 766–768: “For each conserved gene, Salmon normalised read counts (adjusted for life cycle stage) were summed and expression compared across the generations using a pairwise Wilcoxon rank test.”

      And life cycle stage estimates were included as covariates in the design matrix for the domain differential expression analysis:

      • Line 771–773: “DESeq2 was used to test for differential domain expression, with five expected read counts in at least three patient isolates required, with life cycle stage and patient identity used as covariates.”

      Reviewer #1:

      1. In the legend to Figure 1, the authors cite "Deitsch and Hviid, 2004" for the classification of different var gene types. This is not the best reference for this work. Better citations would be Kraemer and Smith, Mol Micro, 2003 and Lavstsen et al, Malaria J, 2003.

      We agree and have updated the legend in Figure 1 with these references, consistent with the references cited in the introduction.

      1. In Figures 2 and 3, each of the boxes in the flow charts are largely filled with empty space while the text is nearly too small to read. Adjusting the size of the text would improve legibility.

      We have increased the size of the text in these figures.

      1. My understanding of the computational method for assessing global var gene expression indicates an initial step of identifying reads containing the amino acid sequence LARSFADIG. It is worth noting that VAR2CSA does not contain this motif. Will the pipeline therefore miss expression of this gene, and if so, how does this affect the assessment of global var gene assessment? This seems relevant given that the authors detect increased expression of var2csa during adaptation to culture.

      To address this question, we have added an explanation in the methods section to better explain our analysis. Var2csa was not captured in the global var gene expression analysis, but was analyzed separately because of its unique properties (conservation, proposed role in regulating var gene switching, slightly divergent timing of expression, translational repression).

      • Line 802/3: “Var2csa does not contain the LARSFADIG motif, hence this quantitative analysis of global var gene expression excluded var2csa (which was analysed separately).”
      1. In Figures 4 and 7, panels a and b display virtually identical PCA plots, with the exception that panel A displays more generations. Why are both panels included? There doesn't appear to be any additional information provided by panel B.

      We agree and have removed Figure 7b for the core transcriptome PCA as it did not provide any new information. The var transcript differential analysis (displayed in Figure 4) has been removed from the manuscript.

      1. On line 560-567, the authors state "However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at higher levels." What are the high levels being referred to here?

      We have replaced this sentence to make it clearer what the different levels are (global var gene expression, var domain and var type).

      • Line 526/7: “However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at the var domain, var type and global var gene expression level.”

      Reviewer #2:

      The authors make no mention or assessment of previously published var gene assembly methods from clinical samples that focus on genomic or transcriptomic approaches. These include:

      https://pubmed.ncbi.nlm.nih.gov/28351419/

      https://pubmed.ncbi.nlm.nih.gov/34846163/

      These methods should be compared to the method for var gene assembly outlined by the co-authors, especially as the authors say that their method "overcomes previous limitations and outperforms current methods" (128-129). The second reference above appears to be a method to measure var expression in clinical samples and so should be particularly compared to the approach outlined by the authors.

      Thank you for pointing this out. We have included the second reference in the introduction of our revised manuscript, where we refer to var assembly and quantification from RNA-sequencing data. We abstained from including the first paper in this paragraph (Dara et al., 2017) as it describes a var gene assembly pipeline and not a var transcript assembly pipeline.

      • Line 101–105: “While approaches for var assembly and quantification based on RNA-sequencing have recently been proposed (Wichers et al., 2021; Stucke et al., 2021; Andrade et al., 2020; TonkinHill et al., 2018, Duffy et al., 2016), these still produce inadequate assembly of the biologically important N-terminal domain region, have a relatively high number of misassemblies and do not provide an adequate solution for handling the conserved var variants (Table S1).”

      Additionally, we have updated the manuscript with a table (Table S1) comparing these two methods plus other previously used var transcript/gene assembly approaches (see comment to the public reviews).

      But to address this particular comment in more detail, the first paper (Dara et al., 2017) is a var gene assembly pipeline and not a var transcript assembly pipeline. It is based on assembling var exon 1 from unfinished whole genome assemblies of clinical samples and requires a prior step for filtering out human DNA. The authors used two different assemblers, Celera for short reads (which is no longer maintained) and Sprai for long reads (>2000bp), but found that Celera performed worse than Sprai, and subsequently used Sprai assemblies. Therefore, this method does not appear to be suitable for assembling short reads from RNA-seq.

      The second paper (Stucke et al. 2021) focusses more on enriching for parasite RNA, which precedes assembly. The capture method they describe would complement downstream analysis of var transcript assembly with our pipeline. Their assembly pipeline is similar to our pipeline as they also performed de novo assembly on all P. falciparum mapping and non-human mapping reads and used the same assembler (but with different parameters). They clustered sequences using the same approach but at 90% sequence identity as opposed to 99% sequence identity using our approach. Then, Stucke et al. use 500nt as a cut-off as opposed to the more stringent filtering approach used in our approach. They annotated their de novo assembled transcripts with the known amino acid sequences used in their design of the capture array; our approach does not assume prior information on the var transcripts. Finally, their approach was validated only for its ability to recover the most highly expressed var transcript in 6 uncomplicated malaria samples, and they did not assess mis-assemblies in their approach.

      For the methods (619–621), were erythrocytes isolated by Ficoll gradient centrifugation at the time of collection or later?

      We have updated the methods section to clarify this.

      • Line 586–588: “Blood was drawn and either immediately processed (#1, #2, #3, #4, #11, #12, #14, #17, #21, #23, #28, #29, #30, #31, #32) or stored overnight at 4°C until processing (#5, #6, #7, #9, #10, #13, #15, #16, #18, #19, #20, #22, #24, #25, #26, #27, #33).”

      Was the current pipeline and assembly method assessed for var chimeras? This should be described.

      Yes, this was quantified in the Pf 3D7 dataset and also assessed in the German traveler dataset. For the 3D7 dataset it is described in the result section and Figure S1.

      • Line 168–174: “However, we found high accuracies (> 0.95) across all approaches, meaning the sequences we assembled were correct (Figure 2 – Figure supplement 1b). The whole transcript approach also performed the best when assembling the lower expressed var genes (Figure 2 – Figure supplement 1e) and produced the fewest var chimeras compared to the original approach on P. falciparum 3D7. Fourteen misassemblies were observed with the whole transcript approach compared to 19 with the original approach (Table S2). This reduction in misassemblies was particularly apparent in the ring-stage samples.” - Figure S1:

      Author response image 1.

      Performance of novel computational pipelines for var assembly on Plasmodium falciparum 3D7: The three approaches (whole transcript: blue, domain approach: orange, original approach: green) were applied to a public RNA-seq dataset (ENA: PRJEB31535) of the intra-erythrocytic life cycle stages of 3 biological replicates of cultured P. falciparum 3D7, sampled at 8-hour intervals up until 40 hours post infection (hpi) and then at 4-hour intervals up until 48 hpi (Wichers et al., 2019). Boxplots show the data from the 3 biological replicates for each time point in the intra-erythrocytic life cycle: a) alignment scores for the dominantly expressed var gene (PF3D7_0712600), b) accuracy scores for the dominantly expressed var gene (PF3D7_0712600), c) number of contigs to assemble the dominant var gene (PF3D7_0712600), d) alignment scores for a middle ranking expressed var gene (PF3D7_0937800), e) alignment scores for the lowest expressed var gene (PF3D7_0200100). The first best blast hit (significance threshold = 1e-10) was chosen for each contig. The alignment score was used to evaluate each method. The alignment score represents √(accuracy × recovery). The accuracy is the proportion of bases that are correct in the assembled transcript and the recovery reflects what proportion of the true transcript was assembled. Assembly completeness of the dominant var gene (PF3D7_0712600, length = 6648nt) for the three approaches was assessed for each biological replicate: f) biological replicate 1, g) biological replicate 2, h) biological replicate 3. Dotted lines represent the start and end of the contigs required to assemble the var gene. Red bars represent assembled sequences relative to the dominant whole var gene sequence, where we know the true sequence (termed “reference transcript”).
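
      As a minimal sketch of the alignment score described in this caption, the snippet below reads the expression as the square root of the product of accuracy and recovery (a geometric mean). That reading is an assumption, and the function name and the example numbers are hypothetical, not values from the study.

```python
import math


def alignment_score(accuracy, recovery):
    """Geometric mean of accuracy and recovery (both proportions in [0, 1]).

    Assumption: the caption's score is sqrt(accuracy * recovery).
    """
    for name, value in (("accuracy", accuracy), ("recovery", recovery)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    return math.sqrt(accuracy * recovery)


# Hypothetical contig: 950 of 1000 assembled bases are correct,
# and 3324 of the 6648 reference bases are covered by the assembly.
accuracy = 950 / 1000
recovery = 3324 / 6648
print(round(alignment_score(accuracy, recovery), 3))
```

      Under this reading, a transcript that is assembled accurately but only partially recovered is penalised, which is why the score is reported alongside the accuracy alone.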

      For the ex vivo samples, this has been discussed in the result section and now we also added this information to Table 1.

      • Line 182/3: “Remarkably, with the new whole transcript method, we observed a significant decrease (2 vs 336) in clearly misassembled transcripts with, for example, an N-terminal domain at an internal position.”

      • Table 1:

      Author response table 3.

      Statistics for the different approaches used to assemble the var transcripts. Var assembly approaches were applied to malaria patient ex vivo samples (n=32) from (Wichers et al., 2021) and statistics determined. Given are the total number of assembled var transcripts longer than 500 nt containing at least one significantly annotated var domain, the maximum length of the longest assembled var transcript in nucleotides and the N50 value, respectively. The N50 is defined as the sequence length of the shortest var contig, with all var contigs greater than or equal to this length together accounting for 50% of the total length of concatenated var transcript assemblies. Misassemblies represents the number of misassemblies for each approach. **Number of misassemblies were not determined for the domain approach due to its poor performance in other metrics.
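
      The N50 definition given in this table legend can be sketched in a few lines. The function name and the contig lengths below are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until they
    cover at least half of the total assembly length; the length of the
    contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths in nucleotides, for illustration only.
contig_lengths = [6000, 5000, 4000, 3000, 2000]
print(n50(contig_lengths))  # 6000 + 5000 = 11000 >= 10000, so this prints 5000
```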

      Line 432: "the core gene transcriptome underwent a greater change relative to the var transcriptome upon transition to culture." Can this be shown statistically? It's unclear whether the difference in the sizes of the respective pools of the core genome and the var genes may account for this observation.

      We found 19% of the core transcriptome to be differentially expressed. The per patient var transcript analysis revealed individually highly variable but generally rather subtle changes in the var transcriptome. The different methods for assessing this make it difficult to statistically compare these two different results.

      The feasibility of this approach for field samples should be discussed in the Discussion.

      In the original manuscript we reflected on this already several times in the discussion (e.g., line 465/6; line 471–475; line 555–568). We now have added another two sentences at the end of the paragraph starting in line 449 to address this point. It reads now:

      • Line 442–451: “Our new approach used the most geographically diverse reference of var gene sequences to date, which improved the identification of reads derived from var transcripts. This is crucial when analysing patient samples with low parasitaemia where var transcripts are hard to assemble due to their low abundancy (Guillochon et al., 2022). Our approach has wide utility due to stable performance on both laboratory-adapted and clinical samples. Concordance in the different var expression profiling approaches (RNA-sequencing and DBLα-tag) on ex vivo samples increased using the new approach by 13%, when compared to the original approach (96% in the whole transcript approach compared to 83% in Wichers et al., 2021). This suggests the new approach provides a more accurate method for characterising var genes, especially in samples collected directly from patients. Ultimately, this will allow a deeper understanding of relationships between var gene expression and clinical manifestations of malaria.”

      MINOR

      The plural form of PfEMP1 (PfEMP1s) is inconsistently used throughout the text.

      Corrected.

      404-405: statistical test for significance?

      Thank you for this suggestion. We have done two comparisons between the original analysis from Wichers et al., 2021 and our new whole transcript approach to test concordance of the RNAseq approaches with the DBLα-tag approach using paired Wilcoxon tests. The first comparison shows that our new approach has significantly increased concordance with DBLα-tag data. The second suggests it might be better at capturing all expressed DBLα domains than the original analysis (and the DBLα-approach), although this difference was not statistically significant. We describe this now in the result section.

      • Line 352–361: “Overall, we found a high agreement between the detected DBLα-tag sequences and the de novo assembled var transcripts. A median of 96% (IQR: 93–100%) of all unique DBLα-tag sequences detected with >10 reads were found in the RNA-sequencing approach. This is a significant improvement on the original approach (p= 0.0077, paired Wilcoxon test), in which a median of 83% (IQR: 79–96%) was found (Wichers et al., 2021). To allow for a fair comparison of the >10 reads threshold used in the DBLα-tag approach, the upper 75th percentile of the RNA-sequencing-assembled DBLα domains were analysed. A median of 77.4% (IQR: 61–88%) of the upper 75th percentile of the assembled DBLα domains were found in the DBLα-tag approach. This is a lower median percentage than the median of 81.3% (IQR: 73–98%) found in the original analysis (p= 0.28, paired Wilcoxon test) and suggests the new assembly approach is better at capturing all expressed DBLα domains.”
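
      To illustrate what a paired Wilcoxon comparison of per-sample concordance involves, here is a minimal sketch of an exact two-sided signed-rank test computed by enumerating sign assignments. It is not the authors' analysis (which software they used is not stated here); the function names are hypothetical and the percentages are made up for illustration.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def paired_wilcoxon_exact(first, second):
    """Return (W+, exact two-sided p) for paired samples.

    Zero differences are dropped. The p-value enumerates all 2**n sign
    assignments, so this is only practical for small n.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Made-up per-sample concordance percentages (new vs original approach).
new_approach = [96, 100, 93, 98, 95]
original_approach = [83, 90, 79, 96, 85]
w_plus, p_value = paired_wilcoxon_exact(new_approach, original_approach)
print(w_plus, p_value)
```

      With only five pairs, even a uniformly positive shift cannot reach p < 0.05 in a two-sided exact test, which is one reason the number of paired samples matters when interpreting such comparisons.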

      Figure 4: The letters for the figure panels need to be added.

      The figure has been removed from the manuscript.

      Reviewer #3:

      It is difficult from Table S2 to determine how many unique var transcripts would have enough coverage to be potentially assembled from each sample. It seems unlikely that 455 distinct vars (~14 per sample) would be expressed at a detectable level for assembly. Why not DNA-sequence these samples to get the full repertoire for comparison to RNA? Why would so many distinct transcripts be yielded from fairly synchronous samples?

      We know from controlled human malaria infections of malaria-naive volunteers that most var genes present in the genomic repertoire of the parasite strain are expressed at the onset of the human blood phase (heterogenous var gene expression) (Wang et al., 2009; Bachmann et al., 2016; Wichers-Misterek et al., 2023). This pattern shifts to a more restricted, homogeneous var expression pattern in semi-immune individuals (expression of few variants) depending on the degree of immunity (Bachmann et al., 2019).

      Author response image 2.

      In this cohort, 15 first-time infections are included, which should also possess a more heterogenous var gene expression in comparison to the pre-exposed individuals, and indeed such a trend is already seen in the number of different DBLα-tag clusters found in both patient groups (see figure panel from Wichers et al. 2021: blue, first-time infections; grey, pre-exposed). Moreover, Warimwe et al. 2013 have shown that asymptomatic infections have a more homogeneous var expression in comparison to symptomatic infections. Therefore, we expect that parasites from symptomatic infections have a heterogenous var expression pattern with multiple var gene variants expressed, which we could assemble due to our high read depth and our improved var assembly pipeline for even low expressed variants.

      Moreover, the distinct transcripts found in the RNA-seq approach were confirmed with the DBLα-tag data. In our opinion, previous approaches may have underestimated the complexity of the var transcriptome in less immune individuals.

      Mapping reads to these 455 putative transcripts and using this count matrix for differential expression analysis seems very unlikely to produce reliable results. As acknowledged on line 327, many reads will be mis-mapped, and perhaps most challenging is that most vars will not be represented in most samples. In other words, even if mapping were somehow perfect, one would expect a sparse matrix that would not be suitable for statistical comparisons between groups. This is likely why the per-patient transcript analysis doesn't appear to be consistent. I would recommend the authors remove the DE sections utilizing this approach, or add convincing evidence that the count matrix is useable.

      We agree that this is a general issue of var differential expression analysis. Therefore, we have removed the var differential expression analysis from this manuscript as the per patient approach was more appropriate for the paired samples. We validated different mapping strategies (new Figure S6) and included a paragraph discussing the problem in the result section:

      • Line 237–255: “In the original approach of Wichers et al., 2021, the non-core reads of each sample used for var assembly were mapped against a pooled reference of assembled var transcripts from all samples, as a preliminary step towards differential var transcript expression analysis. This approach returned a small number of var transcripts which were expressed across multiple patient samples (Figure 3 – Figure supplement 2a). As genome sequencing was not available, it was not possible to know whether there was truly overlap in var genomic repertoires of the different patient samples, but substantial overlap was not expected. Stricter mapping approaches (for example, excluding transcripts shorter than 1500nt) changed the resulting var expression profiles and produced more realistic scenarios where similar var expression profiles were generated across paired samples, whilst there was decreasing overlap across different patient samples (Figure 3 – Figure supplement 2b,c). Given this limitation, we used the paired samples to analyse var gene expression at an individual subject level, where we confirmed the MSP1 genotypes and alleles were still present after short-term in vitro cultivation. The per patient approach showed consistent expression of var transcripts within samples from each patient but no overlap of var expression profiles across different patients (Figure 3 – Figure supplement 2d). Taken together, the per patient approach was better suited for assessing var transcriptional changes in longitudinal samples. It has been hypothesised that more conserved var genes in field isolates increase parasite fitness during chronic infections, necessitating the need to correctly identify them (Dimonte et al., 2020, Otto et al., 2019). Accordingly, further work is needed to optimise the pooled sample approach to identify truly conserved var transcripts across different parasite isolates in cross-sectional studies.” - Figure S6:

      Author response image 3.

      Var expression profiles across different mapping. Different mapping approaches were used to quantify the var expression profiles of each sample (ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1)). The pooled sample approach in which all significantly assembled var transcripts (1500nt and containing 3 significantly annotated var domains) across samples were combined into a reference and redundancy was removed using cd-hit (at sequence identity = 99%) (a–c). The non-core reads of each sample were mapped to this pooled reference using a) Salmon, b) bowtie2 filtering for uniquely mapping paired reads with MAPQ and c) bowtie2 filtering for uniquely mapping paired reads with a MAPQ > 20. d) The per patient approach was applied. For each patient, the paired ex vivo and in vitro samples were analysed. The assembled var transcripts (at least 1500nt and containing 3 significantly annotated var domains) across all the generations for a patient were combined into a reference, redundancy was removed using cd-hit (at sequence identity = 99%), and expression was quantified using Salmon. Pie charts show the var expression profile with the relative size of each slice representing the relative percentage of total var gene expression of each var transcript. Different colours represent different assembled var transcripts with the same colour code used across a–d.

      For future cross-sectional studies a per patient analysis that attempts to group per patient assemblies on some unifying structure (e.g., domain, homology blocks, domain cassettes etc) should be performed.

      Line 304. I don't understand the rationale for comparing naïve vs. prior-exposed individuals at ex-vivo and gen 1 timepoints to provide insights into how reliable cultured parasites are as a surrogate for var expression in vivo. Further, the next section (per patient) appears to confirm the significant limitation of the 'all sample analysis' approach. The conclusion on line 319 is not supported by the results reported in figures S9a and S9b, nor is the bold conclusion in the abstract about "casting doubt" on experiments utilizing culture adapted

      We have removed this comparison from the manuscript due to the inconsistencies with the var per patient approach. However, the conclusion in the abstract has been rephrased to reflect the fact that we observed 19% of the core transcriptome differentially expressed within one cycle of cultivation.

      Line 372/391 (and for the other LMM descriptions). I believe you mean to say response variable, rather than explanatory variable. Explanatory variables are on the right hand side of the equation.

      Thank you for spotting this inaccuracy, we changed it to “response variable” (line 324, line 343, line 805).

      Line 467. Similar to line 304, why would comparisons of naïve vs. prior-exposed be informative about surrogates for in vivo studies? Without a gold-standard for what should be differentially expressed between naïve and prior-exposed in vivo, it doesn't seem prudent to interpret a drop in the number of DE genes for this comparison in generation 1 as evidence that biological signal for this comparison is lost. What if the generation 1 result is actually more reflective of the true difference in vivo, but the ex vivo samples are just noisy? How do we know? Why not just compare ex vivo vs generation 1/2 directly (as done in the first DE analysis), and then you can comment on the large number of changes as samples are less and less proximal to in vivo?

      In the original paper (Wichers et al., 2021), there were differences between the core transcriptome of naïve vs previously exposed patients. However, these differences appeared to diminish in vitro, suggesting the in vivo core transcriptome is not fully maintained in vitro.

      We have added a sentence explaining the reasoning behind this analysis in the results section:

      • Lines 414–423: “In the original analysis of ex vivo samples, hundreds of core genes were identified as significantly differentially expressed between pre-exposed and naïve malaria patients. We investigated whether these differences persisted after in vitro cultivation. We performed differential expression analysis comparing parasite isolates from naïve (n=6) vs pre-exposed (n=7) patients, first between their ex vivo samples, and then between the corresponding generation 1 samples. Interestingly, when using the ex vivo samples, we observed 206 core genes significantly upregulated in naïve patients compared to pre-exposed patients (Figure 7 – Figure supplement 3a). Conversely, we observed no differentially expressed genes in the naïve vs pre-exposed analysis of the paired generation 1 samples (Figure 7 – Figure supplement 3b). Taken together with the preceding findings, this suggests one cycle of cultivation shifts the core transcriptomes of parasites to be more alike each other, diminishing inferences about parasite biology in vivo.”

      Overall, I found the many DE approaches very frustrating to interpret coherently. If not dropped in revision, the reader would benefit from a substantial effort to clarify the rationale for each approach, and how each result fits together with the other approaches and builds to a concise conclusion.

      We agree that the manuscript contains many different complex layers of analysis and that it is therefore important to explain the rationale for each approach. Therefore, we now included the summary Table 3 (see comment to public review). Additionally, we have removed the var transcript differential expression due to its limitations, which we hope has already streamlined our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewers for their in-depth consideration of our manuscript and their helpful reviews. Their efforts have made the paper much better. We have responded to each point. The previously provided public responses have been updated; they are included after the private response for convenience.

      Reviewer #1 (Recommendations For The Authors):

      1. In general, the manuscript will benefit from copy editing and proof reading. Some obvious edits;

      2. Page 6 line 140. Do the authors mean Cholera toxin B?

      Response: We corrected this error and went through the entire paper carefully correcting for grammar and increased clarity.

      • Page 8 line 173. Methylbetacyclodextrin is misspelled.

      Response: Yes, corrected.

      • Figure 4c is missing representative traces for electrophysiology data.

      • Figure 4. Please check labeling ordering in figure legend as it does not match the panels in the figure.

      Thank you for the correction and we apologize for the confusion in figure 4. We uploaded an incomplete figure legend, and the old panel ‘e’ was not from an experiment that was still in the figure. It was removed and the figure legends are now corrected.

      • Please mention the statistical analysis used in all figure legends.

      Response: Thank you for pointing out this omission, statistics have been added.

      • Although the schematics in each figure helps guide readers, they are very inconsistent and sometimes confusing. For example, in Figure 5 the gating model is far-reaching without conclusive evidence, whereas in Figure 6 it is over simplified and unclear what the image is truly representing (granted that the downstream signaling mechanism and channel is not known).

      Response: Figure 5d is the summary figure for the entire paper. We have made this clearer in the figure legend and we deleted the title above the figure that gave the appearance that the panel relates to swell only. It is the proposed model based on what we show in the paper and what is known about the activation mechanism of TREK-1.

      Figure 6 is supposed to be simple. It is to help the reader understand that when PA is low, mechanical sensitivity is high. Without the graphic, previous reviewers got confused about threshold going down and mechanosensitivity going up and how the levels of PA relate. Low PA = high sensitivity. We’ve added a downstream effector to the right side of the panel to avoid any bias toward a putative downstream channel effector. The purpose of the experiment is to show PLD has a mechanosensitive phenotype in vivo.

      Reviewer #2 (Recommendations For The Authors):

      This manuscript outlines some really interesting findings demonstrating a mechanism by which mechanically driven alterations in molecular distributions can influence a) the activity of the PLD2 molecule and subsequently b) the activation of TREK-1 when mechanical inputs are applied to a cell or cell membrane.

      The results presented here suggest that this redistribution of molecules represents a modulatory mechanism that alters either the amplitude or the sensitivity of TREK-1 mediated currents evoked by membrane stretch. While the authors do present values for the pressure required to activate 50% of channels (P50), the data presented provides incomplete evidence to conclude a shift in threshold of the currents, given that many of the current traces provided in the supplemental material do not saturate within the stimulus range, thus limiting the application of a Boltzmann fit to determine the P50. I suggest adding additional context to enable readers to better assess the limitations of this use of the Boltzmann fit to generate a P50, or alternately repeating the experiments to apply stimuli up to lytic pressures to saturate the mechanically evoked currents, enabling use of the Boltzmann function to fit the data.

      Response: We thank the reviewer for pointing this out. We agree the currents did not reach saturation. Hence the term P50 could be misleading, so we have removed it from the paper. We now say “half maximal” current measured from non-saturating pressures of 0-60 mmHg. We also deleted the xPLD data in supplemental figure 3C since there is insufficient current to realistically estimate a half maximal response.
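
      The saturation concern can be illustrated with a minimal sketch, which is not part of the original analysis. It assumes a standard Boltzmann sigmoid with hypothetical parameters (a true P50 of 70 mmHg and a slope of 10 mmHg, chosen only so that the curve does not saturate within 0-60 mmHg). The function names are illustrative. The sketch shows that the pressure giving half of the largest observed current falls below the true P50 when the stimulus range stops short of saturation.

```python
import math


def boltzmann(pressure, i_max, p50, slope):
    """Boltzmann sigmoid: current as a function of pressure (mmHg)."""
    return i_max / (1.0 + math.exp((p50 - pressure) / slope))


def apparent_half_max(pressures, currents):
    """Pressure at which the current reaches half of the largest observed
    current, found by linear interpolation between sampled points."""
    target = 0.5 * max(currents)
    for k in range(len(pressures) - 1):
        i0, i1 = currents[k], currents[k + 1]
        if i0 <= target <= i1 and i1 > i0:
            p0, p1 = pressures[k], pressures[k + 1]
            return p0 + (target - i0) * (p1 - p0) / (i1 - i0)
    raise ValueError("half-maximal current not bracketed by the samples")


# Hypothetical parameters, chosen only for illustration (not fitted values).
TRUE_P50 = 70.0  # mmHg, deliberately above the 0-60 mmHg stimulus range
SLOPE = 10.0     # mmHg
pressures = [float(p) for p in range(0, 61, 5)]
currents = [boltzmann(p, 1.0, TRUE_P50, SLOPE) for p in pressures]

estimate = apparent_half_max(pressures, currents)
print(f"true P50 = {TRUE_P50:.1f} mmHg, apparent half-maximal pressure = {estimate:.1f} mmHg")
```

      Under these assumptions the apparent half-maximal pressure lies below the true P50, which is why a "half maximal" value from non-saturating pressures should not be read as a P50.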

      In my opinion, the conclusions presented in this manuscript would be strengthened by an assessment of the amount of TREK-1 in the plasma membrane pre and post application of shear. While the authors do present imaging data in the supplementary materials, these data are insufficiently precise to comment on expression levels in the membrane. To strengthen this conclusion the authors could conduct cell surface biotinylation assays, as a more sensitive and quantitative measure of membrane localisation of the proteins of interest.

      1. Response: As mentioned previously, we do not have an antibody to the extracellular domain. Nonetheless, to better address this concern we directly compared the levels of TREK-1, PIP2, and GM1 in xPLD2, mPLD2, and enPLD2 with and without shear. The results are in supplemental figure 2. PLD2 is known to increase endocytosis (ref 1) and xPLD2 is known to block both agonist-induced and constitutive endocytosis of the µ-opioid receptor (ref 2). The receptor is trapped on the surface. This is true of many proteins including Rho (ref 3), ARF (ref 4), and ACE2 (ref 1), among others. In agreement with this mechanism, in Figure S2C,G we show that TREK increases with xPLD and the localization can clearly be seen at the plasma membrane, just like in all of the other publications with xPLD overexpression. xPLD2 would be expected to inhibit the basal current, but we presume the increased expression has likely compensated and there is sufficient PA and PG from other sources to allow for the basal current. It is in this state that we then conduct our ephys and monitor with millisecond time resolution and see no activation. We are deriving our conclusion from a very clear response: Figure 1b shows almost no current, even at 1-10 ms after applying pressure. There is little pressure current when we know the channel is present and capable of conducting ions (Figure 1d red bar). After shear there is a strong decrease in TREK-1 currents on the membrane in the presence of xPLD2, but it is not less than TREK-1 expression with mPLD2. And since mouse PLD2 has the highest basal current and pressure activation current, the amount of TREK-1 present is sufficient to conduct large current. To have almost no detectable current would require at least a 10-fold reduction compared to mPLD2 levels before we would lack the sensitivity to see a channel open. Lastly, endocytosis is typically on the order of seconds to minutes, not milliseconds.

      2. We have shown in two additional independent ways that TREK-1 is on the membrane during our stretch experiments. Figure 1d shows the current immediately prior to applying pressure for wt TREK-1. When catalytically dead PLD is present (xPLD2) there is almost normal basal current. The channel is clearly present. And then in figure 1a we show within a millisecond there is no pressure current. As a control we added a functionally dead TREK-1 truncation (xTREK). Compared to xPLD2 there is clearly normal basal current. If this is not strong evidence the channel was available on the surface for mechanical activation, please help us understand why. And if you think within 2.1 ms 100% of the channel is gone by endocytosis, please provide some evidence that this is possible so we can reconsider.

      3. We have TIRF super-resolution imaging with ~20 nm x-y resolution and ~100 nm z resolution, and Figure 2b clearly shows the channel on the membrane. When we apply pressure in 1b, the channel is present.

      4. Lastly, in our previous studies we showed activation of PLD2 by anesthetics was responsible for all of TREK-1’s anesthetic sensitivity and this was through PLD2 binding to the C-terminus of TREK-1 (ref 5). We showed this was the case by transferring anesthetic sensitivity to an anesthetic-insensitive homolog, TRAAK. This established conclusively the basic premise of our mechanism. Here we show the same C-terminal region and PLD2 are responsible for the mechanical current observed by TREK-1. TRAAK is already mechanosensitive, so the same chimera will not work for our purposes here. But anesthetic activation and mechanical activation are dramatically different stimuli, and the fact that the role of PLD is robustly observed in both should be considered.

      The authors discuss that the endogenous levels of TREK-1 and PLD2 are "well correlated" in C2C12 cells, that TREK-1 displayed little pair correlation with GM1 and that a "small amount of TREK-1 trafficked to PIP2". As such, these data suggest that the data outlined for HEK293T cells may be hampered by artefacts arising from overexpression. Can TREK-1 currents be activated by membrane stretch in C2C12 cells and are they negatively impacted by the presence of xPLD2? Answering this question would provide more insight into the proposed mechanism of action of PLD2 outlined by the authors in this manuscript. If no differences are noted, the model would be called into question. It could be that there are additional cell-specific factors that further regulate this process.

      Response: The low pair correlation of TREK-1 and GM1 in C2C12 cells was due to insufficient levels of cholesterol in the cell membrane to allow for robust domain formation. In Figure 4b we loaded C2C12 cells with cholesterol using the endogenous cholesterol transport protein apoE and serum (an endogenous source of cholesterol). As can be seen in Fig. 4b, the pair correlation dramatically increased (purple line). This was also true in neuronal cells (N2a) (Fig 4d, purple bar). And shear (3 dynes/cm2) caused the TREK-1 that was in the GM1 domains to leave (red bar) reversing the effect of high cholesterol. This demonstrates our proposed mechanism is working as we expect with endogenously expressed proteins.

      There are many channels in C2C12 cells, so it would be difficult to isolate TREK-1 currents, which is why we replicated the entire system (ephys and dSTORM) in HEK cells. Note, in figure 4c we also show that adding cholesterol inhibits TREK-1 whole cell currents in HEK293 cells.

      As mentioned in the public review, the behavioural experiments in D. melanogaster cannot solely be attributed to a change in threshold. While there may be a change in the threshold to drive a different behaviour, the writing is insufficiently precise to make clear that conclusions cannot be drawn from these experiments regarding the functional underpinnings of this outcome. Are there changes in resting membrane potential in the mutant flies? Alterations in Nav activity? Without controlling for these alternate explanations it is difficult to see what this last piece of data adds to the manuscript, particularly given the lack of TREK-1 in this organism. At the very least, some editing of the text is needed to more clearly indicate that these data can only be used to draw conclusions on the change in threshold for driving the behaviour, not the change in threshold of the actual mechanotransduction event (i.e. conversion of the mechanical stimulus into an electrochemical signal).

      Response: We agree; features other than PLD's direct mechanosensitivity are likely contributing. This was shown in figure 6g left side. We have an arrow going to ion channel and to other downstream effectors. We’ve added the putative alteration to downstream effectors to the right side of the panel. This should make it clear that we no more speculate the involvement of a channel than any of the other many potential downstream effectors. As mentioned above, the figure helps the reader coordinate low PA with increased mechanosensitivity. Without the graphic, reviewers got confused that PA increased the threshold, which corresponds to a decreased sensitivity to pain. Nonetheless, we removed our conclusion about fly thresholds from the abstract and made clearer in the main text the lack of mechanism downstream of PLD in flies, including endocytosis. Supplemental Figure S2H also helps emphasize this.

      Nav channels are interesting, and since PLD contributes to endocytosis and Nav channels are also regulated by endocytosis, there is likely a PLD-specific effect involving Nav channels. There are many ways PA likely regulates mechanosensitive thresholds, but we feel Nav is beyond the scope of our paper. Someone else will need to do those studies. We have amended a paragraph in the conclusion, which clearly states that we do not know the specific mechanism at work here, with suggestions for future research to discover the role of lipids and lipid-modifying enzymes in mechanosensitive neurons.

      There may be fundamental flaws in how the statistics have been conducted. The methods section indicates that all statistical testing was performed with a Student's t-test. A visual scan of many of the data sets in the figures suggests that they are not normally distributed, thus a parametric test such as a Student's t-test is not valid. The authors should assess if each data set is normally distributed, and if not, a non-parametric statistical test should be applied. I recommend assessing the robustness of the statistical analyses and adjusting as necessary.

      Response: We thank the reviewer for pointing this out; indeed, there is some asymmetry in Figure 6c-d. The p values with the Mann-Whitney test were slightly improved: p=0.016 and p=0.0022 for 6c and 6d, respectively. For reference, the Student's t-test had slightly worse statistics: p=0.040 and p=0.0023. The scores remained the same, 1 and 2 stars, respectively.
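
      For readers who want to see what the non-parametric comparison involves, the following is a minimal sketch of a two-sided Mann-Whitney U test. It is a simplified illustration, not the procedure used for Figure 6. It assumes the normal approximation with no continuity or tie correction, and the function names and sample values are hypothetical. An established statistics package would normally be used instead.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    Simplifications (assumptions of this sketch): no continuity
    correction and no tie correction of the variance.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                   # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p


# Hypothetical threshold values, for illustration only.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
mutant = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_u(control, mutant)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

      Because the test works on ranks rather than raw values, it does not assume the data are normally distributed, which is the point raised by the reviewer.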

      The references provided for the statement regarding cascade activation of the TRPs are incredibly out of date. While it is clear that TRPV4 can be activated by a second messenger cascade downstream of osmotic swelling of cells, TRPV4 has also been shown to be activated by mechanical inputs at the cell-substrate interface, even when the second messenger cascade is inhibited. Recommend updating the references to reflect more current understanding of channel activation.

      Response: We thank the reviewer for pointing this out. We have updated the references and changed the comment to “can be” instead of “are”. The reference is more general to multiple ion channel types including KCNQ4. This should avoid any perceived conflict with the cell-substrate interface mechanism, which we very much agree is a correct mechanism for TRP channels.

      Minor comments re text editing etc:

      The central messages of the manuscript would benefit from extensive work to increase the precision of the writing of the manuscript and the presentation of data in the figures, such textual changes alone would help address a number of the concerns outlined in this review, by clarifying some ambiguities. There are numerous errors throughout, ranging from grammatical issues, ambiguities with definitions, lack of scale bars in images, lack of labels on graph axes, lack of clarity due to the mode of presentation of sample numbers (it would be far more precise to indicate specific numbers for each sample rather than a range, which is ambiguous and confusing), unnecessary and repeat information in the methods section. Below are some examples but this list is not exhaustive.

      Response: Thank you; Reviewer #1 also had many of these concerns. We have gone through the entire paper and improved the precision of the writing of the manuscript. We have also added the missing error bar to Figure 6, and axis labels have been added to the inset images. The redundancy in cell culture methods has been removed. Where a range is small and there are lots of values, the exact number of ‘n’ is graphically displayed in the dot plot for each condition.

      Text:

      I recommend considering how to discuss the various aspects of channel activation. A convention in the field is to use mechanical activation or mechanical gating to describe that process where the mechanical stimulus is directly coupled to the channel gating mechanism. This would be the case for the activation of TREK-1 by membrane stretch alone. The increase in activation by PLD2 activity then reflects a modulation of the mechanical activation of the channel, because the relevant gating stimulus is PA, rather than force/stretch. The sum of these events could be described as shear-evoked or mechanically-evoked, TREK-1 mediated currents (thus making it clear that the mechanical stimulus initiates the relevant cascade, but the gating stimulus may be other than direct mechanical input). Given the interesting and compelling data offered in this manuscript regarding the sensitisation of TREK-1 dependent mechanically-evoked currents by PLD2, an increase in the precision of the language would help convey the central message of this work.

      Response: We agree there needs to be a convention. We have taken the suggestion of mechanically evoked and we suggest the following definitions:

      1. Mechanical activation of PLD2: direct force on the lipids releasing PLD2 from nonactivating lipids.

      2. Mechanical activation/gating of TREK1: direct force from lipids from either tension or hydrophobic mismatch that opens the channel.

      3. Mechanically evoked: a mechanical event that leads to a downstream effect. The effect is mechanically “evoked”.

      4. Spatial patterning/biochemistry: nanoscopic changes in the association of a protein with a nanoscopic lipid cluster or compartment.

      An example of where discussion of mechanical activation is ambiguous in the text is found at line 109: "channel could be mechanically activated by a movement from GM1 to PIP2 lipids." In this case, the sentence could be suggesting that the movement between lipids provides the mechanical input that activates the channel, which is not what the data suggest.

      Response: Where possible we have replaced “movement” with “spatial patterning” and with “association” and “dissociation” from a specific lipid compartment. This better reflects the data we have in this paper. However, we do think that a movement mechanically activates the channel: GM1 lipids are thick and PIP2 lipids are thin, so movement between the lipids could activate the channel through direct lipid interaction. We will address this aspect in a future paper.

      Inconsistencies with usage:

      • TREK1 versus TREK-1

      Response: corrected to TREK-1

      • mPLD2 versus PLD2

      Response: where PLD2 represents mouse this has been corrected.

      • K758R versus xPLD2

      Response: we replaced K758R in the methods with xPLD2.

      • HEK293T versus HEK293t

      Response: we have changed all instances to read HEK293T.

      • Drosophila melanogaster and D. melanogaster used inconsistently and in many places incorrectly

      Response: we have changed all instances to read the common name Drosophila.

      Line 173: misspelled methylbetacyclodextrin

      Response: corrected

      Line 174: degree symbol missing

      Response: corrected

      Line 287: "the decrease in cholesterol likely evolved to further decrease the palmate order in the palmitate binding site"... no evidence, no support for this statement, falsely attributes intention to evolutionary processes.

      Response: we have removed the reference to evolution at the request of the reviewer, it is not necessary. But we do wish to note that to our knowledge, all biological function is scientifically attributed to evolution. The fact that cholesterol decreases in response to shear is evidence alone that the cell evolved to do it.

      Line 307: grammatical error

      Response: the redundant Lipid removed.

      Line 319: overinterpreted - how is the mechanosensitivity of GPCRs explained by this translocation?

      Response: all G-alpha subunits of the GPCR complex are palmitoylated. We showed PLD (which has the same lipidation) is mechanically activated. If the palmitate site is disrupted for PLD2, then it is likely disrupted for every G-alpha subunit as well.

      Line 582: what is the wild type referred to here?

      Response: human full length with a GFP tag.

      Methods:

      • Sincere apologies if I missed something but I do not recall seeing any experiments using purified TREK-1 or flux assays. These details should be removed from the methods section

      Response: Removed.

      • There is significant duplication of detail across the methods (three separate instances of electrophysiology details) these could definitely be consolidated.

      Response: Duplicates removed.

      Figures:

      • Figure 2- b box doesn't correspond to inset. Bottom panel should provide overview image for the cell that was assessed with shear. In bottom panel, circle outlines an empty space.

      Response: We have widened the box slightly so that the non-shear box corresponds to the middle panel. We have also added the picture for the whole cell to Fig S2g and outlined the zoom shown in the bottom panel of Fig 2b as requested. The figure is of the top of a cell. We also added the whole cell image of a second sheared cell.

      Author response image 1.

      • Figure 3 b+c: inset graph lacking axis labels

      Response: The inset y axis is the same as the main axis. We added “pair corr. (5 nm)” and a description in the figure legend to make this clearer. The purpose of the inset is to show statistical significance at a single point. The contrast has been maximized, but without zooming in, points can be difficult to see.

      • Figure 5: replicate numbers missing and individual data points lacking in panels b + c, no labels of curve in b + c, insets, unclear what (5 nm) refers to in insets.

      Response: Thank you for pointing out these errors. The N values have been added. Similar to figure 3, the inset is a bar graph of the pair correlation data at 5 nm. A better explanation of the data has been added to the figure legend.

      • Figure 6: no scale bar, no clear membrane localization evident from images presented, panel g offers virtually nothing in terms of insight

      Response: We have added scale bars to figure 6b. Figure 6g is intentionally simplistic, we found that correlating decreased threshold with increased pain was confusing. A previous reviewer claimed our data was inconsistent. The graphic avoids this confusion. We also added negative effects of low PA on downstream effects to the right panel. This helps graphically show we don’t know the downstream effects.

      Reviewer #3 (Recommendations For The Authors):

      Minor suggestions:

      1. line 162, change 'heat' to 'temperature'.

      Response: changed.

      2. In figure 1, it would be helpful to keep the unit for current density consistent among different panels. 1e is a bit confusing: isn't the point of Figure 1 that most of TREK1 activation is not caused by direct force-sensing?

      Response: Yes, the point of figure 1 is to show that in a biological membrane over expressed TREK-1 is a downstream effector of PLD2 mechanosensation which is indirect. We agree the figure legend in the previous version of the paper is very confusing.

      There is almost no PLD2 independent current in our over expressed system, which is represented by no ions in the conduction pathway of the channel despite there being tension on the membrane.

      Purified TREK-1 is only mechanosensitive in a few select lipids, primarily crude Soy PC. It was always assumed that HEK293 and Cos cells had the correct lipids since over expressed TREK-1 responded to mechanical force in these lipids. But that does not appear to be correct, or at least only a small amount of TREK-1 is in the mechanosensitive lipids. Figure 1e graphically shows this. The arrows indicate tension, but the channel isn’t open with xPLD2 present. We added a few sentences to the discussion to further clarify.

      Panel c has different units because the area of the tip was measured, whereas in d the resistance of the tip was measured. They are different ways of normalizing for small differences in tip size.

      3. line 178, ~45 of what?

      Response: Cells were fixed for ~30 sec.

      4. line 219 should be Figure 4f?

      Response: thank you, yes Figure 4f.

      Previous public reviews with minor updates.

      Reviewer #1 (Public Review):

      Force sensing and gating mechanisms of the mechanically activated ion channels is an area of broad interest in the field of mechanotransduction. These channels perform important biological functions by converting mechanical force into electrical signals. To understand their underlying physiological processes, it is important to determine gating mechanisms, especially those mediated by lipids. The authors in this manuscript describe a mechanism for mechanically induced activation of TREK-1 (TWIK-related K+ channel). They propose that force-induced disruption of ganglioside (GM1) and cholesterol causes relocation of TREK-1 associated with phospholipase D2 (PLD2) to phosphatidylinositol 4,5-bisphosphate (PIP2) clusters, where PLD2 catalytic activity produces phosphatidic acid that can activate the channel. To test their hypothesis, they use dSTORM to measure TREK-1 and PLD2 colocalization with either GM1 or PIP2. They find that shear stress decreases TREK-1/PLD2 colocalization with GM1 and relocates them to cluster with PIP2. These movements are affected by TREK-1 C-terminal or PLD2 mutations, suggesting that the interaction is important for channel re-location. The authors then draw a correlation to cholesterol, suggesting that TREK-1 movement is cholesterol dependent. It is important to note that this is not the only method of channel activation and that one not involving PLD2 also exists. Overall, the authors conclude that force is sensed by ordered lipids and PLD2 associates with TREK-1 to selectively gate the channel. Although the proposed mechanism is solid, some concerns remain.

      1) Most conclusions in the paper heavily depend on the dSTORM data. But the images provided lack resolution. This makes it difficult for the readers to assess the representative images.

      Response: The images provided are at 300 dpi. Perhaps the reviewer is referring to contrast in Figure 2? We are happy to increase the contrast or resolution.

      As a side note, we feel the main conclusion of the paper, mechanical activation of TREK-1 through PLD2, depended primarily on the electrophysiology in Figure 1b-c, not the dSTORM. But both complement each other.

      2) The experiments in Figure 6 are a bit puzzling. The entire premise of the paper is to establish gating mechanism of TREK-1 mediated by PLD2; however, the motivation behind using flies, which do not express TREK-1 is puzzling.

      Response: The fly experiment shows that PLD mechanosensitivity is more evolutionarily conserved than TREK-1 mechanosensitivity. We have added this observation to the paper.

      -Figure 6B, the image is too blown out and looks over saturated. Unclear whether the resolution in subcellular localization is obvious or not.

      Response: Figure 6B is a confocal image, it is not dSTORM. There is no dSTORM in Figure 6. We have added the error bars to make this more obvious. For reference, only a few cells would fit in the field of view with dSTORM.

      -Figure 6C-D, the differences in activity threshold is 1 or less than 1g. Is this physiologically relevant? How does this compare to other conditions in flies that can affect mechanosensitivity, for example?

      Response: Yes, 1g is physiologically relevant. It is almost the force needed to wake a fly from sleep (1.2-3.2g). See ref 33. Murphy Nature Pro. 2017.

      3) 70 mOsm is a high degree of osmotic stress. How confident are the authors that a) cell health is maintained under this condition and b) this does indeed induce membrane stretch? For example, does this stimulation activate TREK-1?

      Response: Yes, osmotic swell activates TREK1. This was shown in ref 19 (Patel et al 1998). We agree the 70 mOsm is a high degree of stress. This needs to be stated better in the paper.

      Reviewer #2 (Public Review):

      This manuscript by Petersen and colleagues investigates the mechanistic underpinnings of activation of the ion channel TREK-1 by mechanical inputs (fluid shear or membrane stretch) applied to cells. Using a combination of super-resolution microscopy, pair correlation analysis and electrophysiology, the authors show that the application of shear to a cell can lead to changes in the distribution of TREK-1 and the enzyme phospholipase D2 (PLD2), relative to lipid domains defined by either GM1 or PIP2. The activation of TREK-1 by mechanical stimuli was shown to be sensitized by the presence of PLD2, but not a catalytically dead xPLD2 mutant. In addition, the activity of PLD2 is increased when the molecule is more associated with PIP2, rather than GM1 defined lipid domains. The presented data do not exclude direct mechanical activation of TREK-1, rather suggest a modulation of TREK-1 activity, increasing sensitivity to mechanical inputs, through an inherent mechanosensitivity of PLD2 activity. The authors additionally claim that PLD2 can regulate transduction thresholds in vivo using Drosophila melanogaster behavioural assays. However, this section of the manuscript overstates the experimental findings, given that it is unclear how the disruption of PLD2 is leading to behavioural changes, given the lack of a TREK-1 homologue in this organism and the lack of supporting data on molecular function in the relevant cells.

      Response: We agree, the downstream effectors of PLD2 mechanosensitivity are not known in the fly. Other anionic lipids have been shown to mediate pain see ref 46 and 47. We do not wish to make any claim beyond PLD2 being an in vivo contributor to a fly’s response to mechanical force. We have removed the speculative conclusions about fly thresholds from the abstract.

      That said, we do believe we have established a molecular function at the cellular level. We showed PLD is robustly mechanically activated in a cultured fly cell line (BG2-c2; Figure 6a of the manuscript). And our previous publication established mechanosensation of PLD (Petersen et al. Nature Com 2016) through mechanical disruption of the lipids. At a minimum, the experiments show PLD's mechanosensitivity is evolutionarily better conserved across species than TREK-1.

      This work will be of interest to the growing community of scientists investigating the myriad mechanisms that can tune mechanical sensitivity of cells, providing valuable insight into the role of functional PLD2 in sensitizing TREK-1 activation in response to mechanical inputs, in some cellular systems.

      The authors convincingly demonstrate, post application of shear, an alteration in the distribution of TREK-1 and mPLD2 (in HEK293T cells) from being correlated with GM1 defined domains (no shear) to increased correlation with PIP2 defined membrane domains (post shear). These data were generated using super-resolution microscopy to visualise, at sub-diffraction resolution, the localisation of labelled protein, compared to labelled lipids. The use of super-resolution imaging enabled the authors to visualise changes in cluster association that would not have been achievable with diffraction-limited microscopy. However, the conclusion that this change in association reflects TREK-1 leaving one cluster and moving to another overinterprets these data, as the data were generated from static measurements of fixed cells, rather than dynamic measurements capturing molecular movements.

      When assessing molecular distribution of endogenous TREK-1 and PLD2, these molecules are described as "well correlated" in C2C12 cells; however, it is challenging to assess what "well correlated" means, precisely, in this context. This limitation is compounded by the conclusion that TREK-1 displayed little pair correlation with GM1 and the authors describe a "small amount of TREK-1 trafficked to PIP2". As such, these data may suggest that the findings outlined for HEK293T cells may be influenced by artefacts arising from overexpression.

      The changes in TREK-1 sensitivity to mechanical activation could also reflect changes in the amount of TREK-1 in the plasma membrane. The authors suggest that the presence of a leak current accounts for the presence of TREK-1 in the plasma membrane; however, they do not account for whether there are significant changes in the membrane localisation of the channel in the presence of mPLD2 versus xPLD2. The supplementary data provide some images of fluorescently labelled TREK-1 in cells, and the authors state that truncating the C-terminus has no effect on expression at the plasma membrane; however, these data provide inadequate support for this conclusion. In addition, the data reporting the P50 should be noted with caution, given the lack of saturation of the current in response to the stimulus range.

      Response: We thank the reviewer for his/her concern about expression levels. We did test TREK-1 expression. mPLD decreases TREK-1 expression ~two-fold (see Author response image 2 below). We did not include the mPLD data since TREK-1 was mechanically activated with mPLD. For expression to account for the loss of TREK-1 stretch current (Figure 1b), xPLD would need to block surface expression of TREK-1 prior to stretch. The opposite was true, xPLD2 increased TREK-1 expression (see Figure S2c). Furthermore, we tested the leak current of TREK-1 at 0 mV and 0 mmHg of stretch. Basal leak current was no different with xPLD2 compared to endogenous PLD (Figure 1d; red vs grey bars respectively) suggesting TREK-1 is in the membrane and active when xPLD2 is present. If anything, the magnitude of the effect with xPLD would be larger if the expression levels were equal.

      Author response image 2.

      TREK expression at the plasma membrane. TREK-1 fluorescence was measured by GFP at points along the plasma membrane. Overexpression of mouse PLD2 (mPLD) decreases the amount of full-length TREK-1 (FL TREK) on the surface more than 2-fold compared to endogenously expressed PLD (enPLD) or truncated TREK (TREKtrunc), which is missing the PLD binding site in the C-terminus. Overexpression of mPLD had no effect on TREKtrunc.

      Finally, by manipulating PLD2 in D. melanogaster, the authors show changes in behaviour when larvae are exposed to either mechanical or electrical inputs. The depletion of PLD2 is concluded to lead to a reduction in activation thresholds and to suggest an in vivo role for PA lipid signaling in setting thresholds for both mechanosensitivity and pain. However, while the data provided demonstrate convincing changes in behaviour and these changes could be explained by changes in transduction thresholds, these data only provide weak support for this specific conclusion. As the authors note, there is no TREK-1 in D. melanogaster, as such the reported findings could be accounted for by other explanations, not least including potential alterations in the activation threshold of Nav channels required for action potential generation. To conclude that the outcomes were in fact mediated by changes in mechanotransduction, the authors would need to demonstrate changes in receptor potential generation, rather than deriving conclusions from changes in behaviour that could arise from alterations in resting membrane potential, receptor potential generation or the activity of the voltage gated channels required for action potential generation.

      Response: We are willing to restrict the conclusion about the fly behavior as the reviewers see fit. We have shown PLD is mechanosensitive in a fly cell line, and when we knock out PLD from a fly, the animal exhibits a mechanosensation phenotype. We tried to make it clear in the figure and in the text that we have no evidence of a particular mechanism downstream of PLD mechanosensation.

      This work provides further evidence of the astounding flexibility of mechanical sensing in cells. By outlining how mechanical activation of TREK-1 can be sensitised by mechanical regulation of PLD2 activity, the authors highlight a mechanism by which TREK-1 sensitivity could be regulated under distinct physiological conditions.

      Reviewer #3 (Public Review):

      The manuscript "Mechanical activation of TWIK-related potassium channel by nanoscopic movement and second messenger signaling" presents a new mechanism for the activation of TREK-1 channel. The mechanism suggests that TREK1 is activated by phosphatidic acids that are produced via a mechanosensitive motion of PLD2 to PIP2-enriched domains. Overall, I found the topic interesting, but several typos and unclarities reduced the readability of the manuscript. Additionally, I have several major concerns on the interpretation of the results. Therefore, the proposed mechanism is not fully supported by the presented data. Lastly, the mechanism is based on several previous studies from the Hansen lab, however, the novelty of the current manuscript is not clearly stated. For example, in the 2nd result section, the authors stated, "fluid shear causes PLD2 to move from cholesterol dependent GM1 clusters to PIP2 clusters and this activated the enzyme". However, this is also presented as a new finding in section 3 "Mechanism of PLD2 activation by shear."

      For PLD2 dependent TREK-1 activation. Overall, I found the results compelling. However, two key results are missing.

      1. Do HEK cells have endogenous PLD2? If so, it's hard to claim that the authors can measure PLD2-independent TREK1 activation.

      Response: Yes, there is endogenous PLD (enPLD). We calculated the relative expression of xPLD2 vs enPLD. xPLD2 is >10x more abundant (Fig. S3d of Pavel et al PNAS 2020, ref 14 of the current manuscript). Hence, as with anesthetic sensitivity, we expect the xPLD to outcompete the endogenous PLD, which is what we see. We added the following sentence and reference: “The xPLD2 expression is >10x the endogenous PLD2 (enPLD2) and outcompetes the TREK-1 binding site for PLD2 (ref. 5).”

      2. Does the plasma membrane trafficking of TREK1 remain the same under different conditions (PLD2 overexpression, truncation)? From Figure S2, the truncated TREK1 seems to have very poor trafficking. The change of trafficking could significantly contribute to the interpretation of the data in Figure 1.

      Response: If the PLD2 binding site is removed (TREK-1trunc), yes, the trafficking to the plasma membrane is unaffected by the expression of xPLD and mPLD (Author response image 2 above). For full length TREK1 (FL-TREK-1), co-expression of mPLD decreases TREK expression (Author response image 2) and co-expression with xPLD increases TREK expression (Figure S2f). This is exactly the opposite of what one would expect if surface expression accounted for the change in pressure currents. Hence, we conclude surface expression does not account for loss of TREK-1 mechanosensitivity with xPLD2. A few sentences were added to the discussion. We also performed dSTORM on the truncated TREK using EGFP. Truncated TREK goes to PIP2 (see figure 2 of 6).

      Author response image 3.

      To better compare the levels of TREK-1 before and after shear, we added a supplemental figure S2f where the protein was compared simultaneously in all conditions. 15 min of shear significantly decreased TREK-1 except with mPLD2 where the levels before shear were already lowest of all the expression levels tested.

      For shear-induced movement of TREK1 between nanodomains. The section is convincing, however I'm not an expert on super-resolution imaging. Also, it would be helpful to clarify whether the shear stress was maintained during fixation. If not, what is the time gap between reduced shear and the fixed state. Lastly, it's unclear why shear flow changes the level of TREK1 and PIP2.

      Response: Shear was maintained during the fixing. xPLD2 blocks endocytosis; presumably endocytosis and/or release of other lipid-modifying enzymes affect the system. The change in TREK-1 levels appears to be directly through an interaction with PLD, as TREKtrunc is not affected by overexpression of xPLD or mPLD.

      For the mechanism of PLD2 activation by shear. I found this section not convincing. Therefore, the question of how does PLD2 sense mechanical force on the membrane is not fully addressed. Particularly, it's hard to imagine an acute 25% decrease cholesterol level by shear - where did the cholesterol go? Details on the measurements of free cholesterol level is unclear and additional/alternative experiments are needed to prove the reduction in cholesterol by shear.

      Response: We addressed the question “how does PLD2 sense mechanical force on the membrane” in a paper published in Nature Comm. in 2016. The title of that paper is “Kinetic disruption of lipid rafts is a mechanosensor for phospholipase D” (see ref 13, Petersen et al.). PLD is a soluble protein associated with the membrane through palmitoylation. There is no transmembrane domain, which narrows the possible mechanism of its mechanosensation to disruption.

      The Nature Comm. reviewer identified as “an expert in PLD signaling” wrote the following of our data and the proposed mechanism:

      “This is a provocative report that identifies several unique properties of phospholipase D2 (PLD2). It explains in a novel way some long established observations including that the enzyme is largely regulated by substrate presentation which fits nicely with the authors model of segregation of the two lipid raft domains (cholesterol ordered vs PIP2 containing). Although PLD has previously been reported to be involved in mechanosensory transduction processes (as cited by the authors) this is the first such report associating the enzyme with this type of signaling... It presents a novel model that is internally consistent with previous literature as well as the data shown in this manuscript. It suggests a new role for PLD2 as a force transduction tied to the physical structure of lipid rafts and uses parallel methods of disruption to test the predictions of their model.”

      Regarding cholesterol. We use a fluorescent cholesterol oxidase assay, which we described in the methods. This is an appropriate assay for determining cholesterol levels in a cell, and we use it routinely. We have published in multiple journals using this method; see references 28, 30, 31. Working out the metabolic fate of cholesterol after shear is indeed interesting but well beyond the scope of this paper. Furthermore, we indirectly confirmed our finding using dSTORM cluster analysis (Figure 3d-e). The cluster analysis shows a decrease in GM1 cluster size consistent with our previous experiments where we chemically depleted cholesterol and saw a similar decrease in cluster size (see ref 13). All the data are internally consistent, and the cholesterol assay is properly done. We see no reason to reject the data.

      Importantly, there is no direct evidence for "shear thinning" of the membrane and the authors should avoid claiming shear thinning in the abstract and summary of the manuscript.

      Response: We previously established a kinetic model for PLD2 activation see ref 13 (Petersen et al Nature Comm 2016). In that publication we discussed both entropy and heat as mechanisms of disruption. Here we controlled for heat which narrowed that model to entropy (i.e., shear thinning) (see Figure 3c). We provide an overall justification below. But this is a small refinement of our previous paper, and we prefer not to complicate the current paper. We believe the proper rheological term is shear thinning. The following justification, which is largely adapted from ref 13, could be added to the supplement if the reviewer wishes.

      Justification: To establish shear thinning in a biological membrane, we initially used a soluble enzyme that has no transmembrane domain, phospholipase D2 (PLD2). PLD2 is a soluble enzyme and associated with the membrane by palmitate, a saturated 16 carbon lipid attached to the enzyme. In the absence of a transmembrane domain, mechanisms of mechanosensation involving hydrophobic mismatch, tension, midplane bending, and curvature can largely be excluded. Rather, the mechanism appears to be a change in fluidity (i.e., kinetic in nature). GM1 domains are ordered, and the palmitate forms van der Waals bonds with the GM1 lipids. The bonds must be broken for PLD to no longer associate with GM1 lipids. We established this in our 2016 paper, ref 13. In that paper we called it a kinetic effect; however, we did not experimentally distinguish enthalpy (heat) vs. entropy (order). Heat is Newtonian and entropy (i.e., shear thinning) is non-Newtonian. In the current study we paid closer attention to the heat and ruled it out (see Figure 3c and methods). We could propose a mechanism based on kinetic disruption, but we know the disruption is not due to melting of the lipids (enthalpy), which leaves shear thinning (entropy) as the plausible mechanism.

      The authors should also be aware that hypotonic shock is a very dirty assay for stretching the cell membrane. Often, there is only a transient increase in membrane tension, accompanied by many biochemical changes in the cells (including acidification, changes of concentration etc). Therefore, I would not consider this as definitive proof that PLD2 can be activated by stretching membrane.

      Response: Comment noted. We trust the reviewer is correct. In 1998 osmotic shock was used to activate the channel. We only intended to show that the system is consistent with previous electrophysiologic experiments.

      References cited:

      1 Du G, Huang P, Liang BT, Frohman MA. Phospholipase D2 localizes to the plasma membrane and regulates angiotensin II receptor endocytosis. Mol Biol Cell 2004;15:1024–30. https://doi.org/10.1091/mbc.E03-09-0673.

      2 Koch T, Wu DF, Yang LQ, Brandenburg LO, Höllt V. Role of phospholipase D2 in the agonist-induced and constitutive endocytosis of G-protein coupled receptors. J Neurochem 2006;97:365–72. https://doi.org/10.1111/j.1471-4159.2006.03736.x.

      3 Wheeler DS, Underhill SM, Stolz DB, Murdoch GH, Thiels E, Romero G, et al. Amphetamine activates Rho GTPase signaling to mediate dopamine transporter internalization and acute behavioral effects of amphetamine. Proc Natl Acad Sci U S A 2015;112:E7138–47. https://doi.org/10.1073/pnas.1511670112.

      4 Rankovic M, Jacob L, Rankovic V, Brandenburg L-OO, Schröder H, Höllt V, et al. ADP-ribosylation factor 6 regulates mu-opioid receptor trafficking and signaling via activation of phospholipase D2. Cell Signal 2009;21:1784–93. https://doi.org/10.1016/j.cellsig.2009.07.014.

      5 Pavel MA, Petersen EN, Wang H, Lerner RA, Hansen SB. Studies on the mechanism of general anesthesia. Proc Natl Acad Sci U S A 2020;117:13757–66. https://doi.org/10.1073/pnas.2004259117.

      6 Call IM, Bois JL, Hansen SB. Super-resolution imaging of potassium channels with genetically encoded EGFP. BioRxiv 2023. https://doi.org/10.1101/2023.10.13.561998.

    2. Author Response:

      Reviewer #1 (Public Review):

      Force sensing and gating mechanisms of the mechanically activated ion channels is an area of broad interest in the field of mechanotransduction. These channels perform important biological functions by converting mechanical force into electrical signals. To understand their underlying physiological processes, it is important to determine gating mechanisms, especially those mediated by lipids. The authors in this manuscript describe a mechanism for mechanically induced activation of TREK-1 (TWIK-related K+ channel). They propose that force induced disruption of ganglioside (GM1) and cholesterol causes relocation of TREK-1 associated with phospholipase D2 (PLD2) to phosphatidylinositol 4,5-bisphosphate (PIP2) clusters, where PLD2 catalytic activity produces phosphatidic acid that can activate the channel. To test their hypothesis, they use dSTORM to measure TREK-1 and PLD2 colocalization with either GM1 or PIP2. They find that shear stress decreases TREK-1/PLD2 colocalization with GM1 and relocates them to cluster with PIP2. These movements are affected by TREK-1 C-terminal or PLD2 mutations, suggesting that the interaction is important for channel re-location. The authors then draw a correlation to cholesterol, suggesting that TREK-1 movement is cholesterol dependent. It is important to note that this is not the only method of channel activation and that one not involving PLD2 also exists. Overall, the authors conclude that force is sensed by ordered lipids and PLD2 associates with TREK-1 to selectively gate the channel. Although the proposed mechanism is solid, some concerns remain.

      1) Most conclusions in the paper heavily depend on the dSTORM data. But the images provided lack resolution. This makes it difficult for the readers to assess the representative images.

      The images provided are at 300 dpi. Perhaps the reviewer is referring to contrast in Figure 2? We are happy to increase the contrast or resolution.

      As a side note, we feel the main conclusion of the paper, mechanical activation of TREK-1 through PLD2, depended primarily on the electrophysiology in Figure 1b-c, not the dSTORM. But both complement each other.

      2) The experiments in Figure 6 are a bit puzzling. The entire premise of the paper is to establish gating mechanism of TREK-1 mediated by PLD2; however, the motivation behind using flies, which do not express TREK-1 is puzzling.

      The fly experiment shows that PLD mechanosensitivity is more evolutionarily conserved than TREK-1 mechanosensitivity. We should have made this clearer.

      -Figure 6B, the image is too blown out and looks over saturated. Unclear whether the resolution in subcellular localization is obvious or not.

      Figure 6B is a confocal image; it is not dSTORM. There is no dSTORM in Figure 6. This should have been made clear in the figure legend. For reference, only a few cells would fit in the field of view with dSTORM.

      -Figure 6C-D, the differences in activity threshold is 1 or less than 1g. Is this physiologically relevant? How does this compare to other conditions in flies that can affect mechanosensitivity, for example?

      Yes, 1g is physiologically relevant. It is almost the force needed to wake a fly from sleep (1.2-3.2g); see ref 33 (Murphy, Nature Pro. 2017).

      3) 70mOsm is a high degree of osmotic stress. How confident are the authors that a. cell health is maintained under this condition and b. this does indeed induce membrane stretch? For example, does this stimulation activate TREK-1?

      Yes, osmotic swelling activates TREK1. This was shown in ref 19 (Patel et al 1998). We agree that 70 mOsm is a high degree of stress. This needs to be stated better in the paper.

      Reviewer #2 (Public Review):

      This manuscript by Petersen and colleagues investigates the mechanistic underpinnings of activation of the ion channel TREK-1 by mechanical inputs (fluid shear or membrane stretch) applied to cells. Using a combination of super-resolution microscopy, pair correlation analysis and electrophysiology, the authors show that the application of shear to a cell can lead to changes in the distribution of TREK-1 and the enzyme PhospholipaseD2 (PLD2), relative to lipid domains defined by either GM1 or PIP2. The activation of TREK-1 by mechanical stimuli was shown to be sensitized by the presence of PLD2, but not a catalytically dead xPLD2 mutant. In addition, the activity of PLD2 is increased when the molecule is more associated with PIP2, rather than GM1 defined lipid domains. The presented data do not exclude direct mechanical activation of TREK-1, rather suggest a modulation of TREK-1 activity, increasing sensitivity to mechanical inputs, through an inherent mechanosensitivity of PLD2 activity. The authors additionally claim that PLD2 can regulate transduction thresholds in vivo using Drosophila melanogaster behavioural assays. However, this section of the manuscript overstates the experimental findings, given that it is unclear how the disruption of PLD2 is leading to behavioural changes, given the lack of a TREK-1 homologue in this organism and the lack of supporting data on molecular function in the relevant cells.

      We agree; the downstream effectors of PLD2 mechanosensitivity are not known in the fly. Other anionic lipids have been shown to mediate pain (see refs 46 and 47). We do not wish to make any claim beyond PLD2 being an in vivo contributor to a fly’s response to mechanical force.

      That said, we do believe we have established a molecular function at the cellular level. We showed PLD is robustly mechanically activated in a cultured fly cell line (BG2-c2; Figure 6a of the manuscript), and our previous publication established mechanosensation of PLD through mechanical disruption of the lipids (Petersen et al., Nature Comm. 2016). At a minimum, the experiments show PLD's mechanosensitivity is evolutionarily better conserved across species than that of TREK1.

      This work will be of interest to the growing community of scientists investigating the myriad mechanisms that can tune mechanical sensitivity of cells, providing valuable insight into the role of functional PLD2 in sensitizing TREK-1 activation in response to mechanical inputs, in some cellular systems.

      The authors convincingly demonstrate, post application of shear, an alteration in the distribution of TREK-1 and mPLD2 (in HEK293T cells) from being correlated with GM1 defined domains (no shear) to increased correlation with PIP2 defined membrane domains (post shear). These data were generated using super-resolution microscopy to visualise, at sub diffraction resolution, the localisation of labelled protein, compared to labelled lipids. The use of super-resolution imaging enabled the authors to visualise changes in cluster association that would not have been achievable with diffraction limited microscopy. However, the conclusion that this change in association reflects TREK-1 leaving one cluster and moving to another overinterprets these data, as the data were generated from static measurements of fixed cells, rather than dynamic measurements capturing molecular movements.

      When assessing molecular distribution of endogenous TREK-1 and PLD2, these molecules are described as "well correlated" in C2C12 cells; however, it is challenging to assess what "well correlated" means precisely in this context. This limitation is compounded by the conclusion that TREK-1 displayed little pair correlation with GM1 and the authors describe a "small amount of TREK-1 trafficked to PIP2". As such, these data may suggest that the findings outlined for HEK293T cells may be influenced by artefacts arising from overexpression.

      The changes in TREK-1 sensitivity to mechanical activation could also reflect changes in the amount of TREK-1 in the plasma membrane. The authors suggest that the presence of a leak current accounts for the presence of TREK-1 in the plasma membrane; however, they do not account for whether there are significant changes in the membrane localisation of the channel in the presence of mPLD2 versus xPLD2. The supplementary data provide some images of fluorescently labelled TREK-1 in cells, and the authors state that truncating the C-terminus has no effect on expression at the plasma membrane; however, these data provide inadequate support for this conclusion. In addition, the data reporting the P50 should be noted with caution, given the lack of saturation of the current in response to the stimulus range.

      We thank the reviewer for his/her concern about expression levels. We did test TREK-1 expression. mPLD decreases TREK-1 expression ~two-fold (see Author response image 1). We did not include the mPLD data since TREK-1 was mechanically activated with mPLD. For expression to account for the loss of TREK-1 stretch current (Figure 1b), xPLD would need to block surface expression of TREK-1. The opposite was true: xPLD2 increased TREK-1 expression (see Figure S2c). Furthermore, we tested the leak current of TREK-1 at 0 mV and 0 mmHg of stretch. Basal leak current was no different with xPLD2 compared to endogenous PLD (Figure 1d; red vs grey bars respectively), suggesting TREK-1 is in the membrane and active when xPLD2 is present. If anything, the magnitude of the effect with xPLD would be larger if the expression levels were equal.

      Author response image 1.

      TREK expression at the plasma membrane. TREK-1 fluorescence was measured by GFP at points along the plasma membrane. Overexpression of mouse PLD2 (mPLD) decreases the amount of full-length TREK-1 (FL TREK) on the surface more than 2-fold compared to endogenously expressed PLD (enPLD) or truncated TREK (TREKtrunc), which is missing the PLD binding site in the C-terminus. Overexpression of mPLD had no effect on TREKtrunc.

      Finally, by manipulating PLD2 in D. melanogaster, the authors show changes in behaviour when larvae are exposed to either mechanical or electrical inputs. The depletion of PLD2 is concluded to lead to a reduction in activation thresholds and to suggest an in vivo role for PA lipid signaling in setting thresholds for both mechanosensitivity and pain. However, while the data provided demonstrate convincing changes in behaviour and these changes could be explained by changes in transduction thresholds, these data only provide weak support for this specific conclusion. As the authors note, there is no TREK-1 in D. melanogaster, as such the reported findings could be accounted for by other explanations, not least including potential alterations in the activation threshold of Nav channels required for action potential generation. To conclude that the outcomes were in fact mediated by changes in mechanotransduction, the authors would need to demonstrate changes in receptor potential generation, rather than deriving conclusions from changes in behaviour that could arise from alterations in resting membrane potential, receptor potential generation or the activity of the voltage gated channels required for action potential generation.

      We are willing to restrict the conclusion about the fly behavior as the reviewers see fit. We have shown PLD is mechanosensitive in a fly cell line, and when we knock out PLD from a fly, the animal exhibits a mechanosensation phenotype.

      This work provides further evidence of the astounding flexibility of mechanical sensing in cells. By outlining how mechanical activation of TREK-1 can be sensitised by mechanical regulation of PLD2 activity, the authors highlight a mechanism by which TREK-1 sensitivity could be regulated under distinct physiological conditions.

      Reviewer #3 (Public Review):

      The manuscript "Mechanical activation of TWIK-related potassium channel by nanoscopic movement and second messenger signaling" presents a new mechanism for the activation of TREK-1 channel. The mechanism suggests that TREK1 is activated by phosphatidic acids that are produced via a mechanosensitive motion of PLD2 to PIP2-enriched domains. Overall, I found the topic interesting, but several typos and unclarities reduced the readability of the manuscript. Additionally, I have several major concerns on the interpretation of the results. Therefore, the proposed mechanism is not fully supported by the presented data. Lastly, the mechanism is based on several previous studies from the Hansen lab, however, the novelty of the current manuscript is not clearly stated. For example, in the 2nd result section, the authors stated, "fluid shear causes PLD2 to move from cholesterol dependent GM1 clusters to PIP2 clusters and this activated the enzyme". However, this is also presented as a new finding in section 3 "Mechanism of PLD2 activation by shear."

      For PLD2 dependent TREK-1 activation. Overall, I found the results compelling. However, two key results are missing. 1. Do HEK cells have endogenous PLD2? If so, it's hard to claim that the authors can measure PLD2-independent TREK1 activation.

      Yes, there is endogenous PLD (enPLD). We calculated the relative expression of xPLD2 vs enPLD. xPLD2 is >10x more abundant (Fig. S3d of Pavel et al PNAS 2020, ref 14 of the current manuscript). Hence, as with anesthetic sensitivity, we expect the xPLD to outcompete the endogenous PLD, which is what we see. This should have been described more carefully in this paper, with the studies that establish this conclusion pointed out.

      2. Does the plasma membrane trafficking of TREK1 remain the same under different conditions (PLD2 overexpression, truncation)? From Figure S2, the truncated TREK1 seems to have very poor trafficking. The change of trafficking could significantly contribute to the interpretation of the data in Figure 1.

      If the PLD2 binding site is removed (TREK-1trunc), yes, the trafficking to the plasma membrane is unaffected by the expression of xPLD and mPLD (Figure R1 above). For full length TREK1 (FL-TREK-1), co-expression of mPLD decreases TREK expression (Figure R1) and co-expression with xPLD increases TREK expression (Figure S2). This is exactly the opposite of what one would expect if surface expression accounted for the change in pressure currents. Hence, we conclude surface expression does not account for loss of TREK-1 mechanosensitivity with xPLD2.

      For shear-induced movement of TREK1 between nanodomains. The section is convincing, however I'm not an expert on super-resolution imaging. Also, it would be helpful to clarify whether the shear stress was maintained during fixation. If not, what is the time gap between reduced shear and the fixed state. Lastly, it's unclear why shear flow changes the level of TREK1 and PIP2.

      Shear was maintained during the fixing. We do not know why shear changes PIP2 and TREK-1 levels. Presumably endocytosis and/or release of other lipid-modifying enzymes affect the system. The change in TREK-1 levels appears to be directly through an interaction with PLD, as TREKtrunc is not affected by overexpression of xPLD or mPLD.

      For the mechanism of PLD2 activation by shear. I found this section not convincing. Therefore, the question of how does PLD2 sense mechanical force on the membrane is not fully addressed. Particularly, it's hard to imagine an acute 25% decrease cholesterol level by shear - where did the cholesterol go? Details on the measurements of free cholesterol level is unclear and additional/alternative experiments are needed to prove the reduction in cholesterol by shear.

      We addressed the question “how does PLD2 sense mechanical force on the membrane” in a paper published in Nature Comm. in 2016. The title of that paper is “Kinetic disruption of lipid rafts is a mechanosensor for phospholipase D” (see ref 13, Petersen et al.). PLD is a soluble protein associated with the membrane through palmitoylation. There is no transmembrane domain, which narrows the possible mechanism of its mechanosensation to disruption.

      The Nature Comm. reviewer identified as “an expert in PLD signaling” wrote the following of our data and the proposed mechanism:

      "This is a provocative report that identifies several unique properties of phospholipase D2 (PLD2). It explains in a novel way some long established observations including that the enzyme is largely regulated by substrate presentation which fits nicely with the authors model of segregation of the two lipid raft domains (cholesterol ordered vs PIP2 containing). Although PLD has previously been reported to be involved in mechanosensory transduction processes (as cited by the authors) this is the first such report associating the enzyme with this type of signaling... It presents a novel model that is internally consistent with previous literature as well as the data shown in this manuscript. It suggests a new role for PLD2 as a force transduction tied to the physical structure of lipid rafts and uses parallel methods of disruption to test the predictions of their model."

      Regarding cholesterol. We use a fluorescent cholesterol oxidase assay, which we described in the methods. This is an appropriate assay for determining cholesterol levels in a cell, and we use it routinely. We have published in multiple journals using this method; see references 28, 30, 31. Working out the metabolic fate of cholesterol after shear is indeed interesting but well beyond the scope of this paper. Furthermore, we indirectly confirmed our finding using dSTORM cluster analysis (Figure 3d-e). The cluster analysis shows a decrease in GM1 cluster size consistent with our previous experiments where we chemically depleted cholesterol and saw a similar decrease in cluster size (see ref 13). All the data are internally consistent, and the cholesterol assay is properly done. We see no reason to reject the data.

      Importantly, there is no direct evidence for "shear thinning" of the membrane and the authors should avoid claiming shear thinning in the abstract and summary of the manuscript.

      We previously established a kinetic model for PLD2 activation see ref 13 (Petersen et al Nature Comm 2016). In that publication we discussed both entropy and heat as mechanisms of disruption. Here we controlled for heat which narrowed that model to entropy (i.e., shear thinning) (see Figure 3c). We provide an overall justification below. But this is a small refinement of our previous paper, and we prefer not to complicate the current paper. We believe the proper rheological term is shear thinning. The following justification, which is largely adapted from ref 13, could be added to the supplement if the reviewer wishes.

      Justification: To establish shear thinning in a biological membrane, we initially used a soluble enzyme that has no transmembrane domain, phospholipase D2 (PLD2). PLD2 is a soluble enzyme and associated with the membrane by palmitate, a saturated 16 carbon lipid attached to the enzyme. In the absence of a transmembrane domain, mechanisms of mechanosensation involving hydrophobic mismatch, tension, midplane bending, and curvature can largely be excluded. Rather, the mechanism appears to be a change in fluidity (i.e., kinetic in nature). GM1 domains are ordered, and the palmitate forms van der Waals bonds with the GM1 lipids. The bonds must be broken for PLD to no longer associate with GM1 lipids. We established this in our 2016 paper, ref 13. In that paper we called it a kinetic effect; however, we did not experimentally distinguish enthalpy (heat) vs. entropy (order). Heat is Newtonian and entropy (i.e., shear thinning) is non-Newtonian. In the current study we paid closer attention to the heat and ruled it out (see Figure 3c and methods). We could propose a mechanism based on kinetic disruption, but we know the disruption is not due to melting of the lipids (enthalpy), which leaves shear thinning (entropy) as the plausible mechanism.

      The authors should also be aware that hypotonic shock is a very dirty assay for stretching the cell membrane. Often, there is only a transient increase in membrane tension, accompanied by many biochemical changes in the cells (including acidification, changes of concentration etc). Therefore, I would not consider this as definitive proof that PLD2 can be activated by stretching membrane.

      Comment noted. We trust the reviewer is correct. In 1998 osmotic shock was used to activate the channel. We only intended to show that the system is consistent with previous electrophysiologic experiments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present work establishes 14-3-3 proteins as binding partners of spastin and suggests that this binding is positively regulated by phosphorylation of spastin. The authors show evidence that 14-3-3-spastin binding prevents spastin ubiquitination and final proteasomal degradation, thus increasing the availability of spastin. The authors measured microtubule severing activity in cell lines and axon regeneration and outgrowth as a proxy for spastin activity. By using drugs and peptides that separately inhibit 14-3-3 binding or spastin activity, they show that both proteins are necessary for axon regeneration in cell culture and in vivo models in rats.

      The following is an account of the major strengths and weaknesses of the methods and results.

      Major strengths

      -The authors performed pulldown assays on spinal cord lysates using GST-spastin, then analyzed pulldowns via mass spectrometry and found 3 peptides common to various forms of 14-3-3 proteins. In co-expression experiments in cell lines, recombinant spastin co-precipitated with all 6 forms of 14-3-3 tested.

      -By protein truncation experiments they found that the Microtubule Binding Domain of spastin contained the binding capability to 14-3-3. This domain contained a putative phosphorylation site, and substitutions that cannot be phosphorylated cannot bind to 14-3-3.

      -spastin overexpression increased neurite growth and branching, and so did the phospho null spastin. On the other hand, the phospho mimetic prevents all kinds of neurite development.

      -Overexpression of GFP-spastin shows a turn-over of about 12 hours when protein synthesis is inhibited by cycloheximide. When 14-3-3 is co-overexpressed, GFP-spastin does not show a decrease by 12 hours. When S233A is expressed, a turn-over of 9 hours is observed, indicating that the ability to be phosphorylated increases the stability of the protein.

      -In support of that notion, the phospho-mimetic S233D makes it more stable, lasting as much as the over-expression of 14-3-3.

      -Authors show that spastin can be ubiquitinated, and that in the presence of ubiquitin, spastin-MT severing activity is inhibited.

      -By combining FCA with spastazoline, the authors claim that FCA-increased regeneration is due to increased spastin activity. In various models of neurite outgrowth and regeneration in cell culture and in vivo, the authors show impressive results on the positive effect of FCA in regeneration, and that this is abolished when spastin is inhibited.

      Major weaknesses

      -However convincing the pull-downs of the expressed proteins, the evidence would be stronger if a co-immunoprecipitation of the endogenous proteins were included.

      We thank the reviewer for their succinct summary of the main results and strengths of our study. We acknowledge the reviewer's valuable suggestions and agree that performing endogenous co-immunoprecipitation (co-IP) experiments in neurons is crucial for supporting our conclusions. To address this question, cortical neurons were cultured in vitro for endogenous IP experiments. The cortical neurons were cultured in neurobasal medium supplemented with 2% B27, with cytarabine added to inhibit the proliferation of glial cells. The proteins were then extracted and subjected to immunoprecipitation using antibodies against spastin. The results, as shown in Fig.1C in the revised manuscript, clearly demonstrate that 14-3-3 protein indeed interacts with spastin within neurons.

      -Regarding the impact of spastin phosphorylation on the interaction, there is no indication that the phosphomimetic (S233D) can better bind 14-3-3, and this contradicts the authors' conclusion that the spastin-14-3-3 interaction is necessary for (or increases) spastin function.

      Thank you for your valuable and constructive comments. We agree with your consideration. To reinforce the importance of phosphorylated spastin in this binding model, we conducted additional experiments by transfecting S233D into 293T cells and performing immunoprecipitation experiments (Fig.2H). The results clearly demonstrate that spastin (S233D) exhibits enhanced binding to 14-3-3, indicating that phosphorylation at the S233 site is critical for this interaction. Additionally, we observed that spastin (S233D) maintains its binding to 14-3-3 even in the presence of staurosporine. These data further support and strengthen our conclusions.

      -To fully support the authors' suggestion that 14-3-3 and spastin work in the same pathway to promote regeneration, I believe that some key observations are missing.

      1-There is no evidence showing that 14-3-3 overexpression increases the total levels of spastin, not only its turnover.

      Thank you for your consideration and valuable input. We have previously demonstrated that overexpression of 14-3-3 leads to an increase in the protein levels of spastin in the absence of CHX (Fig.3E&F). Furthermore, we also observed upregulated protein levels of spastin S233D compared to the wild-type (Fig.3G). We have now included these results in the revised manuscript.

      2- There is no indication that increasing the ubiquitination of spastin decreases its levels. To suggest that proteasomal activity is affecting the levels of a protein, one would expect that proteasomal inhibition (with bortezomib or epoxomicin) would increase its levels.

      Thanks for your concern. We believe that this evidence is critical. Indeed, another study by our team is working to elucidate the ubiquitination degradation pathway of spastin. In addition, a previous study has shown that phosphorylation of the S233 site of spastin can affect its protein stability (Spastin recovery in hereditary spastic paraplegia by preventing neddylation-dependent degradation, doi:10.26508/lsa.202000799.). To better support our conclusions, we have supplemented the results in Fig.3L&M. The results showed that the proteasome inhibitor MG132 could significantly increase the protein level of spastin, whereas CHX could significantly decrease the protein level of spastin, and the degradation of spastin is significantly hindered in the presence of both CHX and MG132. This experiment also further showed that ubiquitination of spastin reduced its protein level.

      3- Authors show that S233D increases MT severing activity, and explain that it is related to increased binding to 14-3-3. An alternative explanation is that phosphorylation at S233 by itself could increase MT severing activity. The authors could test if purified spastin S233D alone could have more potent enzymatic activity.

      We appreciate the reviewer’s consideration. After investigating the interaction between 14-3-3 and spastin, we first aimed to determine whether the S233 phosphorylation mutation of spastin influenced its microtubule-severing activity. We found that overexpression of both S233A and S233D mutants resulted in significant microtubule severing (as indicated by a significant decrease in microtubule fluorescence intensity) (Fig.S2). Furthermore, it is noteworthy that S233 is located outside the microtubule-binding domain (MTBD, 270-328 amino acids) and the AAA region (microtubule-severing region, 342-599 amino acids) of spastin. Based on our initial observations, we believe that the phosphorylation of the S233 residue in spastin does not impact its microtubule-severing function. Additionally, under the same experimental conditions, we observed that the green fluorescence intensity of GFP-spastin S233D was significantly higher than that of GFP-spastin S233A. Based on these phenomena, we speculated that phosphorylation of the S233 residue of spastin might affect its protein stability, leading us to conduct further experiments. Furthermore, we fully acknowledge the reviewer's concern; however, due to technical limitations, we were unable to perform an in vitro assay to test the microtubule-severing activity of spastin. We have provided an explanation for this consideration in the revised version.

      -Finally, I consider that there are simpler explanations for the combined effect of FC-A and spastazoline. FC-A mechanism of action can be very broad, since it will increase the binding of all 14-3-3 proteins with presumably all their substrates, hence the pathways affected can rise to the hundreds. The fact that spastazoline abolishes the FC-A effect may not be because of their direct interaction, but because spastin is a necessary component of the execution of the regeneration machinery further downstream, in line with the fact that spastazoline alone prevented outgrowth and regeneration, and in agreement with previous work showing that normal spastin activity is necessary for regeneration.

      We appreciate the considerations raised by the reviewer. It is evident that spastin is not the exclusive substrate protein for 14-3-3, and it is challenging to demonstrate that 14-3-3 promotes nerve regeneration and recovery of spinal cord injury directly through spastin in vivo. However, we have identified the importance of 14-3-3 and spastin in the process of nerve regeneration. Importantly, we have conducted supplementary experiments to support the stabilization of spastin by FC-A treatment within neurons (Fig.4M), as well as during the repair process of spinal cord injury in vivo (Fig.5D). The results showed that FC-A treatment in cortical neurons could enhance the stability of spastin protein levels, and we also demonstrated a consistent trend of upregulated protein levels of spastin and 14-3-3 following spinal cord injury. Moreover, the protein levels were significantly elevated in the FC-A group of mice. These results also support that 14-3-3 enhances spastin protein stability to promote spinal cord injury repair. The manuscript was revised accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The idea of harnessing small molecules that may affect protein-protein interactions to promote axon regeneration is interesting and worthy of study. In this manuscript, Liu et al. explore a 14-3-3-spastin complex and its role in axon regeneration.

      Strengths:

      Some of the effects of FC-A on locomotor recovery after spinal cord contusion look interesting.

      Weaknesses:

      The manuscript falls short of establishing that a 14-3-3-spastin complex is important for any FC-A-dependent effects and there are several issues with data quality that make it difficult to interpret the results. Importantly, the effects of the spastin inhibitor have a major impact on neurite outgrowth suggesting that cells simply cannot grow in the presence of the inhibitor and raising serious questions about any selectivity for FC-A - dependent growth. Aspects of the histology following spinal cord injury were not convincing.

      We sincerely appreciate the reviewer for evaluating our manuscript. Given the multitude of substrates that interact with 14-3-3, and considering spastin's indispensable role in neuroregeneration, it is indeed challenging to experimentally establish that FC-A's neuroregenerative effect is directly mediated through spastin in vivo. Therefore, we have provided additional crucial evidence regarding the changes in spastin protein levels following spinal cord injury, as well as the application of FC-A after spinal cord injury. Furthermore, we have made relevant adjustments to the uploaded images to enhance the resolution of the presented figures, as detailed in the subsequent response.

      Reviewer #3 (Public Review):

      Summary: The current manuscript claims that 14-3-3 interacts with spastin and that the 14-3-3/spastin interaction is important to regulate axon regeneration after spinal cord injury.

      Strengths:

      In its present form, this reviewer identified no clear strengths for this manuscript.

      Weaknesses:

      In general, most of the figures lack sufficient quality to allow analyses and support the author's claims (detailed below). The legends also fail to provide enough information on the figures which makes it hard to interpret some of them. Most of the quantifications were done based on pseudo-replication. The number of independent experiments (that should be defined as n) is not shown. The overall quality of the written text is also low and typos are too many to list. The original nature of the spinal cord injury-related experiments is unclear as the role of 14-3-3 (and spastin) in axon regeneration has been extensively explored in the past.

      We sincerely appreciate the careful consideration and rigorous evaluation provided by the reviewer. In the revised version, we have made an effort to present high-resolution figures and provide more detailed figure legends. Furthermore, we have made relevant adjustments to the statistical methods in accordance with the reviewer's suggestions. The manuscript has also undergone a thorough review and correction process to eliminate any writing-related errors. Please refer to the following response.

      To the best of our knowledge, there have been no clear reports on the efficacy of 14-3-3 in the repair of spinal cord injury. Kaplan A et al. (doi: 10.1016/j.neuron.2017.02.018) reported a reduction in die-back of the corticospinal tract following spinal cord injury using FC-A as a filler in situ in the lesion site. However, the specific effects of FC-A on spinal cord injury, such as motor function and neural reactivity, as well as the expression characteristics of 14-3-3 after spinal cord injury, have not been extensively elucidated. Additionally, prior research on spastin's role in axon regeneration primarily focused on the effects in Drosophila, and its regenerative effects in the central nervous system of adult mammals after injury have not been reported. Therefore, our study provides crucial insights into the importance of 14-3-3 and spastin in the process of spinal cord injury repair in mammals.
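
      To make the pseudo-replication point raised by Reviewer #3 concrete, the following is a minimal sketch (not the analysis used in the manuscript) of collapsing per-cell measurements to one value per independent experiment, so that n counts experiments rather than cells. The helper name `per_experiment_means`, the experiment labels, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def per_experiment_means(measurements):
    """Collapse (experiment_id, value) pairs to one mean per experiment."""
    by_experiment = {}
    for experiment_id, value in measurements:
        by_experiment.setdefault(experiment_id, []).append(value)
    # One summary value per independent experiment, in a stable order.
    return {exp: mean(vals) for exp, vals in sorted(by_experiment.items())}


# Hypothetical neurite lengths (arbitrary units) from six cells
# imaged across three independent cultures.
cells = [
    ("exp1", 100.0), ("exp1", 110.0), ("exp1", 90.0),
    ("exp2", 120.0), ("exp2", 130.0),
    ("exp3", 80.0),
]

summary = per_experiment_means(cells)
n_independent = len(summary)  # counts experiments, not individual cells
```

      With this layout, any downstream test would be run on the per-experiment means rather than on the individual cells.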

      Reviewer #1 (Recommendations For The Authors):

      There are many spelling and grammar errors, please revise. Examples:

      -approach revealed14-3-3

      -We have detected different many 14-3-3 peptides

      -Line 1057 (D) 14-3-3 agnoist FC-A

      -There is a discrepancy between panel names and figure legend in Figure 4.

      -There is another discrepancy between the color coding of treatments in Figure 7. All panels show "injury" in red and FC-A in orange, but in panel E, these are swapped. This is confusing to readers.

      Thank you for the thorough and rigorous review. We have re-colored the relevant chart. The manuscript has also undergone a thorough review to eliminate any writing-related errors.

      Most images from confocal microscopy are blurred or low resolution. They should be sharper for the type of microscopy used.

      We have adjusted and re-uploaded the images with higher resolution. Additionally, we have enlarged the relevant images.

      The list of all peptides retrieved in the Mass-Spec analyses of the GST-spastin pulldown must be publicly available, according to eLife rules.

      Thank you for your suggestion. We have now uploaded the mass spectrometry data.

      To determine where the 14-3-3/spastin protein complex functions in neurons, we double stained hippocampal neurons with spastin and 14-3-3 antibody, and found that 14-3-3 was colocalized with spastin in the entire cell compartment (Figure 1C).

      Colocalization by confocal fluorescence microscopy is not evidence for protein complexes.

      While co-localization experiments may not directly demonstrate protein-protein interactions, they can still provide valuable insights into the cellular localization of the proteins and suggest potential interactions between them. Therefore, we adjusted the statement.

      Fig1F- Co-immunoprecipitation assay results confirmed that all 14-3-3 isoforms could form direct complexes with spastin.

      CoIP in cells overexpressing the proteins is not evidence that it is direct. That they can interact directly with each other can be extracted from the evidence in vitro with purified proteins.

      We agree with this and we have changed our statement accordingly.

      For a broad audience to have a better understanding, the authors have to explain their amino acid substitutions of Serine233, one mimicking phosphorylation (S233D) and the other rendering the protein unable to be phosphorylated at that position (S233A).

      We appreciate the suggestion. We have provided a more detailed explanation in revised manuscript.

      The panel of neurons in Fig2G is mislabeled, because it is twice spastin S233A, instead of S233D.

      We apologize for this mistake and we have corrected it in the panel.

      FCA may increase the interaction of 14-3-3 with any of its substrates, including spastin. One would appreciate evidence that FCA increases the MT-severing activity of spastin, as assumed by the authors.

      We appreciate the reviewer’s suggestion. In this study, we overexpressed spastin to investigate its microtubule severing activity. It is important to note that overexpressing spastin significantly exceeds the normal physiological concentration of the protein. Using excessive amounts of FC-A to enhance the interaction between 14-3-3 and spastin in cells can lead to cell toxicity. Therefore, we chose to overexpress 14-3-3 instead of employing excessive FC-A.

      In Fig2F, the interaction of 14-3-3 with Spas-S233D would have been very informative.

      Thank you for the constructive suggestions from the reviewer. We have supplemented the corresponding co-immunoprecipitation experiments (Fig.).

      The functional effect of S233A and S233D does not correlate with a function of 14-3-3 in neurite outgrowth. This is because S233A does not interact with 14-3-3, however, it is as good as WT spastin... meaning that binding of 14-3-3 with spastin is not necessary...

      We appreciate the reviewer's consideration. The observed phenomenon of spastin WT and S233A promoting axon growth does not align with the physiological state within neurons. This may mask the true effects of S233A or S233D on neuronal axon growth. It is documented that the proper dosage of spastin is essential for neuronal growth and regeneration, as excessive or insufficient amounts can hinder axon growth. Excessive spastin levels can disrupt the overall cellular MTs. Therefore, spastin was moderately expressed by adjusting the transfection dosage and duration. Nevertheless, we were unable to precisely control the expression levels of spastin for both WT and S233A, which also resulted in an overexpression state compared to the physiological state. As a result, the crucial role of spastin S233 in neural growth under physiological conditions may be masked. We have addressed this issue in the revised version of our manuscript.

      In panels 3C and D it is not clear if it does contain 14-3-3.... it seems it does not... but clarify.

      We apologize for any confusion. Since there is endogenous 14-3-3 present in the cells, we utilized spastin S233A and S233D to mimic the binding pattern with 14-3-3 according to the established interaction model. This information has been clarified in the original manuscript.

      Line 217 should indicate Figure 3, not Figure 5

      We have made the corresponding corrections.

      In F3G, it is intriguing that the input blot shows a decrease in Ubiquitin proteins when there is expression of flag ubiquitin...

      We apologize for the error in our presentation. In the control group, we actually overexpressed Flag-ubiquitin and GFP instead of Flag and GFP-spastin. Additionally, to further elucidate the impact of different phosphorylation states on spastin ubiquitination and degradation, we have conducted additional ubiquitination experiments (Fig.3N), which are now included in the revised version of our manuscript.

      S233 mutations seem to affect the effective turnover of spastin, but does not seem to change the levels of the spastin protein...hence, the conclusion that 14-3-3 protects from degradation is overstated.

      We thank the reviewers for the careful review and we have revised the statement accordingly.

      The mode of action of R18 and FCA should be introduced earlier in the text.

      Thank you for the reviewer's correction. We have provided a corresponding description of the effects of FC-A and R18 on the interaction between 14-3-3 and spastin in the ubiquitination experiments section of the manuscript.

      Line 296 reads: Our results revealed that levels of 14-3-3 protein remained high even at 30 DPI, indicating that 14-3-3 plays an important role in the recovery of spinal cord injury.

      This is overstated since it can well be that an upregulated protein is inhibitory.

      We thank the reviewers for their consideration and we have made adjustments accordingly.

      It is not clear if 14-3-3 prevents ubiquitination of spastin, then its levels should be higher... it is noteworthy that they did not measure its levels in nerve tissue after injury. For example, in experiments shown in Figure 5A, it would have been very useful the observation of the levels of spastin.

      We appreciate the reviewer's consideration. We have now included the assessment of spastin protein levels following spinal cord injury. Additionally, we have collected the injured spinal cord lysates in mice treated with FC-A for western blot analysis. The results revealed that the expression trend of 14-3-3 protein is largely consistent with spastin after spinal cord injury. Furthermore, the treatment with FC-A was found to enhance the expression of spastin after spinal cord injury (Fig. 5C&D).

      Panel 5G reads "nerve regeneration across the lesion site", but it actually measured NF levels, according to the legend.

      Thanks to the reviewers for the critical review. We have revised the chart accordingly.

      361 "BMS" should be explained in the results section for a better understanding of the results by non-experts.

      Thank you to the reviewers for their suggestions. We have explained this in the results section accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1. The results of the mass spec and co-IP in Figure 1 are unclear.

      a) Are all of the peptides in Fig. 1A from 14-3-3 and were there only 3 14-3-3 peptides that were identified?

      The mass spectrum results did identify only three 14-3-3 peptides, and these three peptides were highly conserved across all isoforms.

      b) The blot in panel B needs to show the input band for spastin and 14-3-3 from the same gel and not spliced so that the level of enrichment can be evaluated in the co-IP.

      We thank the reviewer for the comment; we have now presented the whole gel (Fig.1B).

      c) Further, does an IP for 14-3-3 co-precipitate spastin?

      Thank you for your concern. We appreciate your feedback. Our 14-3-3 antibody is suitable for Western blot experiments and recognizes all subtypes (Pan 14-3-3, Cell Signaling Technology, Cat #8312). Unfortunately, it is not suitable for immunoprecipitation (IP) experiments. Therefore, we have employed additional approaches, namely immunoprecipitation and pull-down assays, to further investigate the interaction between 14-3-3 and spastin.

      2. It is difficult to say anything about 14-3-3 - spastin co-localization in hippocampal neurons (1c) since 14-3-3 labels the entire hippocampal neuron so any protein will co-localize.

      We appreciate the comments. The co-localization experiments have provided evidence of the relative expression of both 14-3-3 and spastin in neurons, suggesting their potential interaction within neuronal cells. We have made the necessary revisions to accurately describe the results of the co-localization experiments in the manuscript.

      To further investigate the interaction between 14-3-3 and spastin within neurons, we have conducted additional co-immunoprecipitation (Co-IP) experiments using cortical neuron lysates (Fig.1C).

      3. The molecular weight of 14-3-3 is 25-28 kDa but the band in panel 1B and in subsequent figures it is below 15 kDa. Fig. 1F - the spastin band also seems to be low compared to predicted molecular weight and other W. Blot reports in the literature so some indication of how the antibody was validated would be important.

      Apologies for the mistakes. We have carefully re-evaluated the western blot images (See Author response image 1). We have confirmed that the molecular weight of the 14-3-3 protein is approximately 33 kDa. In the case of spastin, its molecular weight is around 55-70 kDa. Additionally, the GFP-spastin fusion protein has an estimated molecular weight of approximately 90 kDa. We have conducted a thorough verification and made appropriate adjustments to the molecular weight labels in all western blot images.

      Author response image 1.

      4. Fig 1G is a co-immunoprecipitation and it is not clear what the authors mean by "direct complexes" as claimed in line 150 of the results since this does not show direct binding between 14-3-3 and spastin. None of the assays in Fig. 1 assess "direct" binding between the two proteins and the authors should be clear in their interpretation.

      We agree with the reviewer's comments and have removed the word "direct" from the text.

      5. Fig. 1D - there is no validation that staurosporine (protein kinase inhibitor, not protein kinase as per typo in Line 167) affects the phosphorylation levels of spastin.

      Thank you for your valuable comments. In our group, we have conducted another study that has confirmed the involvement of CAMKII in mediating spastin phosphorylation. Furthermore, we have found that the addition of staurosporine significantly reduces the phosphorylation levels of spastin (unpublished results). In response to the reviewer's comment, we are pleased to provide western blot experiments demonstrating the effect of staurosporine on reducing spastin phosphorylation. The phosphorylation levels of spastin were assessed using a Pan Phospho antibody (Fig.2D).

      6. Fig. 2F - it would be important to test if spastin S233D interacts more robustly with 14-3-3 and if this is insensitive to staurosporine.

      Thank you for your comments. The suggestion provided by the reviewer is highly significant for supporting our conclusion that "phosphorylation of spastin is a prerequisite for its interaction with 14-3-3." Therefore, we have conducted additional immunoprecipitation experiments to further supplement our findings (Fig.2H). The experimental results demonstrate that the binding affinity between spastin S233D and 14-3-3 is stronger compared to spastin WT.

      7. Line 179 "Next, we transfected Ser233 mutation of spastin (spastin S233A or spastin S233D) with flag tagged 14-3-3 and generated Pearson's correlation coefficients. Results revealed that spastin S233D was markedly colocalized with 14-3-3, with minimal colocalization with spastin S233A (Figure 2A-B)." Assuming the authors are referring to supplemental Figure 2, the 14-3-3 covers the entire cell thus I think measures of co-localization are uninterpretable.

      We agree with the reviewer's comment. We realize that 14-3-3θ exhibits a ubiquitous cellular distribution, which renders the measurement of its co-localization coefficients inconclusive. Therefore, we have decided to remove Supplementary Figure 2 from the manuscript.
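
      The concern about a cell-filling label can be illustrated with a small sketch, assuming Pearson's coefficient is computed over every pixel in the field, background included. The intensities below are hypothetical and are not taken from the study, and `pearson_r` is an illustrative helper rather than the software used for the original analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical field of ten pixels: five background pixels, five inside the cell.
background = [0, 0, 0, 0, 0]
cell_filling_label = background + [100, 100, 100, 100, 100]  # uniform in the cell
other_protein = background + [50, 120, 80, 150, 100]         # uneven in the cell

# High coefficient, driven by the shared cell outline alone.
r_whole_field = pearson_r(cell_filling_label, other_protein)
```

      In this toy example the coefficient is high mainly because both channels are zero outside the cell. Restricted to the pixels inside the cell, the uniform channel is constant and the coefficient is undefined, which is why a ubiquitous signal makes the measurement uninformative.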

      8. Line 189 "Consistent with earlier results, spastin promoted neurite outgrowth, as evidenced by both the length and total branches of neurite." - It is unclear what earlier results the authors are referring to. The authors should clarify how they determined the "moderate" expression level.

      We thank the reviewer for the suggestions. The "earlier results" mentioned here refer to previously published articles, and we have now added the relevant references. Existing literature indicates that an appropriate dosage of spastin is necessary for neuronal growth and regeneration. However, both excessive and insufficient amounts of spastin are detrimental to axonal growth. Excessive spastin disrupts the overall microtubule network within cells. We controlled plasmid transfection dosage and transfection durations to achieve moderate expression. We have provided an explanation of these details in the revised version.

      9. The effects of WT spastin and spastin S233A were similar in spite of the fact that S233A does not bind to 14-3-3, which is inconsistent with the author's model that spastin-14-3-3 binding promotes growth. Line 191 - the authors mention that spastin S233D was toxic but I do not see any cell death measurements. I assume the bottom right panel in Fig. 2G labelled as spastin S233A is mislabeled and should be S233D.

      In response to comment 8, the transfection of both wild-type (WT) spastin and S233A mutant failed to precisely control the expression levels around the physiological concentration. Consequently, we observed an overexpression of spastin in both cases, which obscured the critical role of S233 phosphorylation in neurite outgrowth. We have addressed this issue in the revised version of the manuscript.

      10. Fig. 3. Does spastin(S233D) bind constitutively to 14-3-3? Why is spastin S233A not less stable than WT spastin based on the author's model?

      We propose that 14-3-3 is more likely to interact with spastin S233D in a non-constitutive manner. The instability of the S233A protein is attributed to the disruption of its ubiquitination degradation process due to the absence of 14-3-3 binding.

      11. The ubiquitin blot in Fig. 3G is not convincing and not quantified.

      We acknowledge the mislabeling in our figures. In the control group, Flag-Ubiquitin was also overexpressed, and we transfected GFP as a control instead of GFP-spastin. To further enhance the reliability, we conducted additional ubiquitination experiments (Fig.3N), which revealed a significant increase in spastin (S233A) ubiquitination levels compared to the WT group, consistent with previous research findings (Spastin recovery in hereditary spastic paraplegia by preventing neddylation-dependent degradation, doi:10.26508/lsa.202000799). Additionally, we observed that the addition of R18 could partially enhance spastin ubiquitination levels, as quantitatively illustrated in the figure (Fig.3O). This result further underscores the inhibitory role of 14-3-3 in the ubiquitination degradation pathway of spastin.

      12. I do not understand how the glutamate injury fits with the narrative (Fig. 4C).

      Excessive glutamate exposure can induce severe intracellular oxidative stress reactions, leading to the disruption of physiological processes such as mitochondrial energy production. This, in turn, results in the swelling and lysis of neuronal processes, a phenomenon known as neuronal necrosis. During this state, neurite maintenance is obstructed, and neurites exhibit swelling and breakage (Glutamate-induced neuronal death: a succession of necrosis or apoptosis depending on mitochondrial function. Neuron. 1995 Oct;15(4):961-73). We have provided a more comprehensive explanation of this phenomenon in the revised version of our manuscript.

      13. Some commentary about the selectivity of spastazoline to inhibit spastin should be included - it would be helpful if the authors could explain that this is a spastin inhibitor in the manuscript. FC-A still seems to promote growth in the presence of spastazoline suggesting that the FC-A effects are not dependent on spastin (Fig. 4E). The statistical analysis section of the materials and methods indicates that multiple groups were analyzed by one-way ANOVA. This seems unusual since the controls for cellular transfection are different than for small molecules (FC-A) and for peptides such as R18. As such, there is no vehicle control for the FC-A condition and it is difficult to assess the FC-A vs Spastazoline vs FC-A + Spastazoline. The authors should clarify (Fig. 4E-J).

      We thank the reviewer for these suggestions. In the revised version, we have provided a more detailed explanation of the specific inhibition of spastin's severing function by spastazoline.

      We observed that FC-A combined with spastazoline still promoted neurite growth to some degree relative to the injury group under glutamate exposure. Spastin is evidently not the exclusive substrate of 14-3-3, and FC-A might delay cellular oxidative stress reactions by facilitating the interaction of 14-3-3 with other substrates, such as the FOXO transcription factors mentioned in the introduction. Nevertheless, our results demonstrate that adding spastazoline significantly diminishes the promoting effect of FC-A on neurite growth, indicating that FC-A affects neuronal growth by acting on spastin.

      Furthermore, in the drug-treated groups, we overexpressed GFP to trace neuronal morphology. Culture media were exchanged after transfection, and the drugs were added during the media exchange. An equivalent amount of DMSO or ethanol was added as a control to rule out solvent effects on the neurons.

      1. There is a good possibility that spastin is required for all axon regeneration and that there is no selectivity for the FC-A pathway and this is a major issue with the interpretation of the manuscript (Fig 4K-L).

      We acknowledge this point. Clearly, spastin is not the exclusive substrate for 14-3-3, and our experimental evidence does not establish that 14-3-3 promotes neuronal regeneration solely through spastin. Nevertheless, we have identified the significance of 14-3-3 and spastin in the process of neural regeneration. Furthermore, we conducted complementary experiments to support the stabilization of spastin by FC-A treatment both in vitro and in vivo. We found enhanced protein expression in cortical neurons after FC-A treatment (Fig.4M). The results also indicate a consistent upward trend in the protein levels of spastin and 14-3-3 following spinal cord injury (Fig.5C&H). Moreover, in the FC-A group of mice, there was a significant increase in spastin protein levels (Fig.5D&I). These results also support the view that 14-3-3 promotes spinal cord injury repair by enhancing spastin protein stability.

      1. Fig. 5C- it is unclear where the photomicrographs were taken relative to the lesion.

      We obtained tissue sections from the lesion core and the segments above it for histological analysis. Given the scarcity of neural tissue at the injury center, we selected tissue slices as close as possible to the lesion core to illustrate the relationship between 14-3-3 and the injured neurons. We have provided an explanation of this in the revised version of the manuscript.

      1. The authors need to provide some evidence that the FC-A and spastazoline compounds are accessing the CNS following IP injection.

      We thank the reviewer for the suggestion. Although direct visualization of FC-A and spastazoline entering the CNS is challenging to obtain, several indicators suggest drug penetration into spinal cord tissue. Firstly, in vivo behavioral and electrophysiological experiments demonstrate that drug injections indeed affect the neural activity of mice. Secondly, the blood-spinal cord barrier is disrupted at the injury site following spinal cord injury. Combined with the fact that both FC-A (molecular weight: 680.82 Da) and spastazoline (molecular weight: 382.51 Da) are small-molecule drugs, this increases the likelihood that they enter the injured spinal cord tissue. Furthermore, our microtubule staining results indicated that FC-A and spastazoline did influence the acetylation ratio of microtubules. These findings support drug penetration into spinal cord tissue.

      1. Some quantification of Fig. 5D would be important to support the contention that the lesion site is impacted by FC-A treatment.

      Thank you for the suggestion. We have included quantitative analysis for Figure 5D (Figure) as recommended.

      1. The NF and 5-HT staining in Fig. 5D and in Fig. 7A and B does not clearly define fibers and is not convincing.

      We appreciate the concerns. Because we did not present whole nerve fibers, we used NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator of nerve fiber regeneration, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69).

      Our results showed strongly decreased NF-positive staining in the spinal cord injury group (with a slight increase in 5-HT). In contrast, the FC-A treatment group exhibited a significantly higher abundance of NF-positive signals (or an increased 5-HT signal) at the lesion site, which also suggests a reparative effect of FC-A on nerves. We also intend to refine our immunohistochemical methods in future experiments.

      Minor Comments: 1. Line 80-84. To my knowledge the only manuscripts examining the effects of spastin in axon regeneration models include the analysis in Drosophila (i.e. ref 15 and 16) and a study in sciatic nerve that reported an index of functional recovery but did not perform any histology to assess axon regeneration phenotypes. The literature should be more accurately reflected in the introduction.

      We appreciate the suggestions from the reviewer. In the revised version, we have provided further clarification on the novelty of spastin in the spinal cord injury repair process.

      1. Line 73: The meaning of the following statement needs to be clarified: "spastin has two major isoforms, namely M1 and M87, coded form different initial sites."

      We have provided additional elaboration for this statement in the revised version.

      1. Line 216: "Results indicated that GFP-spastin could be ubiquitinated, while inhibiting the binding of 14-3-3/spastin promoted spastin ubiquitination (Figure 5G)." - Should be Fig 3G

      We apologize for the mistake. We have made the corresponding changes in the revised version.

      1. Line 255: "Briefly, we established a neural injury model as previously described(31)" - the basics of the injury model need to be described in this manuscript.

      In the revised version, we have provided further elaboration on the glutamate-induced neuronal injury model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1: A- Both legend and text fail to provide detail on this specific panel.

      We have provided a more detailed and comprehensive description of the legend and results in this section.

      B- Is the contribution of non-neuronal cells for co-IPs relevant? Co-IP with isolated neuronal extracts (instead of spinal cord tissue) should be performed.

      We thank the reviewer for the suggestion. To further elucidate their interaction within neurons, cortical neurons were cultured in Neurobasal medium supplemented with 2% B27, with cytarabine used to inhibit glial cell growth, and the cells were lysed for co-IP experiments (Fig.1C). The results demonstrated the interaction between 14-3-3 and spastin within neurons.

      C- Both spastin and 14-3-3 appear to label the entire neuron with similar intensities throughout the entire cell which is rather unusual. Conditions of immunofluorescence should be improved and z-projections should be provided to support co-localization.

      Thanks for the comment. Our dual-labeling experiments indicated that 14-3-3 exhibits a characteristic pattern of whole-cell distribution. Therefore, this result cannot confirm the interaction between 14-3-3 and spastin within neurons, but it does provide evidence regarding the intracellular distribution patterns of 14-3-3 and spastin. Consequently, we supplemented neuronal endogenous co-IP experiments to further demonstrate the direct interaction between 14-3-3 and spastin within neurons, and we have modified the wording in the revised version accordingly.

      D- xx and yy axis information is either lacking or incomplete.

      We have made the corrections to the figures.

      E- It would be useful to show the conservation between the different 14-3-3 isoforms.

      We appreciate the suggestions. We have included a conservation analysis of 14-3-3 to assist readers in better understanding these results (Fig.1F).

      Figure 2:

      D- The experiment using a general protein kinase inhibitor does not allow concluding that the specific phosphorylation of spastin is sufficient for binding to 14-3-3. An alternative phosphorylated protein might be involved in the process.

      We appreciate the reviewer's consideration. We believe this serves as a prerequisite condition to demonstrate that "14-3-3 binding to spastin requires spastin phosphorylation." In fact, another project in our group has confirmed that CAMK II can mediate spastin phosphorylation, and the addition of staurosporine significantly reduces spastin phosphorylation levels (unpublished results). Here, we provide the western blot experiment showing the decrease in spastin phosphorylation under staurosporine treatment, with phosphorylation levels detected using the Pan Phospho antibody (Fig.2D).

      H and I- Pseudo-replication. Only independent experiments should be plotted and not data on multiple cells obtained in the same experiment. Please indicate the number of independent experiments.

      We appreciate the reviewer's correction. We now have included the mean value of three independent experiments and we have made relevant revisions to the statistical charts.
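      The pseudo-replication fix discussed above can be illustrated with a minimal sketch. It assumes cell-level measurements tagged with a group and an experiment identifier, and it collapses them into one mean per independent experiment so that the experiment, not the cell, is the unit of analysis. The function name, group labels, and values are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def experiment_means(measurements):
    """Collapse per-cell values into one mean per (group, experiment).

    `measurements` is an iterable of (group, experiment_id, value) tuples.
    Returns {group: [mean of experiment 1, mean of experiment 2, ...]},
    with experiments ordered by their identifier.
    """
    per_experiment = defaultdict(list)
    for group, experiment_id, value in measurements:
        per_experiment[(group, experiment_id)].append(value)

    by_group = defaultdict(list)
    for (group, _experiment_id), values in sorted(per_experiment.items()):
        by_group[group].append(mean(values))
    return dict(by_group)


# Hypothetical neurite lengths (arbitrary units), two cells per experiment.
cells = [
    ("control", 1, 10.0), ("control", 1, 12.0),
    ("control", 2, 11.0), ("control", 2, 13.0),
    ("control", 3, 9.0), ("control", 3, 11.0),
    ("treated", 1, 15.0), ("treated", 1, 17.0),
    ("treated", 2, 16.0), ("treated", 2, 18.0),
    ("treated", 3, 14.0), ("treated", 3, 16.0),
]

summary = experiment_means(cells)
for group, means in summary.items():
    print(group, "n =", len(means), "experiment means:", means)
```

      Each group then contributes n = 3 values to plots and statistical tests, regardless of how many cells were imaged per experiment.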

      Figure 3:

      The rationale for the hypothesis that spastin S233D transfection might upregulate the expression of spastin relative to WT and spastin S233A is unclear.

      We appreciate the reviewer's consideration. We have supplemented the relevant results, as depicted in the Fig.3G, which demonstrates that 14-3-3 can enhance the protein levels of spastin, and phosphorylated spastin (S233D) exhibits a significantly increased protein level compared to wild-type spastin. These findings indicate that 14-3-3 not only inhibits the degradation of spastin but also increases its protein levels.

      I- pseudo-replication. Please plot and do statistical analysis of independent experiments.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      Figure 4: E-J: I- pseudo-replication. Please plot and do statistical analysis of independent experiments.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      Figure 5:

      B- Please show individual data points.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      D- Longitudinal images of spinal cords where spastazoline was used cannot correspond to contusion as there is a very sharp discontinuity between the rostral and caudal spinal cord tissue. A full transection seems to have occurred. Alternatively, technical problems with tissue collection/preservation might have occurred.

      Thank you for the reviewer's consideration. The sharp discontinuity observed in the spastazoline group is not due to modeling issues but rather results from the drug's effects on the injury site. Spastin plays a crucial role not only in neuronal development but also in mitosis. Because stromal cells proliferate actively at the injury site, spastazoline may inhibit the proliferation of these cells, thereby impeding wound healing after spinal cord injury and producing the observed discontinuous injury gap. We have revised the manuscript accordingly.

      E- Images do not have the quality to allow analysis. 5HT staining should not be considered as a clear axonal labeling is not seen. This is also the case for neurofilament staining.

      We appreciate the concerns. Because we did not present whole nerve fibers, we used NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator of nerve fiber regeneration, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69).

      Our results showed strongly decreased NF-positive staining in the spinal cord injury group (with a slight increase in 5-HT). In contrast, our FC-A treatment group exhibited a significantly higher abundance of NF-positive signals (or an increased 5-HT signal) at the lesion site, which also suggests a reparative effect of FC-A on nerves. We also intend to refine our immunohistochemical methods in future experiments.

      F- Images do not allow analysis. Higher magnifications are needed.

      Thank you for the reviewer's consideration. We have now included higher-magnification images (Fig.5M) to address this concern.

      Figure 7:

      Same issues as in Figure 5.

      A- Images do not have the quality to allow analysis. 5HT staining should not be considered as a clear axonal labeling is not seen.

      B- Images do not have the quality to allow analysis. Neurofilament staining should not be considered as clear axonal labeling is not seen. MBP staining does not have a pattern consistent with myelin staining

      We appreciate the concerns. Because we did not present whole nerve fibers, we used NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator of nerve fiber regeneration, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69). In this study, sagittal slices were used. MBP covers the axonal surface, indicating its co-localization with the axons. However, because we did not present intact nerve fibers, we were unable to show the typical myelin staining pattern of MBP.

    1. Author Response

      Reviewer 1 (Public Review):

      1. With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish, it is very unlikely to hear a function word that is not followed by a content word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      Response: We acknowledge the plausibility of the hypothesis advanced by Reviewer #1. We would like to further clarify the rationale that led us to predict that the hypothesized long-term predictions should manifest at the onset of (and not within) a “phrasal chunk”. The hypothesis does not directly concern the probability of a short event to follow a long one (or the other way around), which to our knowledge has not been systematically quantified in previous cross-linguistic studies. Rather, it concerns how the auditory system forms higher-level auditory chunks based on the rhythmic properties of the native language, which is what the previous behavioral studies on perceptual grouping have addressed (e.g., Iversen 2008; Molnar et al. 2014; Molnar et al. 2016). When presented with sequences of two tones alternating in duration, Spanish speakers typically report perceiving the auditory stream as a repetition of short-long chunks separated by a pause, while speakers of Basque usually report the opposite long-short grouping bias. These results suggest that the auditory system performs a chunking operation by grouping pairs of tones into compressed, higher-level auditory units (often perceived as a single event). The way two constituent tones are combined depends on linguistic experience. Based on this background, we hypothesized the presence of (i) a short-term system that merely encodes a repetition of alternations rule and predicts transitions from one constituent tone to the other (a → b → a → b, etc.); (ii) a long-term system that encodes a repetition of concatenated alternations rule and predicts transitions from one high-level unit to the other (ab → ab, etc.). Under this view, we expect predictions based on the long-term system to be stronger at the onset of (rather than within) high-level units and therefore omissions of the first constituent tone to elicit larger responses than omissions of the second constituent tone.

      In other words, the omission of the onset tone would reflect the omission of the whole chunk. On the other hand, the omission of the internal tone would be better handled by the short-term system, involved in processing the low-level structure of our sequences.

      A similar concern was also raised by Reviewer #2. We will include the view proposed by Reviewer #1 and Reviewer #2 in the updated version of the manuscript.

      1. The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      Response: We appreciate that all three reviewers agreed on the robustness of the data analysis pipeline. The approach employed to identify the presence of an interaction effect was indeed conservative, using a non-parametric test on combined gradiometers data, no a priori assumptions regarding the location of the effect, and small cluster thresholds (cfg.clusteralpha = 0.05) to enhance the likelihood of detecting highly localized clusters with large effect sizes. This approach led to the identification of the cluster illustrated in Figure 2c, where the interaction effect is evident. The fact that this interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. The only partial overlap of the cluster with the activation peaks might simply reflect the fact that distinct sources contribute to the generation of the omission-MMN, which has been demonstrated in numerous prior studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      Response: The two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to the standard Tukey’s rule. As shown in Author response image 1 below, no participant fell outside the upper (Q3+1.5xIQR) and lower whiskers (Q1-1.5xIQR) of the boxplot.

      Author response image 1.

      The presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels.

      The code to generate Author response image 1 and the corresponding statistics have been added to the script “analysis_interaction_data.R” in the OSF folder (https://osf.io/6jep8/).
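      For readers who want to see the rule itself, the Tukey criterion referred to above (values below Q1-1.5xIQR or above Q3+1.5xIQR count as outliers) can be sketched as follows. This is an illustrative Python sketch and not the authors' R script. The function names and input values are hypothetical, and quartile conventions differ slightly between tools, so fences computed here may not match a given boxplot implementation exactly.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR.

    Quartiles are computed with the "inclusive" method of
    statistics.quantiles; other software may use a different convention.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical per-participant values; only the last one is extreme.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
print("fences:", tukey_fences(example))
print("outliers:", tukey_outliers(example))
```

      A participant can therefore sit visibly apart from the rest of the group and still not be flagged, as long as the value stays inside both fences.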

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      Response: Our interpretation of the results of the present study is mainly driven by the effect observed in the sensor-level data, which is statistically robust. The source modeling analyses (in non-invasive electrophysiology) provide a possible model of the candidate brain sources driving the effect observed at the sensor level. The source showing the interaction effect in our study is the left auditory cortex. More details and statistics will be provided in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      1. Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Response: We will consider the suggestion of Reviewer #2 for the new version of the manuscript. Overall, we believe that the novelty of the current study lies in bridging findings from two research fields - basic auditory neuroscience and cross-linguistic research - to provide evidence for a predictive coding model in the auditory system that uses long-term priors to make perceptual inferences.

      1. Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      Response: A similar point was advanced by Reviewer #1. We tried to clarify our hypothesis (see above). We will consider including this interpretation in the updated version of the manuscript.

      1. In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      Response: We thank the reviewer for this comment, which is indeed very pertinent. Below are some comments highlighting our thoughts on this.

      1) We will explore in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed very interesting. However, we believe that even if a phase modulation by language experience is found, it would not negate the possibility that the group differences in the MMN are driven by different long-term predictions. Rather, since the hypothesized phase differences would be driven by long-term linguistic experience, phase entrainment may reflect a mechanism through which long-term predictions are carried. On this point, we agree with the Reviewer's statement that “this view would not change the impact of the results but add depth to their interpretation”.

      2) Related to the point above: although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that these phenomena are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and long-term prediction errors are not mutually exclusive.

      3) Despite the plausibility of the view proposed by reviewer #2, many studies in the auditory neuroscience literature putatively consider the MMN as an index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). There are good reasons to believe that also in our study the MMN reflects, at least in part, an error response.

      In the updated version of the manuscript, we will include a paragraph discussing the possibility that the reported group differences in the omission MMN might be partially accounted for by differences in neural entrainment to the rhythmic sound sequences.

      Reviewer #3 (Public Review):

      The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01<P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      Response: We appreciate the positive feedback from Reviewer #3. Concerning the weakness highlighted, we agree with Reviewer #3 that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Overall, we hope this work will lead to more studies using cross-linguistic/cultural comparisons to assess the effect of experience on neural processing. In the context of the present study, we believe that the lack of behavioral data does not undermine the main findings, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen 2008; Yoshida et al., 2010; Molnar et al. 2014; Molnar et al. 2016). As highlighted by Reviewer #2, having Spanish and Basque dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it.”

      References

      1. Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      2. Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex?. Neuroscience, 389, 54-73.

      3. Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      4. Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      5. Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      6. Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      7. Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      8. Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      9. Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      10. Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Watanuki et al used metabolomic tracing strategies of U-13C6-labeled glucose and 13C-MFA to quantitatively identify the metabolic programs of HSCs during steady-state, cell-cycling, and OXPHOS inhibition. They found that 5-FU administration in mice increased anaerobic glycolytic flux and decreased ATP concentration in HSCs, suggesting that HSC differentiation and cell cycle progression are closely related to intracellular metabolism and can be monitored by measuring ATP concentration. Using the GO-ATeam2 system to analyze ATP levels in single hematopoietic cells, they found that PFKFB3 can accelerate glycolytic ATP production during HSC cell cycling by activating the rate-limiting enzyme PFK of glycolysis. Additionally, by using Pfkfb3 knockout or overexpressing strategies and conducting experiments with cytokine stimulation or transplantation stress, they found that PFKFB3 governs cell cycle progression and promotes the production of differentiated cells from HSCs in proliferative environments by activating glycolysis. Overall, in their study, Watanuki et al combined metabolomic tracing to quantitatively identify metabolic programs of HSCs and found that PFKFB3 confers glycolytic dependence onto HSCs to help coordinate their response to stress. Even so, several important questions need to be addressed as below:

      We sincerely appreciate the constructive feedback from the reviewer. Additional experiments and textual improvements have been made to the manuscript based on your valuable suggestions. In particular, the major revisions are as follows: First, we investigated the extent to which other metabolites, not limited to the glycolytic system, affect metabolism in HSCs after 5-FU treatment. Second, the extent to which PFKFB3 contributes to the expansion of the HSPC pool in the bone marrow was adjusted to make the description more accurate based on the data. Finally, we overexpressed PFKFB3 in HSCs derived from GO-ATeam2 mice and confirmed that PRMT1 inhibition did not reduce the ATP concentration. We believe that the reviewer's valuable comments have further deepened our knowledge of the significance of glycolytic activation by PFKFB3 that we have demonstrated. Our response to the "Recommendations for Authors" is listed first, followed by our responses to all "Public Review" comments as follows:

      (Recommendations For The Authors):

      1. The methods used in key experiments should be described in more detail. For example, in the section on ‘Conversion of GO-ATeam2 fluorescence to ATP concentration’, the knock-in strategy for GO-ATeam2 should be described, as well as U-13C6 -glucose tracer assays.

      As per your recommendation, we have described the key experimental method in more detail in the revised manuscript: the GO-ATeam2 knock-in method was reported by Yamamoto et al. 1. Briefly, they used a CAG promoter-based knock-in strategy targeting the Rosa26 locus to generate GO-ATeam2 knock-in mice. A description of the method has been added to Methods and the reference has been added to the citation.

      For the U-13C6-glucose tracer analysis, the following points were added to describe the details of the analysis: First, a note was added that the number of cells stated for the in vitro tracer analysis is the number used for each sample. Second, we added the solution in which the sorted cells were collected. Third, we added that the incubation was performed under 1% O2 and 5% CO2.

      1. Confusing image label of Supplemental Figure 1H should be corrected in line 253.

      We have corrected the incorrect figure caption on line 217 in the revised manuscript to "Supplemental Figure 1N" as you suggested.

      1. The percentage of the indicated cell population should also be shown in Figure S1B.

      As you indicated, we have included the percentages for each population in Supplemental Figure 1B.

      Author response image 1.

      1. Please pay attention to the small size of the marks in the graph, such as in Figure S1F and so on.

      As you indicated, we have corrected the very small text contained in Figure S1F. Similar corrections have been made to Figures S1B and S5A.

      1. Please pay attention to the label of line in Figure S6A-D.

      Thank you very much for the advice. We have added line labels to the graph in the original Figures S6A–D.

      (Specific comments)

      1. Based on previous reports, the authors expanded the LSK gate to include as many HSCs as possible (Supplemental Figure 1B). However, while they showed the gating strategy on Day 6 after 5-FU treatment, results from other time-points should also be displayed to ensure the strict selection of time-points.

      Thank you for pointing this out. First, we did not enlarge the Sca-1 gating in this study. We apologize for any confusion caused by the incomplete description. The gating of c-Kit is based on that shown by Umemoto et al (Figure EV1A) 2, who used 250 mg/kg 5-FU, so their c-Kit reduction is more pronounced than ours.

      We followed this study and compared c-Kit expression in Lin-Sca-1+CD150+CD48-EPCR+ gates to BMMNCs on day 6 after 5-FU administration (150 mg/kg). The results are shown below.

      Author response image 2.

      Since the MFI of c-Kit was downregulated, we used gating that extended the c-Kit gate to lower-expression regions on day 6 after 5-FU administration (revised Figure S1C). At other time points, LSK gating was the same as in the PBS-treated group, as noted in the Methods.

      1. In Figure 1, the authors examined the metabolite changes on Day 6 after 5-FU treatment. However, it is important to consider whether there are any dynamic adjustments to metabolism during the early and late stages of 5-FU treatment in HSCs compared to PBS treatment, in order to coordinate cell homeostasis despite no significant changes in cell cycle progression at other time-points.

      Thank you for pointing this out. Below are the results of the GO-ATeam2 analysis during the very early phase (day 3) and late phase (day 15) after 5-FU administration (revised Figures S7A–H).

      Author response image 3.

      In the very early phase, such as day 3 after 5-FU administration, cell cycle progression had not started (Figure S1C) and was not preceded by metabolic changes. Meanwhile, in the late phase, such as day 15 after 5-FU administration, the cell cycle and metabolism returned to a steady state. In summary, the timing of the metabolic changes coincided with that of cell cycle progression. This point is essential for discussing the cell cycle-dependent metabolic system of HSCs and has been newly included in the Results (page 11, lines 321-323).

      1. As is well known, ATP can be produced through various pathways, including glycolysis, the TCA cycle, the PPP, NAS, lipid metabolism, amino acid metabolism and so on. Therefore, it is important to investigate whether treatment with 5-FU or oligomycin affects these other metabolic pathways in HSCs.

      As the reviewer pointed out, ATP production by systems other than the glycolytic system of HSCs is also essential. In this revised manuscript, we examined the effects of the FAO inhibitor (Etomoxir, 100 µM) and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON, 2mM) alone or in combination on the ATP concentration of HSCs after PBS or 5-FU treatment. As shown below, there was no apparent decrease in ATP concentration (revised Figures S7J–M).

      Author response image 4.

      Fatty acid β-oxidation activity was also measured in 5-FU-treated HSCs using the fluorescent probe FAOBlue and was unchanged compared to PBS-treated HSCs (revised Figure S7N).

      Author response image 5.

      Notably, the addition of 100 µM etomoxir plus glucose and Pfkfb3 inhibitors resulted in a rapid decrease in ATP concentration in HSCs (revised Figures S7O–P). This indicates that etomoxir partially mimics the effect of oligomycin, suggesting that at a steady state, OXPHOS is driven by FAO, but can be compensated by the acceleration of the glycolytic system by Pfkfb3. Meanwhile, the exposure of HSCs to Pfkfb3 inhibitors in addition to 2 mM DON, which is an extremely high dose considering that the Ki value of DON for glutaminase is 6 µM, did not reduce ATP (revised Figures S7O–P). This suggests that ATP production from glutaminolysis is limited in HSCs at a steady state.

      Author response image 6.

      These points suggest that OXPHOS is driven by fatty acids at a steady state, but unlike the glycolytic system, FAO is not further activated by HSCs after 5-FU treatment. The results of these analyses and related descriptions are included in the revised manuscript (page 11, lines 332-344).

      1. In part 2, they showed that oligomycin treatment of HSCs exhibited activation of the glycolytic system, but what about the changes in ATP concentration under oligomycin treatment? Are other metabolic systems affected by oligomycin treatment?

      Thank you for your thoughtful comments. The relevant results we have obtained so far with the GO-ATeam2 system are as follows: First, OXPHOS inhibition in the absence of glucose significantly decreases the ATP concentration of HSCs (Figure 4C). Meanwhile, OXPHOS inhibition in the presence of glucose maintains the ATP concentration of HSCs (Figure 5B). Since it is difficult to imagine a completely glucose-free environment in vivo, it is thought that ATP concentration is maintained by the acceleration of the glycolytic system even under hypoxic or other conditions that inhibit OXPHOS.

      Meanwhile, glucose tracer analysis shows that, in addition to activating the glycolytic system, OXPHOS inhibition suppresses nucleic acid synthesis (NAS) (Figures 2C–F). This is because phosphate groups derived from ATP are transferred to nucleotide mono-/di-phosphates in NAS, but OXPHOS, the main source of ATP production, is impaired, along with dihydroorotate dehydrogenase (DHODH), the NAS enzyme coupled to OXPHOS. We have added a new paragraph in the Discussion section (page 17, lines 511-515) to provide more insight to the reader by summarizing and discussing these points.

      1. In Figure 5M, it would be helpful to include a control group that was not treated with 2-DG. Additionally, if Figure 5L is used as the control, it is unclear why the level of ATP does not show significant downregulation after 2-DG treatment. Similarly, in Figure 5O, a control group with no glucose addition should be included.

      Thank you for your advice. The experiments corresponding to the control groups in Figures 5M and O were in Figures 5L and N, respectively, but we have combined them into one graph (revised Figures 5L–M). The results more clearly show that PFKFB3 overexpression enhances sensitivity to 2-DG, but also enhances glycolytic activation upon oligomycin administration.

      Author response image 7.

      1. In this study, their findings suggest that PFKFB3 is required for glycolysis of HSCs under stress, including transplantation. In Figure 7B, the results showed that donor-derived chimerism in PB cells decreased relative to that in the WT control group during the early phase (1 month post-transplant) but recovered thereafter. Although the number of transplanted cells is equal in the two groups of donor cells, it is unclear why the donor-derived cell count decreased in the 2-week post-transplantation period and recovered thereafter in the Pfkfb3 KO group. Therefore, they should provide an explanation for this. Additionally, they only detected the percentage of donor-derived cells in PB but not from BM, which makes it difficult to support the argument for increasing the HSPC pool.

      As pointed out by the reviewer, it is interesting to note that the decrease in peripheral blood chimerism in the PFKFB3 knockout is limited to immediately after transplantation and then catches up with the control group (Figure 7B). We attribute this to the fact that HSPC proliferation is delayed immediately after transplantation in PFKFB3 deficiency, but after a certain time, PB cells produced by the delayed proliferating HSPCs are supplied. In support of this, the PFKFB3 knockout HSPCs did not exhibit increased cell death after transplantation (Figure 7K), while a delayed cell cycle was observed (Figures 7G–J). A description of this point has been added to the Discussion (page 19, lines 573-579).

      In addition, the knockout efficiency in bone marrow cells could not be verified because the number of cells required for KO efficiency analysis was not available. Therefore, we have added a statement on this point and have toned down our overall claim regarding the extent to which PFKFB3 is involved in the expansion of the HSPC pool (page 15, lines 474-476).

      1. In Figure 7E, they collected the BM reconstructed with Pfkfb3- or Rosa-KO HSPCs two months after transplantation, and then tested their resistance to 5-FU. However, the short duration of the reconstruction period makes it difficult to draw conclusions about the effects on steady-state blood cell production.

      We agree that we cannot conclude from this experiment alone that PFKFB3 is completely unnecessary in steady state because, as you pointed out, the observation period of the experiment in Figure 7E is not long. We have toned down the claim by stating that PFKFB3 is only less necessary in steady-state HSCs compared to proliferative HSCs (page 15, lines 460-461).

      1. PFK is allosterically activated by PFKFB, and other members of the PFKFB family could also participate in the glycolytic program. Therefore, they should investigate their function in contributing to glycolytic plasticity in HSCs during proliferation. Additionally, they should also analyze the protein expression and modification levels of other members. Although PFKFB3 is the most favorable for PFK activation, the role of other members should also be explored in HSC cell cycling to provide sufficient reasoning for choosing PFKFB3.

      To further justify why we chose PFKFB3 among the PFKFB family members, we reviewed our data and the publicly available Gene Expression Commons (GEXC) 3. PFKFB3 is the most highly expressed member of the PFKFB family in HSCs (revised Figure 4F), and its expression increases with proliferation (Author response image 9). In addition, we have cited the literature 4 indicating that AZ PFKFB3 26, the inhibitor used in this paper, is specific to Pfkfb3, and have added a note to this effect (page 11, lines 327-329). Through these revisions, we sought to strengthen the rationale for Pfkfb3 as the primary target of the analysis.

      Author response image 8.

      Author response image 9.

      1. In this study, the authors identified PRMT1 as the upstream regulator of PFKFB3 that is involved in the glycolysis activation of HSCs. However, PRMT1 is also known to participate in various transcriptional activations. Thus, it is important to determine whether PRMT1 affects glycolysis through transcriptional regulation or through its direct regulation of PFKFB3? Additionally, the authors should investigate whether PRMT1i inhibits ATP production in normal HSCs. Moreover, could we combine Figure 6I and 6J for analysis. Finally, the authors could conduct additional rescue experiments to demonstrate that the effect of PRMT1 inhibitors on ATP production can be rescued by overexpression of PFKFB3.

      Although PRMT1 inhibition reduced m-PFKFB3 levels in HSCs, 5-FU treatment also reduced or did not alter Pfkfb3 transcript levels (Figures 6B, G) and the expression of genes such as Hoxa7/9/10, Itga2b, and Nqo1, which are representative transcriptional targets of PRMT1, in proliferating HSCs after 5-FU treatment (revised Figure S9).

      Author response image 10.

      These results suggest that PRMT1 promotes PFKFB3 methylation, which increases independently of transcription in HSCs after 5-FU treatment.

      A summary analysis of the original Figures 6I and 6J is shown below (revised Figure 6I).

      Author response image 11.

      Finally, we tested whether the inhibition of the glycolytic system and the decrease in ATP concentration due to PRMT1 inhibition could be rescued by the retroviral overexpression of PFKFB3. We found that, in HSCs overexpressing PFKFB3, PRMT1 inhibition did not decrease the ATP concentration (revised Figure 6J). Therefore, PFKFB3 overexpression mitigated the decrease in ATP concentration caused by PRMT1 inhibition. These data and related statements have been added to the revised manuscript (page 14, lines 427-428).

      Author response image 12.

      Reviewer #2:

      In the manuscript Watanuki et al. want to define the metabolic profile of HSCs in stress/proliferative (myelosuppression with 5-FU), and mitochondrial inhibition and homeostatic conditions. Their conclusions are that during proliferation HSCs rely more on glycolysis (as other cell types) while HSCs in homeostatic conditions are mostly dependent on mitochondrial metabolism. Mitochondrial inhibition is used to demonstrate that blocking mitochondrial metabolism results in similar features of proliferative conditions.

      The authors used state-of-the-art technologies that allow metabolic readout in a limited number of cells like rare HSCs. These applications could be of help in the field since one of the major issues in studying HSCs metabolism is the limited sensitivity of the "standard" assays, which make them not suitable for HSC studies.

      However, the observations do not fully support the claims. There are no direct evidence/experiments tackling cell cycle state and metabolism in HSCs. Often the observations for their claims are indirect, while key points on cell cycle state-metabolism, OCR analysis should be addressed directly.

      We sincerely appreciate the reviewer's constructive comments. Thank you for highlighting the importance of the highly sensitive metabolic assay developed in this study and the findings based on it. Meanwhile, the reviewer's comments have made us aware of areas where we can further improve this manuscript. In particular, in the revised manuscript, we have performed further studies to demonstrate the link between the cell cycle and metabolic state. Specifically, we further subdivided HSCs by the uptake of in vivo-administered 2-NBDG and performed cell cycle analysis. Next, HSCs after PBS or 5-FU treatment were analyzed by a Mito Stress test using the Seahorse flux analyzer, including ECAR and OCR, and a more direct relationship between the cell cycle state and the metabolic system was found. We believe that the reviewer's valuable suggestions have helped us clarify more directly the importance of the metabolic state of HSCs in response to cell cycle and stress that we wanted to show and emphasize the usefulness of the GO-ATeam2 system. Our response to "Recommendations For The Authors" is listed first, followed by our responses to all comments in "Public Review" as follows:

      (Recommendations For The Authors):

      In general, I believe it would be important:

      1. to directly associate cell cycle state with metabolic state. For example, by sorting HSC (+/- 5FU) based on their cell cycle state (exploiting the mouse model presented in the manuscript or by defining G0/G1/G2-S-M via Pyronin/Hoechst staining which allow to sort live cells) and follow the fate of radiolabeled glucose.

      Thank you for raising these crucial points. Unfortunately, it was difficult to perform the glucose tracer analysis by preparing HSCs with different cell cycle states as you suggested due to the amount of work involved. In particular, in the 5-FU group, more than 60 mice per group were originally required for an experiment, and further cell cycle-based purification would require many times that number of mice, which we felt was unrealistic under current technical standards. As an alternative, we administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than those with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs exiting the G0 phase and entering the G1 phase increased after 5-FU treatment. In both the PBS- and 5-FU-treated groups, these large differences in cell cycle state according to glucose uptake suggest a direct link between HSC proliferation and glycolysis activation. If a more sensitive type of glucose tracer analysis becomes available in the future, it may be possible to directly address the reviewer's comments. We see this as a topic for the future. The descriptions of the above findings and perspectives have been added to the Results and Discussion section (page 7, lines 208-214, page 20, lines 607-610).

      Author response image 13.

      1. Use other radio labeled substrates (fatty acid, glutamate)

      Thank you very much for your suggestion. While this is an essential point for future studies, we believe it is not the primary focus of the paper. We are planning another research project on tracer analysis using labeled fatty acids and glutamates, which we will report on in the near future. We have clearly stated in the Abstract and Introduction of the revised manuscript that the focus of this study is on changes in glucose metabolism when HSCs are stressed (page 3, lines 75 and 87; page 5, line 135).

      Instead, we added the following analyses of metabolic changes in fatty acids and glutamate using the GO-ATeam2 system. HSCs derived from GO-ATeam2 mice treated with PBS or 5-FU were used to measure changes in ATP concentrations after exposure to the fatty acid beta-oxidation (FAO) inhibitor etomoxir and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON). Etomoxir was used at 100 µM, a concentration that inhibits FAO without inhibiting mitochondrial electron transfer complex I, as previously reported 5. DON was used at 2 mM, a concentration that sufficiently inhibits the enzyme as the Ki for glutaminase is 6 µM. In this experiment, etomoxir alone, DON alone, or etomoxir and DON in combination did not decrease the ATP concentration of HSCs in the PBS and 5-FU groups (revised Figures S7J–M), suggesting that FAO and glutaminolysis were not essential for ATP production in HSCs in the short term. Thus, according to the analysis using the GO-ATeam2 system, HSCs exposed to acute stresses change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than other energy sources. Since there are reports that FAO and glutaminolysis are required for HSC maintenance in the long term 5,6, compensatory pathways may be able to maintain ATP levels in the short term. A description of these points has been added to the Discussion (page 11, lines 332-344).
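
      As a rough illustration of why 2 mM DON can be regarded as a saturating dose given a Ki of 6 µM, the arithmetic can be sketched as follows. The occupancy formula assumes simple reversible binding with no substrate competition, which is a simplifying assumption made here for illustration and is not a model taken from the manuscript; the function names are hypothetical.

```python
# Sketch only: how far above the Ki the DON dose quoted in the response lies.
# Assumption (not from the manuscript): simple reversible binding with no
# substrate competition, so fractional occupancy = [I] / ([I] + Ki).

DON_UM = 2000.0  # 2 mM DON, expressed in micromolar
KI_UM = 6.0      # Ki of DON for glutaminase quoted in the response, in micromolar


def fold_over_ki(inhibitor_um: float, ki_um: float) -> float:
    """Return how many times the inhibitor concentration exceeds the Ki."""
    return inhibitor_um / ki_um


def fractional_occupancy(inhibitor_um: float, ki_um: float) -> float:
    """Fraction of enzyme bound by inhibitor under the assumed simple model."""
    return inhibitor_um / (inhibitor_um + ki_um)


fold = fold_over_ki(DON_UM, KI_UM)
occupancy = fractional_occupancy(DON_UM, KI_UM)
print(f"DON dose is about {fold:.0f}-fold above the Ki")
print(f"Estimated fractional occupancy: {occupancy:.3f}")
```

      Under these assumptions the dose is several hundred-fold above the Ki, which is consistent with the statement that the enzyme should be sufficiently inhibited; intracellular concentrations and glutamine competition are not modeled.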

      Author response image 14.

      1. Include OCR analyses.

      In addition to the ECAR data of the Mito Stress test (original Figures 2G–H), OCR data were added to the revised manuscript (revised Figures 2H, S3D). Compared to c-Kit+ myeloid progenitors (LKS- cells), HSCs showed a similar increase in ECAR, while the decrease in OCR was relatively limited. A possible explanation for this is that glycolytic and mitochondrial metabolism are coupled in c-Kit+ myeloid progenitors, whereas they are decoupled in HSCs. This is also suggested by the glucose plus oligomycin experiment in Figures 5B, C, and S6A–D (orange lines). In summary, in HSCs, glycolytic and mitochondrial ATP production are decoupled and can maintain ATP levels by glycolytic ATP production alone, whereas in progenitors including GMPs, the two ATP production systems are constantly coupled, and glycolysis alone cannot maintain ATP concentration. We have added descriptions of these points in the Results and Discussion section (page 8, lines 240-243, page 18, lines 558-561).

      Author response image 15.

      Next, a Mito Stress test was performed using HSCs derived from PBS- or 5-FU-treated mice in the presence or absence of oligomycin (revised Figures 1G–H, S3A–B). Without oligomycin treatment, ECAR in 5-FU-treated HSCs was higher than in PBS-treated HSCs, and OCR was unchanged. Oligomycin treatment increased ECAR in both PBS- and 5-FU-treated HSCs, whereas OCR was unchanged in PBS-treated HSCs, but significantly decreased in 5-FU-treated HSCs. Changes in ECAR in response to oligomycin differed between HSC proliferation and differentiation: ECAR increased in 5-FU-treated HSCs but not in LKS- progenitors (original Figures 2G–H). This suggests a metabolic feature of HSCs in which the coupling of OXPHOS with glycolysis seen in LKS- cells is not essential in HSCs even after cell cycle entry. The results and discussion of this experiment have been added (page 7, lines 194-201 and page 18, lines 558-561).

      Author response image 16.

      1. Correlate proliferation-mitochondrial inhibition-metabolic state

      We agree that it is important to clarify this point. First, OXPHOS inhibition and proliferation similarly accelerate glycolytic ATP production with PFKFB3 (Figures 4G, I, and 5F–I). Meanwhile, oligomycin treatment rapidly decreases ATP in HSCs with or without 5-FU administration (Figure 4C). These results suggest that OXPHOS is a major source of ATP production both at a steady state and during proliferation, even though the analysis medium is pre-saturated with hypoxia similar to that in vivo. This has been added to the Discussion section (page 17, lines 520-523).

      1. Tune down the claim on HSCs in homeostatic conditions since from the data it seems that HSCs rely more on anaerobic glycolysis.

      Thanks for the advice. The original Figures S2C, D, F, and G show that HSCs are dependent on the anaerobic glycolytic system even at a steady state, so we have toned down our claims (page 7, lines 192-194).

      1. For proliferative HSCs mitochondrial are key. When you block mitochondria with oligomycin there's the biggest drop in ATP.

      In the revised manuscript, we have tried to highlight the key findings that you have pointed out. First, we mentioned in the Discussion (page 17, lines 523-525) that previous studies suggested the importance of mitochondria in proliferating HSCs. Meanwhile, the GO-ATeam2 and glucose tracer analyses in this study newly revealed that the PFKFB3-dependent glycolytic system is activated during the proliferative phase, as shown in Figure 4C. We also confirmed that mitochondrial ATP production is vital in proliferating HSCs, and we hope to clarify the balance between ATP-producing pathways and nutrient sources in future studies.

      1. To better clarify this point, the authors should do experiments in hypoxic conditions and compare them to oligomycin treatment, showing that mito-inhibition acts differently on HSCs (considering that all these drugs are toxic for mitochondria and rapidly induce stress responses, e.g., mitophagy).

      We apologize for any confusion caused by not clearly describing the experimental conditions. As pointed out by the reviewer, we also recognize the importance of experiments in a hypoxic environment. All GO-ATeam2 analyses were performed in a medium saturated sufficiently under hypoxic conditions and analyzed within minutes, so we believe that the medium did not become oxygenated (page S5-S6, lines 160-163 in the Methods). Despite being conducted under such hypoxic conditions, the substantial decrease in ATP after oligomycin treatment is intriguing (original Figures 4C, 5B, 5C). The p50 value of mitochondria (the partial pressure of oxygen at which respiration is half maximal) is 0.1 kPa, which is less than 0.1% of the oxygen concentration at atmospheric pressure 7. Thus, biochemically, it is consistent that OXPHOS can maintain sufficient activity even in a hypoxic environment like the bone marrow. We are currently embarking on a study to determine ATP concentration in physiological hypoxic conditions using in vivo imaging within the bone marrow, which we hope to report in a separate project. We have discussed these points, technical limitations, and perspectives in the Discussion section (page 20, lines 610-612).
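
      The p50 argument above can be made concrete with a short calculation. The sketch below assumes a simple hyperbolic dependence of respiration on oxygen partial pressure, which is an illustrative assumption rather than a model taken from the manuscript. The 1% O2 value is borrowed from the tracer incubation condition mentioned earlier in this response and is used purely as an example; water vapor pressure and dissolved-oxygen gradients are ignored, and the function names are hypothetical.

```python
# Sketch only: relate the quoted mitochondrial p50 (0.1 kPa) to hypoxic culture.
# Assumption (not from the manuscript): respiration follows a hyperbolic curve,
# J / Jmax = pO2 / (pO2 + p50).

ATM_KPA = 101.325  # standard atmospheric pressure, kPa
P50_KPA = 0.1      # mitochondrial p50 quoted in the response, kPa


def o2_partial_pressure_kpa(percent_o2: float) -> float:
    """Partial pressure of O2 (kPa) for a gas phase with the given % O2 at 1 atm."""
    return ATM_KPA * percent_o2 / 100.0


def relative_respiration(po2_kpa: float, p50_kpa: float = P50_KPA) -> float:
    """Fraction of maximal respiration under the assumed hyperbolic model."""
    return po2_kpa / (po2_kpa + p50_kpa)


# p50 expressed as a percentage of atmospheric pressure
p50_percent_of_atm = 100.0 * P50_KPA / ATM_KPA

# Example condition: 1% O2 in the gas phase
po2_hypoxia = o2_partial_pressure_kpa(1.0)
fraction = relative_respiration(po2_hypoxia)

print(f"p50 as % of atmospheric pressure: {p50_percent_of_atm:.3f}%")
print(f"pO2 at 1% O2: {po2_hypoxia:.2f} kPa")
print(f"Estimated fraction of maximal respiration: {fraction:.2f}")
```

      Under these assumptions, respiration at 1% O2 would still run at roughly nine-tenths of its maximum, which fits the observation that oligomycin causes a substantial ATP decrease even under hypoxic conditions.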

      • In Figure 1 C, D, E and F, the comparison should be done as unpaired t test and the control group should not be 1 as the cells comes from different individuals.

      Thank you very much for pointing this out. We have reanalyzed and revised the figures (revised Figures 1C–F).

      Author response image 17.

      • In Figure S2A, the post-sorting bar of 6PG, R5P and S7P are missing.

      Metabolites below the detection threshold (post-sorting samples of 6PG, R5P, and S7P) are now indicated as N.D. (not detected) (revised Figure S2A).

      Author response image 18.

      • In the 2NBDG experiments, the authors should add the appropriate controls, since it has been shown that 2NBDG cellular uptake does not correctly reflect glucose uptake (Sinclair LV, Immunometabolism 2020), with cell type-dependent variations. Thus, inhibitors of glucose transporters should be added as controls (cytochalasin B; 4,6-O-ethylidene-a-D-glucose). It would be quite challenging to test this in vivo, but it would be sufficient to show it in vitro in the different HSPCs analyzed.

      We appreciate the essential technical point raised by the reviewer. In the revised manuscript, we performed a 2-NBDG assay with cytochalasin B and phloretin as negative controls. 2-NBDG uptake was higher in 5-FU-treated HSCs than in PBS-treated HSCs. This increase was inhibited by both cytochalasin B and phloretin. In PBS-treated HSCs, cytochalasin B did not downregulate 2-NBDG uptake, whereas phloretin did. Although cytochalasin B inhibits glucose transporters (GLUTs), it is also an inhibitor of actin polymerization. Therefore, its inhibitory effect on GLUTs may be weaker than that of phloretin. We have revised the figure (revised Figure S1L) and added the corresponding description (page 7, lines 207-208).

      Author response image 19.

      • S5C: the authors should show the cell number for each population. If there is a decrease in % in Lin-, that will be reflected in all HSPCs. Comparing the proportion of the cells doesn't show the real impact on HSPCs.

      Thank you for your insightful point. In the revision, we compared the numbers, not percentages, of HSPCs and found no difference in the number of cells in the major HSPC fractions in Lin-. The figure has been revised (revised Figure S6C) and the corresponding description has been added (page 10, lines 296-299).

      Author response image 20.

      Minor:

      1. In S1 F-G, it is not indicated on which day post 5FU injection the analysis was done. I assume on day 6, but it should be indicated in the figure legend and/or text.

      Thank you for pointing this out. As you assumed, the analysis was performed on day 6. The description has been added to the legend of the revised Figure S1G.

      1. S1K is not described in the text. What are proliferative and quiescence-maintaining conditions? The analyses are done by flow using LKS SLAM markers after culture? How long was the culture?

      Thank you for your comments. First, the figure citation on line 250 was incorrect and has been corrected to Figure S1N. Regarding the proliferative and quiescence-maintaining conditions, we have previously reported on these 8. In brief, these are culture conditions that maintain HSC activity at a high level while allowing for the proliferation or maintenance of HSCs in quiescence, achieved by culturing under fatty acid-rich, hypoxic conditions with either high or low cytokine concentrations. Analysis was performed after one week of culture, with the HSC number determined by flow cytometry based on the LSK-SLAM marker. While these are mentioned in the Methods section, we have added a description in the main text to highlight these points for the reader (page 7, lines 214-217).

      1. In Figure 5G, why does the blue line (PFKFB3 inhibitor) go up in the end of the real-time monitoring? Does it mean that other compensatory pathway is turned on?

      As you have pointed out, we cannot rule out the possibility that other unknown compensatory ATP production pathways were activated. We have added a note in the Discussion section to address this (page 18, lines 555-556).

      1. In Figure S6H&J, the reduction is marginal. Does it mean that PKM2 is not important for ATP production in HSCs?

      The activity of the inhibitor is essential in the GO-ATeam2 analysis. The commercially available PKM2 inhibitors have a higher IC50 value (IC50 = 2.95 μM in this case). Nevertheless, the effect of reducing the ATP concentration was observed in progenitor cells, but not in HSCs. The report by Wang et al. 9 on the analysis using a PKM2-deficient model suggests a stronger effect on progenitor cells than on HSCs. Our results are similar to those of the previous report.

      (Specific comments)

      Specifically, there are several major points that raise concerns about the claims:

      1. The gating strategy to select HSCs with enlarged Sca1 gating is not convincing. I understand the rationale to have a sufficient number of cells to analyze; however, this gating strategy should also be applied in the control group. From the FACS plot it seems that there are more HSCs upon 5FU treatment (Figure S1b). How is that possible? Is it because of the 20% more cycling cells at day 6? To prove that this gating strategy still represents a pure HSC population, the authors should compare the blood reconstitution capability of this population with a "standard" gated population. If the starting population is highly heterogeneous, then the metabolic readout could simply reflect cell heterogeneity.

      Thank you for pointing this out. First, we did not enlarge the Sca-1 gating in this study. We apologize for any confusion caused by the incomplete description. The gating of c-Kit is based on that shown by Umemoto et al (Figure EV1A) 2, who used 250 mg/kg 5-FU, so their c-Kit reduction is more pronounced than ours.

      We followed this study and compared c-Kit expression in the Lin-Sca-1+CD150+CD48-EPCR+ gates to BMMNCs on day 6 after 5-FU administration (150 mg/kg). The results are shown below.

      Author response image 21.

      Since the MFI of c-Kit was downregulated, we used gating that extended the c-Kit gate to lower expression regions on day 6 after 5-FU administration (revised Figure S1C).

      At other time points, LSK gating was the same as in the PBS-treated group, as noted in the Methods.

      The number of HSCs appears to be higher in the 5-FU group because most of the differentiated blood cells were lost due to 5-FU administration while the same number of cells as in the PBS group was analyzed by FACS, resulting in a relatively higher number of HSCs. The legend of Figure S1 shows that the number of HSCs in both the PBS and 5-FU groups appeared to increase because the same number of BMMNCs was obtained at the time of analysis (page S22, lines 596-598).

      Regarding cellular heterogeneity, from a metabolic point of view, the heterogeneity in HSCs is rather reduced by 5-FU administration. As shown in Figure S3A–C, this is simulated under stress conditions, such as after 5-FU administration or during OXPHOS inhibition, where the flux variability in each enzymatic reaction is significantly reduced. GO-ATeam2 analysis after 5-FU treatment showed no increase in cell population variability. After 2-DG treatment, ATP concentrations in HSCs were widely distributed from 0 mM to 0.8 mM in the PBS group, while more than 80% of those in the 5-FU group were less than 0.4 mM (Figures 4B, D). HSCs may have a certain metabolic diversity at a steady state, but under stress conditions, they may switch to a more specialized metabolism with less cellular heterogeneity in order to adapt.

      1. S2 does not show major differences before and after sorting. However, a key metabolite like lactate, which is also one of the most abundant, is decreased. Wouldn't that mean that HSCs decrease lactate production once they move out of the hypoxic niche? Do they decrease anaerobic glycolysis? How can quiescent HSCs mostly rely on OXPHOS while being located in a hypoxic niche?

      2. Since HSCs in the niche are located in hypoxic regions of the bone marrow, would that not mimic OxPhos inhibition (oligomycin)? Would that not mean that HSCs in the niche are more glycolytic (anaerobic glycolysis)?

      3. In Figure 5B, the orange line (Glucose+OXPHOS inhibition) remains stable, which means HSCs prefer to use glycolysis when OXPHOS is inhibited. Which metabolic pathway would HSCs use under hypoxic conditions? As HSCs reside in a hypoxic niche, does it mean that these steady-state HSCs prefer to use glycolysis for ATP production? As mentioned before, mitochondrial inhibition can be comparable to the in vivo condition of the niche, where low pO2 will "inhibit" mitochondrial metabolism.

      Thank you for the first half of comment 2 on the technical features of our approach. First, as you have pointed out, there is minimal variation and stable detection of many metabolites before and after sorting (Figure S2A), suggesting that isolation from the hypoxic niche and sorting stress do not significantly alter metabolite detection performance. This is consistent with a previous report by Jun et al. 10. Meanwhile, lactate levels were decreased by sorting. Therefore, if the activity of anaerobic glycolysis was suppressed in stressed HSCs, it may be difficult to detect these metabolic changes with our tracer analysis. However, in this study, increases in several glycolytic metabolites, including lactate, were detected in HSCs from 5-FU-treated mice compared with HSCs from PBS-treated mice that were similarly sorted and prepared, suggesting an increase in glycolytic activity. In other words, we may have been fortunate to detect the stress-induced activation of the glycolytic system despite the tendency of our analysis system to report lactate levels lower than they actually are. Given that damage to bone marrow hematopoiesis tends to alleviate the low-oxygen status of the niche 11, we postulate that this upregulated aerobic glycolysis arises intrinsically in HSCs rather than from external conditions.

      The second half of comment 2, and comments 7 and 10, are essential and overlapping comments and will be answered together. Although genetic analyses have shown that HSCs produce ATP by anaerobic glycolysis in low-oxygen environments 9,12, our GO-ATeam2 analysis in this study confirmed that HSCs also generate ATP via mitochondria. This is also supported by Ansó's prior findings where the knockout of the Rieske iron–sulfur protein (RISP), a constituent of the mitochondrial electron transport chain, impairs adult HSC quiescence and bone marrow repopulation 13. Bone marrow is a physiologically hypoxic environment (9.9–32.0 mmHg 11). However, the p50 value of mitochondria (the partial pressure of oxygen at which respiration is half maximal) is below 0.1% oxygen concentration at atmospheric pressure (less than 1 mmHg) 7. This suggests that OXPHOS can retain sufficient activity even under physiologically hypoxic conditions. We are currently initiating efforts to discern ATP concentrations in vivo within the bone marrow under physiological hypoxia. This will be reported in a separate project in the future. Admittedly, when we began this research, we did not anticipate the significant mitochondrial reliance of HSCs. As we previously reported, the metabolic uncoupling of glycolysis and mitochondria 12 may enable HSCs to activate only glycolysis, and not mitochondria, under stress conditions such as post-5-FU administration, suggesting a unique metabolic trait of HSCs. We have included these technical limitations and perspectives in the Discussion section (page 17, lines 520-523).
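The comparison between bone marrow oxygen tension and the mitochondrial p50 quoted above can be checked with a short unit conversion. The sketch below assumes a standard atmospheric pressure of 760 mmHg and treats an oxygen percentage as a simple fraction of that total pressure. The function and variable names are illustrative and are not taken from the manuscript.

```python
ATM_MMHG = 760.0  # assumed standard atmospheric pressure (not stated in the text)


def percent_o2_to_mmhg(percent_o2, total_pressure=ATM_MMHG):
    """Convert an oxygen percentage at a given total pressure to a partial pressure."""
    return percent_o2 / 100.0 * total_pressure


def mmhg_to_percent_o2(po2_mmhg, total_pressure=ATM_MMHG):
    """Convert an oxygen partial pressure to a percentage of the total pressure."""
    return po2_mmhg / total_pressure * 100.0


# Upper bound on the mitochondrial p50 quoted in the response: 0.1% O2.
p50_upper_mmhg = percent_o2_to_mmhg(0.1)  # about 0.76 mmHg, i.e. below 1 mmHg

# Bone marrow range quoted in the response: 9.9 to 32.0 mmHg.
marrow_low_mmhg, marrow_high_mmhg = 9.9, 32.0
marrow_low_pct = mmhg_to_percent_o2(marrow_low_mmhg)    # about 1.3% O2
marrow_high_pct = mmhg_to_percent_o2(marrow_high_mmhg)  # about 4.2% O2

# How far the marrow oxygen tension sits above the p50 bound.
fold_low = marrow_low_mmhg / p50_upper_mmhg    # about 13-fold
fold_high = marrow_high_mmhg / p50_upper_mmhg  # about 42-fold

print(f"p50 bound: {p50_upper_mmhg:.2f} mmHg")
print(f"marrow: {marrow_low_pct:.1f}% to {marrow_high_pct:.1f}% O2")
print(f"marrow pO2 is {fold_low:.0f}x to {fold_high:.0f}x the p50 bound")
```

Under these assumptions, even the lower end of the reported marrow range lies more than tenfold above the p50 bound, which is the basis for the statement that OXPHOS can remain active under physiological hypoxia.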

      1. The authors performed challenging experiments to track radiolabeled glucose, which are quite remarkable. However, the data do not fully support the conclusions. Mitochondrial metabolism in HSCs can be supported by fatty acids and glutamate; thus, the authors should track the fate of other energy sources to fully discriminate the glycolysis vs mito-metabolism dependency. From the data in S2 and Fig. 1C-F, the authors can conclude that upon 5FU treatment HSCs increase glycolytic rate.

      2. FIG.2B-C: Increase of Glycolysis upon oligomycin treatment is common in many different cell types. As explained before, other radiolabeled substrates should be used to understand the real effect on mitochondria metabolism.

      Thank you for your suggestion. While this is essential for future studies, we believe it is not the primary focus of the paper. We are planning another research project on tracer analysis using labeled fatty acids and glutamate, which we will report on in the near future. We have clearly stated in the Abstract and Introduction of the revised manuscript that the focus of this study is on changes in glucose metabolism when HSCs are stressed (page 3, lines 75 and 87; page 5, line 135).

      Instead, we have added the following analyses of metabolic changes in fatty acids and glutamate using the GO-ATeam2 system: HSCs derived from GO-ATeam2 mice treated with PBS or 5-FU were used to measure changes in ATP concentrations after exposure to the fatty acid beta-oxidation (FAO) inhibitor etomoxir and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON). Etomoxir was used at 100 µM, a concentration that inhibits FAO without inhibiting mitochondrial electron transport complex I, as previously reported 5. DON was used at 2 mM, a concentration that sufficiently inhibits the enzyme as the Ki for glutaminase is 6 µM. In this experiment, etomoxir alone, DON alone, or etomoxir and DON in combination did not decrease the ATP concentration of HSCs in the PBS and 5-FU groups (revised Figures S7J–M), suggesting that FAO and glutaminolysis were not essential for ATP production in HSCs in the short term. Thus, according to the analysis using the GO-ATeam2 system, HSCs exposed to acute stresses change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than other energy sources. Since there are reports that FAO and glutaminolysis are required for HSC maintenance in the long term 5,6, compensatory pathways may be able to maintain ATP levels in the short term. A description of these points has been added to the Discussion (page 17, lines 525-527).

      Author response image 22.

      Fatty acid β-oxidation activity was also measured in 5-FU-treated HSCs using the fluorescent probe FAOBlue and was unchanged compared to PBS-treated HSCs (revised Figure S7N).

      Author response image 23.

      Notably, the addition of 100 µM etomoxir plus glucose and Pfkfb3 inhibitors resulted in a rapid decrease in ATP concentration in HSCs (revised Figures S7O–P). This indicates that etomoxir partially mimics the effect of oligomycin, suggesting that at a steady state, OXPHOS is driven by FAO, but can be compensated by the acceleration of the glycolytic system by Pfkfb3. Meanwhile, the exposure of HSCs to Pfkfb3 inhibitors in addition to 2 mM DON did not reduce ATP (revised Figures S7O–P). This suggests that ATP production from glutaminolysis is limited in HSCs at a steady state.

      Author response image 24.

      These points suggest that OXPHOS is driven by fatty acids at a steady state, but unlike the glycolytic system, FAO is not further activated by HSCs after 5-FU treatment. The results of these analyses and related descriptions are included in the revised manuscript (page 11, lines 332-344).

      1. In Figure S1, 5-FU leads to the induction of cycling HSCs and in figure 1, 5-FU results in higher activation of glycolysis. Would it be possible to correlate these two phenotypes together? For example, by sorting NBDG+ cells and checking the cell cycle status of these cells?

      We appreciate the reviewer’s insightful comments. We administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than HSCs with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs that exited the G0 phase and entered the G1 phase increased after 5-FU treatment. The large differences in glucose utilization by cell cycle status observed in both the PBS- and 5-FU-treated groups suggest a direct link between HSC proliferation and glycolysis activation. Descriptions of the above findings and perspectives have been added to the Results and Discussion sections (page 7, lines 208-214; page 20, lines 607-610).

      Author response image 25.

      1. Why are only ECAR measurements (and not OCR measurements) shown? In Fig.2G, why are HSCs compared with cKit+ myeloid progenitors, and not with MPP1? The ECAR increase observed in HSCs upon oligomycin treatment is shared with many other types of cells. However, cKit+ cells have a weird behavior. Upon oligo treatment cKit+ cells decrease ECAR, which is quite unusual. The data of both HSCs and cKit+ cells could be clarified by adding OCR curves. Moreover, it is recommended to run a glycolysis stress test profile to assess the dependency on glycolysis (Glucose, Oligomycin, 2DG).

      In addition to the ECAR data of the Mito Stress test (original Figures 2G–H), OCR data were added in the revised manuscript (revised Figures 2H, S3D). Compared to c-Kit+ myeloid progenitors (LKS- cells), HSC exhibited a similar increase in ECAR, while the decrease in OCR was relatively limited. This may be because glycolytic and mitochondrial metabolism are coupled in c-Kit+ myeloid progenitors, whereas they are decoupled in HSCs. This is also suggested by the glucose plus oligomycin experiment in Figures 5B, C, and S6A–D (orange lines). In summary, in HSCs, glycolytic and mitochondrial ATP production are decoupled and can maintain ATP levels by glycolytic ATP production alone, whereas in progenitors including GMPs, the two ATP production systems are constantly coupled, and glycolysis alone cannot maintain the ATP concentration. While we could not conduct a glycolysis stress test, we believe that Pfkfb3-dependent glycolytic activation, which is evident in the oligomycin+glucose+Pfkfb3i experiment, is only apparent in HSCs when subjected to glucose+oligomycin treatment (original Figures 5F–I). We have added descriptions of these points in the Results and Discussion section (page 8, lines 240-243, page 18, lines 558-561).

      Author response image 26.

      FIG.3 A-C. As mentioned previously, the flux analyses should be integrated with data using other energy sources. If cycling HSCs are less dependent on OXPHOS, what happens if you inhibit OXPHOS in the 5-FU condition? Since the authors are linking OXPHOS inhibition and upregulation of glycolysis to increased proliferation, do HSCs proliferate more when treated with oligomycin?

      First, please see our response to comments 3 and 5 regarding the first part of this comment about the flux analysis of other energy sources. According to the analysis using the GO-ATeam2 system, stressed HSCs change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than other energy sources. The change in ATP concentration after OXPHOS inhibition for 5-FU-treated HSCs is shown in Figures 4C and E, suggesting that the activity of OXPHOS itself does not increase. HSCs after oligomycin treatment and HSCs after 5-FU treatment are similar in that they activate glycolytic ATP production. However, inhibition of OXPHOS did not induce the proliferation of HSCs (original Figure S1K). This suggests that proliferation activates glycolysis and not that activation of the glycolytic system induces proliferation. This similarity and dissimilarity of glycolytic activation upon proliferation and OXPHOS inhibition is discussed in the Discussion section (pages 16-17, lines 505-515).

      1. FIG.4 shows that in vivo administration of radiolabeled glucose especially marks metabolites of TCA cycle and Glycolysis. The authors interpret enhanced anaerobic glycolysis, but I am not sure this is correct; if more glycolysis products go in the TCA cycle, it might mean that HSC start engaging mitochondrial metabolism. What do the authors think about that?

      Thank you for pointing this out. We believe that the data are due to two differences in the experimental features between in vivo (Figure S5) and in vitro (Figures 1 and S2) tracer analysis. The first difference is that in in vivo tracer analysis, unlike in vitro, all cells can metabolize U-13C6-glucose. Another difference is that after glucose labeling in vivo, it takes approximately 120–180 minutes to purify HSCs to extract metabolites, and processing on ice may result in a gradual progression of metabolic reactions within HSCs. As a result, in vivo tracer analysis may detect an increased influx of labeled carbon derived from U-13C6-glucose into the TCA cycle over an extended period. However, it is difficult to interpret whether this influx of labeled carbon is derived from the direct influx of glycolysis or the re-uptake by HSCs of metabolites that have been metabolized to other metabolites in other cells. Meanwhile, as shown in Figure 4C using the GO-ATeam2 system, ATP production from mitochondria is not upregulated by 5-FU treatment. This suggests that even if the direct influx from glycolysis into the TCA cycle is increased, the rate of ATP production does not exceed that of glycolysis. Despite these technical caveats in interpretation, the results of in vivo and in vitro tracer analyses are considered essential. In particular, we consider the increased labeling of metabolites involved in glycolysis and nucleotide synthesis to be crucial. We have added a discussion of these points, including experimental limitations (page 17-18, lines 530-545).

      1. FIG.4: The experimental design is not clear. Are BMMNCs stained and then put in culture? Is it a 6-day culture, or are BMMNCs purified at day 6 post 5FU? FIG.4B-C: The differences between PBS and 5FU conditions are the most significant; however, the effect of oligomycin in both conditions is the most dramatic one. From this readout, it seems that HSCs are more dependent on mitochondria for energy production both upon 5FU treatment and in PBS conditions.

      We apologize for the incomplete description of the experimental details. The experiment involved dispensing BMMNCs freshly stained for surface antigens into the medium and immediately subjecting them to flow cytometry analysis. For post-5-FU treatment HSCs, mice were administered 5-FU (day 1), and freshly obtained BMMNCs were analyzed on day 6. The analysis of HSCs and progenitors was performed by gating each fraction within the BMMNCs (original Figure S5A). We have added these details to ensure that readers can grasp these aspects more clearly (page S5, lines 155-158).

      As pointed out by the reviewer, we understand that HSCs produce more ATP through OXPHOS. However, ATP production by glycolysis, although limited, is observed under steady-state conditions (post-PBS treatment HSC), and its reliance increases during the proliferation phase (post-5-FU treatment HSC) (original Figures 4B, D). Until now, discussions on energy production in HSCs have focused on either glycolysis or mitochondrial functions. However, with the GO-ATeam2 system, it has become possible for the first time to compare their contributions to ATP production and evaluate compensatory pathways. As a result, it became evident that while OXPHOS is the main source of ATP production, the reliance on glycolysis plastically increases in response to stress. This has led to a better understanding of HSC metabolism. These points are included in the Discussion as well (page 16, lines 479-488).

      1. FIG.6H should be extended with cell cycle analyses. There are no differences between 5FU and ctrl groups. If 5FU induces HSCs cycling and increases glycolysis I would expect higher 2-NBDG uptake in the 5FU group. How do the authors explain this?

      Thank you for your comments. In the original Figure 6H, we found that 2-NBDG uptake correlated with mPFKFB3 levels in both the 5-FU and PBS groups. mPfkfb3 levels remained low in the few HSCs with low 2-NBDG uptake in the 5-FU group.

      In the revised manuscript, to directly relate glucose utilization to the cell cycle, we administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than those with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs that exited the G0 phase and entered the G1 phase increased after 5-FU treatment. The large differences in glucose utilization by cell cycle status observed in both the PBS- and 5-FU-treated groups suggest a direct link between HSC proliferation and glycolysis activation. Descriptions of the above findings have been added to the Results and Discussion (page 7, lines 208-214; page 20, lines 607-610).

      Author response image 27.

      1. In S7 the experimental design is not clear. What are quiescent vs proliferative conditions? What does "cell number of HSC-derived colony" mean? Is it a CFU assay? Then you should show colony numbers. When HSCs proliferate, they need more energy; thus, inhibition of metabolism will impact proliferation. What happens if you inhibit mitochondrial metabolism with oligomycin?

      Regarding the proliferative and quiescence-maintaining conditions, we have previously reported on these 8. In brief, these are culture conditions that maintain HSC activity at a high level while allowing for the proliferation or maintenance of HSCs in quiescence, achieved by culturing under fatty acid-rich, hypoxic conditions with either high or low cytokine concentrations. Analysis was performed after one week of culture, with the HSC number determined by flow cytometry based on the LSK-SLAM marker. While these are mentioned in the Methods section, we have added a description in the main text to highlight these points for the reader (page 7, lines 214-217).

      In vitro experiments with the oligomycin treatment of HSCs showed that OXPHOS inhibition activates the glycolytic system, but does not induce HSC proliferation (original Figure S1K). This suggests that proliferation activates glycolysis and not that activation of the glycolytic system induces proliferation. This similarity and dissimilarity of glycolytic activation upon proliferation and OXPHOS inhibition is discussed in the Discussion (pages 16-17, lines 505-515).

      1. In FIG 7, since homing of HSCs is influenced by the cell cycle state, it would be important to show whether there is a difference in homing efficiency in the genetic model for PFKFB3 in HSCs.

      In response to the reviewer's comments, we knocked out PFKFB3 in HSPCs derived from Ubc-GFP mice, transplanted 200,000 HSPCs into recipients (C57BL/6 mice) post-8.5Gy irradiation, and harvested the bone marrow of recipients after 16 h to compare homing efficiency (revised Figure S10H). Even with the knockout of PFKFB3, no significant difference in homing efficiency was detected compared to the control group (Rosa knockout group). These results suggest that the short-term reduction in chimerism due to PFKFB3 knockout is not due to decreased homing efficiency or cell death by apoptosis (Figure 7K) but a transient delay in cell cycle progression. We have added descriptions regarding these findings in the Results and Discussion sections (page 15, lines 470-471, page 19, lines 576-578).

      Author response image 28.

      1. Yamamoto M, Kim M, Imai H, Itakura Y, Ohtsuki G. Microglia-Triggered Plasticity of Intrinsic Excitability Modulates Psychomotor Behaviors in Acute Cerebellar Inflammation. Cell Rep. 2019;28(11):2923-2938 e2928.

      2. Umemoto T, Johansson A, Ahmad SAI, et al. ATP citrate lyase controls hematopoietic stem cell fate and supports bone marrow regeneration. EMBO J. 2022:e109463.

      3. Seita J, Sahoo D, Rossi DJ, et al. Gene Expression Commons: an open platform for absolute gene expression profiling. PLoS One. 2012;7(7):e40321.

      4. Boyd S, Brookfield JL, Critchlow SE, et al. Structure-Based Design of Potent and Selective Inhibitors of the Metabolic Kinase PFKFB3. J Med Chem. 2015;58(8):3611-3625.

      5. Ito K, Carracedo A, Weiss D, et al. A PML–PPAR-δ pathway for fatty acid oxidation regulates hematopoietic stem cell maintenance. Nat Med. 2012;18(9):1350-1358.

      6. Oburoglu L, Tardito S, Fritz V, et al. Glucose and glutamine metabolism regulate human hematopoietic stem cell lineage specification. Cell Stem Cell. 2014;15(2):169-184.

      7. Gnaiger E, Mendez G, Hand SC. High phosphorylation efficiency and depression of uncoupled respiration in mitochondria under hypoxia. Proc Natl Acad Sci U S A. 2000;97(20):11080-11085.

      8. Kobayashi H, Morikawa T, Okinaga A, et al. Environmental Optimization Enables Maintenance of Quiescent Hematopoietic Stem Cells Ex Vivo. Cell Rep. 2019;28(1):145-158 e149.

      9. Wang YH, Israelsen WJ, Lee D, et al. Cell-state-specific metabolic dependency in hematopoiesis and leukemogenesis. Cell. 2014;158(6):1309-1323.

      10. Jun S, Mahesula S, Mathews TP, et al. The requirement for pyruvate dehydrogenase in leukemogenesis depends on cell lineage. Cell Metab. 2021;33(9):1777-1792 e1778.

      11. Spencer JA, Ferraro F, Roussakis E, et al. Direct measurement of local oxygen concentration in the bone marrow of live animals. Nature. 2014;508(7495):269-273.

      12. Takubo K, Nagamatsu G, Kobayashi CI, et al. Regulation of glycolysis by Pdk functions as a metabolic checkpoint for cell cycle quiescence in hematopoietic stem cells. Cell Stem Cell. 2013;12(1):49-61.

      13. Anso E, Weinberg SE, Diebold LP, et al. The mitochondrial respiratory chain is essential for haematopoietic stem cell function. Nat Cell Biol. 2017;19(6):614-625.

    1. Author Response

      We would like to thank the Editors and Reviewers for their comprehensive review of the manuscript. We appreciate your feedback, and we will carefully consider all your comments in the revision of the manuscript. Below are our provisional responses to your comments.

      eLife assessment

      This manuscript reveals important insights into the role of ipsilateral descending pathways in locomotion, especially following unilateral spinal cord injury. The study provides solid evidence that this method improves the injured side's ability to support weight, and as such the findings may lead to new treatments for stroke, spinal cord injuries, or unilateral cerebral injuries. However, the methods and results need to be better detailed, and some of the statistical analysis enhanced.

      Thank you for your assessment. We will incorporate various textual enhancements in the final version of the manuscript to address the weaknesses you have pointed out. The specific improvements are outlined below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides potentially important new information about ipsilateral cortical impact on locomotion. A number of issues need to be addressed.

      Strengths:

      The primary appeal and contribution of this manuscript are that it provides a range of different measures of ipsilateral cortical impact on locomotion in the setting of impaired contralateral control. While the pathways and mechanisms underlying these various measures are not fully defined and their functional impacts remain uncertain, they comprise a rich body of results that can inform and guide future efforts to understand cortical control of locomotion and to develop more effective rehabilitation protocols.

      Weaknesses:

      1. The authors state that they used a cortical stimulation location that produced the largest ankle flexion response (lines 102-104). Did other stimulation locations always produce similar, but smaller responses (aside from the two rats that showed ipsilateral neuromodulation)? Was there any site-specific difference in response to stimulation location?

      We derived motor maps in each rat, akin to the representation depicted in Fig 6. In each rat, alternative cortical sites did, indeed, produce distal or proximal contralateral leg flexion responses. Distal responses were more likely to be evoked in the rostral portion of the array, similarly to proximal responses early after injury. This distribution in responses across different cortical sites is reported in this study (Fig. 6) and is consistent with our prior work. The Results section will be revised to provide additional clarification and context for the data presented in Figure 6.

      1. Figure 2: There does not appear to be a strong relationship between the percentage of spared tissue and the ladder score. For example, the animal with the mild injury (based on its ladder score) in the lower left corner of Figure 2A has less than 50% spared tissue, which is less spared tissue than in any animal other than the two severe injuries with the most tissue loss. Is it possible that the ladder test does not capture the deficits produced by this spinal cord injury? Have the authors looked for a region of the spinal cord that correlates better with the deficits that the ladder test produces? The extent of damage to the region at the base of the dorsal column containing the corticospinal tract would be an appropriate target area to quantify and compare with functional measures.

      In Fig. S6 of our 2021 publication "Bonizzato and Martinez, Science Translational Medicine", we investigated the predictive value of tissue sparing in specific sub-regions of the spinal cord for ladder performance. Specifically, we examined the correlation between the accuracy of left leg ladder performance in the acute state and the preservation of the corticospinal tract (CST). Our results indicated that dorsal CST sparing serves as a mild predictor for ladder deficits, confirming the results obtained in this study.

      1. Lines 219-221: The authors state that "phase-coherent stimulation reinstated the function of this muscle, leading to increased burst duration (90±18% of the deficit, p=0.004, t-test, Fig. 4B) and total activation (56±13% of the deficit, p=0.014, t-test, Fig. 3B)." This way of expressing the data is unclear. For example, the previous sentence states that after SCI, burst duration decreased by 72%. Does this mean that the burst duration after stimulation was 90% higher than the -72% level seen with SCI alone, i.e., 90% + -72% = +18%? Or does it mean that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI, i.e., -72% * (100%-90%)= -7%? The data in Figure 4 suggest the latter. It would be clearer to express both the SCI alone and SCI plus stimulation results in the text as a percent of the pre-SCI results, as done in Figure 4.

      Your assessment is correct; we intended to report that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI. This point will be addressed in the revision of the manuscript.
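To make the two readings discussed in this comment concrete, the following sketch computes both from the numbers quoted above (a 72% loss of burst duration after SCI and a reported effect of 90% of the deficit). The pre-SCI value is normalized to 100%, and the function names are hypothetical, introduced only for this illustration.

```python
def change_if_additive(deficit_pct, effect_pct):
    """Reading 1: the effect is added to the post-SCI change, in percentage points."""
    return -deficit_pct + effect_pct


def change_if_fraction_recovered(deficit_pct, recovered_pct):
    """Reading 2: the effect is the share of the lost portion that is restored."""
    return -deficit_pct * (1.0 - recovered_pct / 100.0)


deficit = 72.0  # burst duration lost after SCI, as % of the pre-SCI value
effect = 90.0   # reported effect of stimulation, "% of the deficit"

additive = change_if_additive(deficit, effect)             # +18% relative to pre-SCI
recovered = change_if_fraction_recovered(deficit, effect)  # about -7.2% relative to pre-SCI
level_pct_of_baseline = 100.0 + recovered                  # about 92.8% of pre-SCI

print(f"additive reading: {additive:+.1f}%")
print(f"fraction-recovered reading: {recovered:+.1f}% "
      f"({level_pct_of_baseline:.1f}% of pre-SCI)")
```

The second reading is the one confirmed in the response: burst duration with stimulation is roughly 93% of its pre-SCI value, compared with 28% after SCI alone.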

      1. Lines 227-229: The authors claim that the phase-dependent stimulation effects in SCI rats are immediate, but they don't say how long it takes for these effects to be expressed. Are these effects evident in the response to the first stimulus train, or does it take seconds or minutes for the effects to be expressed? After the initial expression of these effects, are there any gradual changes in the responses over time, e.g., habituation or potentiation?

      The effects are immediately expressed at the very first occurrence of stimulation. We never tested a rat completely naïve to stimuli, as each treadmill session involves prior cortical mapping to identify a suitable active site for involvement in locomotor experiments. Yet, as demonstrated in Supplementary Video 1 accompanying our 2021 publication on contralateral effects of cortical stimulation, "Bonizzato and Martinez, Science Translational Medicine," the impact of phase-dependent cortical stimulation on movement modulation is instantaneous and ceases promptly upon discontinuation of the stimulation. We did not quantify potential gradual changes in responsiveness over time, but we cannot exclude that for long stimulation sessions (e.g., 30 min or more), stimulus amplitude may need to be slightly increased over time to compensate for habituation.

      1. Awake motor maps (lines 250-277): The analysis of the motor maps appears to be based on measurements of the percentage of channels in which a response can be detected. This analytic approach seems incomplete in that it only assesses the spatial aspect of the cortical drive to the musculature. One channel could have a just-above-threshold response, while another could have a large response; in either case, the two channels would be treated as the same positive result. An additional analysis that takes response intensity into account would add further insight into the data, and might even correlate with the measures of functional recovery. Also, a single stimulation intensity was used; the results may have been different at different stimulus intensities.

      We confirm that maps of cortical stimulation responsiveness may vary at different stimulus amplitudes. To establish an objective metric of excitability, we identified 100µA as a reliable stimulation amplitude across rats and used this value to build the ipsilateral motor representation results in Figure 6. This choice allows direct comparison with Figure 6 of our 2021 article, related to contralateral motor representation. The comparison reveals a lack of correlation with functional recovery metrics in the ipsilateral case, in contrast to the successful correlation achieved in the contralateral case.

      Regarding the incorporation of stimulation amplitudes into the analysis, as detailed in the Methods section (lines 770-771), we systematically tested various stimulation amplitudes to determine the minimal amplitude required to elicit a muscle twitch, identified as the threshold value. This process was conducted for each electrode site. Upon reviewing these data, we considered the possibility of presenting an additional assessment of ipsilateral cortical motor representation based on stimulation thresholds. However, the representation depicted in the figure did not differ significantly from the data presented in Figure 6A. Furthermore, this representation introduced an additional weakness, as it was unclear how to represent the absence of a response on the threshold scale. We chose to arbitrarily designate it as zero on the inverse logarithmic scale, where, for reference, 100 µA is positioned at 0.2 and 50 µA at 0.5.

      In conclusion, we believe that the conclusions drawn from this analysis align substantially with those in the text. The addition of the threshold analysis, in our assessment, would not contribute significantly to improving the manuscript.
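
      The reviewer's distinction between a detection-based map and an intensity-aware map can be illustrated with a minimal sketch. The amplitudes, the detection threshold, and both function names below are hypothetical and are not taken from the study; the second metric is only one of many possible ways to take response size into account.

```python
def occupancy(amplitudes, detection_threshold):
    """Fraction of channels whose response exceeds the detection threshold."""
    detected = sum(1 for a in amplitudes if a > detection_threshold)
    return detected / len(amplitudes)


def mean_suprathreshold_response(amplitudes, detection_threshold):
    """Mean response over all channels, counting undetected channels as zero."""
    total = sum(a for a in amplitudes if a > detection_threshold)
    return total / len(amplitudes)


# Hypothetical response amplitudes (arbitrary units) for two four-channel maps.
DETECTION_THRESHOLD = 1.0
map_a = [1.5, 1.25, 0.0, 0.0]  # two channels just above threshold
map_b = [4.0, 6.0, 0.0, 0.0]   # two channels with large responses

occupancy_a = occupancy(map_a, DETECTION_THRESHOLD)
occupancy_b = occupancy(map_b, DETECTION_THRESHOLD)
intensity_a = mean_suprathreshold_response(map_a, DETECTION_THRESHOLD)
intensity_b = mean_suprathreshold_response(map_b, DETECTION_THRESHOLD)
```

      In this toy case the two maps share the same occupancy but differ severalfold in the intensity-aware score, which is the situation the reviewer describes.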

      Author response image 1.

      Threshold analysis

      Author response image 2.

      Original occurrence probability analysis, for comparison.

      1. Lines 858-860: The authors state that "All tests were one-sided because all hypotheses were strictly defined in the direction of motor improvement." By using the one-sided test, the authors are using a lower standard for assessing statistical significance than the overwhelming majority of studies in this field use. More importantly, ipsilateral stimulation of particular kinds or particular sites might conceivably impair function, and that is ignored if the analysis is confined to detecting improvement. Thus, a two-sided analysis or comparable method should be used. This appropriate change would not greatly modify the authors' current conclusions about improvements.

      Our original hypothesis, drawn from previous studies involving cortical stimulation in rats and cats, as well as other neurostimulation research for movement restoration, posited a favorable impact of neurostimulation on movement. Consistent with this hypothesis, we designed our experiments with a focus on enhancing movement, emphasizing a strict direction of improvement.

      It's important to note that a one-sided test is the appropriate match for a one-sided hypothesis, and it is not a lower standard in statistics. Each experiment we conducted was constructed around a strictly one-sided hypothesis: the inclusion of an extensor-inducing stimulus would enhance extension, and the inclusion of a flexion-inducing stimulus would enhance flexion. This rationale guided our choice of the appropriate statistical test.

      We acknowledge your concern regarding the potential for ipsilateral stimulation to have negative effects on locomotion, which might not be captured when designing experiments based on one-sided hypotheses. This concern is valid, and we will explicitly mention it in the statistics section. Nonetheless, even if such observations were made, they could serve as the basis for triggering an ad-hoc follow-up study.
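
      The relationship between the two testing choices discussed above can be sketched with an exact sign-flip permutation test on paired differences. This is an illustrative stand-in, not the test used in the manuscript, and the data values are invented for the example.

```python
from itertools import product


def sign_flip_p_values(differences):
    """Exact sign-flip permutation test on paired differences.

    Returns (one_sided_p, two_sided_p). The one-sided alternative is that
    the mean difference is greater than zero.
    """
    n = len(differences)
    observed = sum(differences) / n
    tolerance = 1e-12  # guards against floating-point ties
    at_least_as_large = 0
    at_least_as_extreme = 0
    total = 0
    # Under the null hypothesis, each difference is equally likely to be
    # positive or negative, so every sign assignment is enumerated.
    for signs in product((1, -1), repeat=n):
        mean = sum(s * d for s, d in zip(signs, differences)) / n
        total += 1
        if mean >= observed - tolerance:
            at_least_as_large += 1
        if abs(mean) >= abs(observed) - tolerance:
            at_least_as_extreme += 1
    return at_least_as_large / total, at_least_as_extreme / total


# Hypothetical paired differences (stimulation minus no stimulation).
improvement = [0.8, 1.1, 0.4, 0.9, 0.6, 1.3]
impairment = [-d for d in improvement]

one_sided_up, two_sided_up = sign_flip_p_values(improvement)
one_sided_down, two_sided_down = sign_flip_p_values(impairment)
```

      With every difference in the hypothesized direction, the one-sided p-value is half the two-sided one. When every difference goes against the hypothesis, the one-sided test returns p = 1 while the two-sided test still flags the effect, which is the scenario raised by the reviewer.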

      Reviewer #2 (Public Review):

      Summary:

      The authors' long-term goals are to understand the utility of precisely phased cortex stimulation regimes on recovery of function after spinal cord injury (SCI). In prior work, the authors explored the effects of contralesion cortex stimulation. Here, they explore ipsilesion cortex stimulation in which the corticospinal fibers that cross at the pyramidal decussation are spared. The authors explore the effects of such stimulation in intact rats and rats with a hemisection lesion at the thoracic level ipsilateral to the stimulated cortex. The appropriately phased microstimulation enhances contralateral flexion and ipsilateral extension, presumably through lumbar spinal cord crossed-extension interneuron systems. This microstimulation improves weight bearing in the ipsilesion hindlimb soon after injury, before any normal recovery of function would be seen. The contralateral homologous cortex can be lesioned in intact rats without impacting the microstimulation effect on flexion and extension during gait. In two rats ipsilateral flexion responses are noted, but these are not clearly demonstrated to be independent of the contralateral homologous cortex remaining intact.

      Strengths:

      This paper adds to prior data on cortical microstimulation by the laboratory in interesting ways. First, the strong effects of the spared crossed fibers from the ipsi-lesional cortex in parts of the ipsi-lesion leg's step cycle and weight support function are solidly demonstrated. This raises the interesting possibility that stimulating the contra-lesion cortex as reported previously may execute some of its effects through callosal coordination with the ipsi-lesion cortex tested here. This is not fully discussed by the authors but may represent a significant aspect of these data. The authors demonstrate solidly that ablation of the contra-lesional cortex does not impede the effects reported here. I believe this has not been shown for the contra-lesional cortex microstimulation effects reported earlier, but I may be wrong. Effects and neuroprosthetic control of these effects are explored well in the ipsi-lesion cortex tests here.

      In the revised version of the manuscript, we will incorporate various text improvements to address the points you have highlighted below. Additionally, we will integrate the suggested discussion topic on callosal coordination related to contralateral cortical stimulation.

      Weaknesses:

      Some data is based on very few rats. For example (N=2) for ipsilateral flexion effects of microstimulation. N=3 for homologous cortex ablation, and only ipsi extension is tested it seems. There is no explicit demonstration that the ipsilateral flexion effects in only 2 rats reported can survive the contra-lateral cortex ablation.

      We agree with this assessment. The ipsilateral flexion representation is reported here as a rare but consistent phenomenon, which we believe we have robustly described with the Figure 7 experiments. We will underline in the text that the ablation experiment was not conclusive about the unilateral-cortical nature of the ipsilateral flexion effects.

      Some improvements in clarity and precision of descriptions are needed, as well as fuller definitions of terms and algorithms.

      Likely Impacts: This data adds in significant ways to prior work by the authors, and an understanding of how phased stimulation in cortical neuroprosthetics may aid in recovery of function after SCI, especially if a few ambiguities in writing and interpretation are fully resolved.

      The manuscript text will be revised in its final version, and we seek to eliminate any ambiguity in writing, data interpretation and algorithms.

      Reviewer #3 (Public Review):

      Summary:

      This article aims to investigate the impact of neuroprosthesis (intracortical microstimulation) implanted unilaterally on the lesion side in the context of locomotor recovery following unilateral thoracic spinal cord injury.

      Strength:

      The study reveals that stimulating the left motor cortex, on the same side as the lesion, not only activates the expected right (contralateral) muscle activity but also influences unexpected muscle activity on the left (ipsilateral) side. These muscle activities resulted in a substantial enhancement in lift during the swing phase of the contralateral limb and improved trunk-limb support for the ipsilateral limb. They used different experimental and stimulation conditions to show the ipsilateral limb control evoked by the stimulation. This outcome holds significance, shedding light on the engagement of the "contralateral projecting" corticospinal tract in activating not only the contralateral but also the ipsilateral spinal network.

      The experimental design and findings align with the investigation of the stimulation effect of contralateral projecting corticospinal tracts. They carefully examined the recovery of ipsilateral limb control with motor maps. They also tested the effective sites of cortical stimulation. The study successfully demonstrates the impact of electrical stimulation on the contralateral projecting neurons on ipsilateral limb control during locomotion, as well as identifying important stimulation spots for such an effect. These results contribute to our understanding of how these neurons influence bilateral spinal circuitry. The study's findings contribute valuable insights to the broader neuroscience and rehabilitation communities.

      Thank you for your assessment of this manuscript. The final version of the manuscript will incorporate your suggestions for improving term clarity and will also enhance the discussion on the mechanism of spinal network engagement, as outlined below.

      Weakness:

      The term "ipsilateral" lacks a clear definition in the title, abstract, introduction, and discussion, potentially causing confusion for the reader.

      In the next revision of the manuscript, we will provide a clear definition of the term "ipsilateral."

      The unexpected ipsilateral (left) muscle activity is most likely due to the left corticospinal neurons recruiting not only the right spinal network but also the left spinal network. This is probably due to the joint efforts of the neuroprosthesis and activation of spinal motor networks which work bilaterally at the spinal level. However, in my opinion, readers can easily link the ipsilateral cortical network to the ipsilateral-projecting corticospinal tract, which is less likely to play a role in ipsilateral limb control in this study since this tract is disrupted by the thoracic spinal injury.

      We agree with your assessment. The discussion section paragraph presenting putative mechanisms of cortico-spinal transmission in the effects presented in the results will be enhanced to reflect these suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper reports valuable results regarding the potential role and time course of the prefrontal cortex in conscious perception. Although the sample size is small, the results are clear and convincing, and strengths include the use of several complementary analysis methods. The behavioral test includes subject report so the results do not allow for distinguishing between theories of consciousness; nevertheless, results do advance our understanding of the contribution of prefrontal cortex to conscious perception.

      We very much appreciate the encouraging opinion of the editor and reviewers. In particular, we thank the three reviewers for their professional and constructive comments, which helped us improve the manuscript substantially.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear and rigorous study of intracranial EEG signals in the prefrontal cortex during a visual awareness task. The results are convincing and worthwhile, and strengths include the use of several complementary analysis methods and clear results. The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field. Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not), and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this). Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) has been described many times previously and is not a new result.

      We very much appreciate the reviewer’s encouraging opinion. We address the reviewer’s specific questions and comments point by point below.

      ‘The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field.’

      We agree that the sample size is relatively small in the present study. To compensate for this shortcoming, we rigorously verified each result at both the individual and population levels, similar to the data analysis approach used in non-human primate studies.

      Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not),

      Thank you very much for your comment. We agree that our task does not remove the confound of report entirely. However, we believe that our task minimizes the motor confounds by dissociating the emergence of awareness from the motor response in time and by balancing the direction of the motor response between the aware and unaware conditions. Following the reviewer’s comment, we have modified the text in the revised manuscript as follows: “This task removes the confound of motor-related activity”.

      ..and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this).

      We agree that several iEEG studies, including ERP and HFA analyses, have shown the early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the activity in the prefrontal cortex might be correlated with unconscious processing rather than conscious processing. In the present study, we compared the neural activity in PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study (Gaillard et al., 2009) reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of the visual stimuli, which was ~100 ms later than the early awareness-related activity in our study. Also, because of the limited number of electrodes in the previous study (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions), its exploration of awareness-related activity in PFC was restricted. In the present study, the number of recording sites (245) was much larger than in the previous study and covered multiple areas in PFC. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset) in ERP, HFA and PLV, which sheds new light on the role of PFC in conscious perception.

      We have added this discussion to the manuscript (lines 522-536).

      Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) has been described many times previously and is not a new result.

      Thank you very much for your comment. We agree that reaction time is strongly modulated by confidence level, as has been described previously (Broggin, Savazzi, & Marzi, 2012; Marzi, Mancini, Metitieri, & Savazzi, 2006). However, in previous studies, the confidence levels were usually induced by presenting stimuli with different physical properties, such as spatial frequency, eccentricity and contrast. It is well known that more salient stimuli are processed faster and speed up visuomotor transformation, eventually shortening the reaction time (Corbetta & Shulman, 2002; Posner & Petersen, 1990). Therefore, the dependence of visual processing on stimulus salience is confounded with the effect of visual awareness on reaction time, which makes it hard to attribute the shorter reaction time in the more salient condition purely to visual awareness. In contrast, we created a condition (near perceptual threshold) in the present study in which the saliency (contrast) of the visual stimulus is very similar in the aware and unaware conditions, in order to eliminate the influence of stimulus saliency on reaction time. We think that the difference in reaction time in our study is mainly due to the modulation of awareness state, which was not reported previously.

      We have added this discussion to the manuscript (lines 497-507).

      Reviewer #1 (Recommendations For The Authors):

      Specific comments follow:

      Abstract: "we designed a visual awareness task that can minimize report-related confounding" and in the Introduction lines 112-115: "Such a paradigm can effectively dissociate awareness-related activity from report-related activity in terms of time... and report behavior"; Discussion lines 481-483 "even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" and other similar statements in the manuscript should be removed. The task involves report using eye movements with every single stimulus. The fact that there is report for both perceived and not perceived stimuli, that the direction of report is not determined until the time of report, and that there is delay between stimulus and report, does not remove the report-related post-perceptual processing that will inevitably occur in a task where overt report is required for every single trial. For example, brain activity related to planning to report perception will only occur after perceived trials, regardless of the direction of eye movement later decided upon. This preparation to respond is different for perceived and not perceived stimuli, but is not part of the perception itself. In this way the current task is not at all unique and does not substantially differ from many other report-based tasks used previously.

      The objective of the present study is to assess whether PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as accurately as possible. Considering the disadvantages of non-report paradigms in determining the subjective awareness state (Tsuchiya et al., TiCS, 2015; Mashour et al., Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that, in balanced report paradigms, subjects cannot prepare any motor response during the delay period because only the appearance of a rule cue (a color change of the fixation point at the end of the delay period) informs subjects about the appropriate motor action. In this case, the post-perceptual processing during the delay period might reflect non-motor cognitive activity. Alternatively, as mentioned by the reviewer, the post-perceptual processing might relate to planning to report perception, which is different for perceived and not perceived stimuli. Therefore, to date, the understanding of post-perceptual processing remains controversial. Following the reviewer’s comment, we have modified the description of our task as follows: “we designed a visual awareness task that can minimize report-related motor confounding”. We have also changed “report-related” to “motor-related” in the text of the manuscript.

      Figures 3, 4 changes in posterior middle frontal gyri suggest early frontal eye field involvement in perception. This should be interpreted in the context of many previous studies showing FEF involvement in signal detection. The authors claim that "earlier visual awareness related activities in the prefrontal cortex were not found in previous iEEG studies, especially in the HG band" on lines 501-502 of the Discussion. This statement is not true and should be removed. The following statement in the Discussion on lines 563-564 should be removed for the same reasons: "our study detected 'ignition' in the human PFC for the first time." Authors should review and cite the following studies as precedent among others:

      Blanke O, Morand S, Thut G, Michel CM, Spinelli L, Landis T, Seeck M (1999) Visual activity in the human frontal eye field. Neuroreport 10 (5):925-930. doi:10.1097/00001756-199904060-00006

      Foxe JJ, Simpson GV (2002) Flow of activation from V1 to frontal cortex in humans. A framework for defining "early" visual processing. Exp Brain Res 142 (1):139-150. doi:10.1007/s00221-001-0906-7

      Gaillard R, Dehaene S, Adam C, Clemenceau S, Hasboun D, Baulac M, Cohen L, Naccache L (2009) Converging intracranial markers of conscious access. Plos Biology 7 (3):e61

      Gregoriou GG, Gotts SJ, Zhou H, Desimone R (2009) High-frequency, long-range coupling between prefrontal and visual cortex during attention. Science 324:1207-1210

      Herman WX, Smith RE, Kronemer SI, Watsky RE, Chen WC, Gober LM, Touloumes GJ, Khosla M, Raja A, Horien CL, Morse EC, Botta KL, Hirsch LJ, Alkawadri R, Gerrard JL, Spencer DD, Blumenfeld H (2019) A Switch and Wave of Neuronal Activity in the Cerebral Cortex During the First Second of Conscious Perception. Cereb Cortex 29 (2):461-474.

      Khalaf A, Kronemer SI, Christison-Lagay K, Kwon H, Li J, Wu K, Blumenfeld H (2022) Early neural activity changes associated with stimulus detection during visual conscious perception. Cereb Cortex. doi:10.1093/cercor/bhac140

      Kwon H, Kronemer SI, Christison-Lagay KL, Khalaf A, Li J, Ding JZ, Freedman NC, Blumenfeld H (2021) Early cortical signals in visual stimulus detection. Neuroimage 244:118608.

      We agree that several iEEG studies, including ERP and HFA analyses, have shown the early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the activity in the prefrontal cortex might be correlated with unconscious processing rather than conscious processing. In the present study, we compared the neural activity in PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of the visual stimuli, which was ~100 ms later than the early awareness-related activity in our study. Also, because of the limited number of electrodes in the previous study (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions), its exploration of awareness-related activity in PFC was restricted. In the present study, the number of recording sites (245) was much larger than in the previous study and covered multiple areas in PFC. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset) in ERP, HFA and PLV, which sheds new light on the role of PFC in conscious perception.

      We have added this discussion to the manuscript (lines 522-533).

      Minor weakness that should be mentioned in the Discussion: The intervals for the FP (fixation period) and Delay period were both fixed at 600 ms instead of randomly jittered, so that subjects likely had anticipatory activity predictably occurring with each grating and cue stimulus.

      Thank you very much for your comment. We agree that subjects might have had anticipatory activity during the experiment. In fact, we designed the task in this way to balance the effects of attention and anticipation between the aware and unaware conditions. We have added this discussion to the manuscript (lines 467-469).

      The faster reaction times for perceived/confident responses vs not perceived/unconfident responses has been reported many times previously in the literature and should be acknowledged rather than being claimed as a novel finding. Authors should modify p. 163 lines 160-162, first sentence of the Discussion lines 445-446 "reaction time.. shorter" claiming this was a novel finding; same for lines 464-467. Please see the following among others:

      Broggin E, Savazzi S, Marzi CA (2012) Similar effects of visual perception and imagery on simple reaction time. Q J Exp Psychol (Hove) 65 (1):151-164. doi:10.1080/17470218.2011.594896

      Chelazzi L, Marzi CA, Panozzo G, Pasqualini N, Tassinari G, Tomazzoli L (1988) Hemiretinal differences in speed of light detection in esotropic amblyopes. Vision Res 28 (1):95-104

      Marzi CA, Mancini F, Metitieri T, Savazzi S (2006) Retinal eccentricity effects on reaction time to imagined stimuli. Neuropsychologia 44 (8):1489-1495. doi:10.1016/j.neuropsychologia.2005.11.012

      Posner MI (1994) Attention: the mechanisms of consciousness. Proceedings of the National Academy of Sciences of the United States of America 91 (16):7398-7403

      Sternberg S (1969) Memory-scanning: mental processes revealed by reaction-time experiments. Am Sci 57 (4):421-457

      Thanks. We have cited only some of these papers in the revised manuscript because of the limit on the number of citations.

      Methods lines 658-659: "results under LU and HA conditions were classified as the control group and were only used to verify and check the results during calculation." However the authors show these results in the figures and they are interesting. HA stimuli show earlier responses than NA stimuli. This is a valuable result which should be discussed and interpreted in light of the other findings.

      We thank the reviewer very much for this comment. We have added a corresponding discussion to the revised manuscript (lines 535-536).

      General comment on figures: Many of the figure elements are tiny and the text labels and details can't be seen at all, especially single trial color plots, and the brain insets showing recording sites.

      We have modified the figures accordingly.

      Other minor comments: Typo: Figure 2 legend, line 169 "The contrast level resulted in an awareness percentage greater than 25%..." is missing a word and should say instead something like "The contrast level that resulted in an awareness percentage greater than 25%..."

      Thanks. We have corrected the typo accordingly.

      Figure 2 Table description in text line 190 says "proportions of recording sites" but the Table only shows number of recording sites and number of subjects, not "proportions." This should be corrected in the text.

      Thanks. We have corrected the error.

      Figure 3, and other figures, should always label the left and right hemispheres to avoid ambiguity.

      Thanks. We have made the correction accordingly. In the caption of Figure 2D (line 189), we modified the sentence to read ‘In all brain images, right side of the image represents the right side of the brain’.

      Methods line 666. The saccadic latency calculations paragraph should have a separate heading before it, to separate it from the Behavioral data analysis section.

      Thanks. It has been corrected in line 725.

      Reviewer #2 (Public Review):

      The authors attempt to address a long-standing controversy in the study of the neural correlates of visual awareness, namely whether neurons in prefrontal cortex are necessarily involved in conscious perception. Several leading theories of consciousness propose a necessary role for (at least some sub-regions of) PFC in basic perceptual awareness (e.g., global neuronal workspace theory, higher order theories), while several other leading theories posit that much of the previously reported PFC contributions to perceptual awareness may have been confounded by task-based cognition that co-varied between the aware and unaware reports (e.g., recurrent processing theory, integrated information theory). By employing intracranial EEG in human patients and a threshold detection task on low-contrast visual stimuli, the authors assessed the timing and location of neural populations in PFC that are differentially activated by stimuli that are consciously perceived vs. not perceived. Overall, the reported results support the view that certain regions of PFC do contribute to visual awareness, but at time-points earlier than traditionally predicted by GNWT and HOTs.

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      Major strengths of this paper include the straightforward visual threshold detection task including the careful calibration of the stimuli and the separate set of healthy control subjects used for validation of the behavioral and eye tracking results, the high quality of the neural data in six epilepsy patients, the clear patterns of differential high gamma activity and temporal generalization of decoding for seen versus unseen stimuli, and the authors' interpretation of these results within the larger research literature on this topic. This study appears to have been carefully conducted, the data were analyzed appropriately, and the overall conclusions seem warranted given the main patterns of results.

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      Weaknesses include the saccadic reaction time results and the potential flaws in the design of the reporting task. This is not a "no report" paradigm, rather, it's a paradigm aimed at balancing the post-perceptual cognitive and motor requirements between the seen and unseen trials. On each trial, subjects/patients either perceived the stimulus or not, and had to briefly maintain this "yes/no" judgment until a fixation cross changed color, and the color change indicated how to respond (saccade to the left or right). Differences in saccadic RTs (measured from the time of the fixation color change to moving the eyes to the left or right response square) were evident between the seen and unseen trials (faster for seen). If the authors' design achieved what they claim on page 3, "the report behaviors were matched between the two awareness states ", then shouldn't we expect no differences in saccadic RTs between the aware and unaware conditions? The fact that there were such differences may indicate differences in post-perceptual cognition during the time between the stimulus and the response cue. Alternatively, the RT difference could reflect task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). This saccadic RT result should be better explained in the context of the goals of this particular reporting-task.

      The objective of present study is to assess whether PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as correct as possible. Considering the disadvantage of non-report paradigms in determining the subjective awareness state (Tsuchiya et al, TiCS, 2015; Mashour et al, Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that, in the balanced report paradigms, subjects could not prepare any motor response during the delay period because only after the appearance of a rule cue (change color of fixation point at the end of delay period) subjects were informed about the appropriate motor action. In this case, the post-perceptual processing during delay period might reflect the non-motor cognitive activity, such as working memory (Mashour et al. Neuron, 2020). Alternatively, as being mentioned by reviewer, the postperceptual processing might relate to planning to report perception, which is different for perceived and not perceived stimuli (Aru et al. Neurosci Biobehav Rev, 2012 ). Therefore, up to date, the understanding of the post-perceptual processing remains controversial. Considering reviewer’s comment together with other opinions, we have modified the description of our task as following: “we designed a visual awareness task that can minimize report-related motor confounding”. Also, we have changed “report-related” to “motor-related” in the rest of manuscript.

      Regarding the question whether the saccadic RT in our balanced response paradigm should be expected to be similar between aware and unaware condition, we think that the RT should be similar in case if the delay period is long enough for the decision of “no” to be completed. In fact, in a previous study (Merten & Nieder, PNAS, 2011), the neuronal encoding of “no” decision didn’t appear until 2s after the stimulus cue onset. However, in our task, the delay period lasted only 600 ms that was long enough to form the “yes” decision, but was not enough to form the “no” decision. It might be the reason that our data show shorter RT in aware condition than in unaware condition.

      We totally agree with the reviewer’s comment about the alternative interpretation of the RT difference between the aware and unaware conditions in our study, i.e., that it reflects task strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). We have added discussion of these questions to the revised manuscript (lines 492-496).

      Nevertheless, the current results do help advance our understanding of the contribution of PFC to visual awareness. These results, when situated within the larger context of the rapidly developing literature on this topic (using "no report" paradigms), e.g., the recent studies by Vishne et al. (2023) Cell Reports and the Cogitate consortium (2023) bioRxiv, provide converging evidence that some sub-regions of PFC contribute to visual awareness, but at latencies earlier than originally predicted by proponents of, especially, global neuronal workspace theory.

      We very much appreciate the reviewer’s encouraging opinion.

      Reviewer #2 (Recommendations For The Authors):

      Abstract: "the spatiotemporal overlap between the awareness-related activity and the interregional connectivity in PFC suggested that conscious access and phenomenal awareness may be closely coupled." I strongly suggest revising this sentence. The current results cannot be used to make such a broad claim about p-consciousness vs. a-consciousness. This study used a balanced trial-by-trial report paradigm, which can only measure conscious access.

      We thank the reviewer for this comment. We have withdrawn this sentence from the revised manuscript.

      Task design: A very similar task was used previously by Schröder et al. (2021) J Neurosci. See specifically, their Figure 1, and Figure 4B-C. Using almost the exact same "matching task", the authors of this previous study show that they get a P3b for both the perceived and not-perceived conditions, confirming that post-perceptual cognition/report confounds were not eliminated, but instead were present in (and balanced between) both the perceived/not-perceived trials due to the delayed matching aspect of the design. This previous paper should be cited and the P3b result should be considered when assessing whether cognition/report confounds were addressed in the current study.

      Thank you very much for reminding us of the study by Schröder et al. We are sorry for not citing this closely related study in our previous manuscript. Schröder et al. found that, while the P3b differed significantly between perceived and not-perceived trials in the direct report task, it was present in both perceived and not-perceived trials and did not differ significantly between them in the matching task. Based on these findings, Schröder et al. argued that the P3b represents task-specific post-perceptual cognition/report rather than the emergence of awareness per se. Considering the similarity between the task of Schröder et al. and ours, we agree that our task cannot completely eliminate the confound between post-perceptual cognition/report-related activity and awareness-related activity. Nevertheless, our task is able to minimize the confound between motor-related activity and the emergence of awareness by separating them in time and balancing the direction of the response movements. Therefore, we changed the term “report-related” to “motor-related” in the text of the revised manuscript.

      On page 2, lines 71-75, the authors' review of the Frassle et al. (2014) experiment should be revised for accuracy. In this study, all PFC activity did not disappear as the authors claim. Also, the main contrast in the Frassle et al. study was rivalry vs. replay. However, in both of these conditions, visual awareness was changing with the main difference being whether there was sensory conflict between the two eyes or not. Such a contrast would presumably subtract out the common activity patterns related to visual awareness changes, while isolating rivalry (and the resulting neural competition) vs. non-rivalry (and the lack of such competition) which is not broadly relevant for the goal of measuring neural correlates of visual awareness which are present in both sides of the contrast (rivalry and replay).

      Thank you very much for your suggestion. We agree and have revised the MS accordingly (lines 71-76).

      ‘For instance, a functional magnetic resonance imaging (fMRI) study employing human binocular rivalry paradigms found that when subjects need to manually report the changing of their awareness between conflict visual stimuli, the frontal, parietal, and occipital lobes all exhibited awareness-related activity. However, when report was not required, awareness-related activation was largely diminished in the frontal lobe but remained in the occipital and parietal lobes’

      On page 2, lines 76-78, the authors write, "no-report paradigm may overestimate unconscious processing because it cannot directly measure the awareness state". This should be reworded for clarity, as report paradigms also do not "directly measure the awareness state". All measures of awareness are indirect, either via subjects verbal or manual reports, or via behaviors or other physiological measures like OKN, pupillometry, etc. It's also not clear as written why no-report paradigms might overestimate unconscious processing.

      Thank you very much for your suggestion. We agree and have modified the description in lines 76-80:

      ‘Nevertheless, the no-report paradigm may overestimate the neural correlates of awareness by including unconscious processing, because it infers the awareness state through other relevant physiological indicators, such as optokinetic nystagmus and pupil size (Tsuchiya, Wilke, Frassle, & Lamme, 2015). In the absence of subjective reports, it remains controversial regarding whether the presented stimuli are truly seen or not.’

      However, the no-report paradigm may overestimate the neural correlates of awareness, because it infers the awareness state through other relevant physiological indicators, such as optokinetic nystagmus and pupil size (Tsuchiya et al., 2015), in the absence of subjective reports, and it remains controversial whether the stimuli presented in such a paradigm are truly seen as opposed to being merely potentially visible but unattended.

      On page 5, line 155, there is a typo. This should be Figure 2C, not 2B.

      Thanks. We have modified the description.

      On page 5, lines 160-162, the authors state, "The results showed that the saccadic reaction time in the aware trials was systematically shorter than that in the unaware trials. Such results demonstrate that visual awareness significantly affects the speed of information processing in the brain." I don't understand this. If subjects can never make a saccade until the fixation cross changes color, both for Y and N decisions, why would a difference in saccadic reaction times indicate anything about visual awareness affecting the speed of information processing in the brain? Doesn't this just show that the Red/Green x Left/Right response contingencies were easier to remember and execute for the Yes-I-did-see-it decisions compared to the No-I-didn't-see-it decisions?

      We agree and have added discussion of these questions to the revised manuscript (lines 492-496).

      ‘An alternative interpretation for RT difference between aware and unaware condition in our study is that the difference in task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory).’

      In Figure 3B (and several other figures) due to the chosen view and particular brain visualization used, many readers will not know whether the front of brain is up and back of brain down or vice versa (there are no obvious landmarks like the cerebellum, temporal sulcus, etc.). I suggest specifying this in the caption or better yet on the figure itself.

      Thanks. We have added these descriptions in the caption of Figure 2D.

      Line 189 ‘In all brain images, right and up sides of each image represent the right and up sides of the brain’.

      In Figure 3B, the color scale may confuse some readers. When I first inspected this figure, I immediately thought the red meant positive voltage or activation, while the blue meant negative voltage or deactivation. Only later, I realized that any color here is meaningful. Not sure if an adjustment of the color scale might help, or perhaps not normalizing (and not taking absolute values of the voltage diffs, but maintaining the +/- diffs)?

      We thank the reviewer for this comment. We are sorry for not clearly describing why we normalized the activity as an absolute value and chose a color scale from 0 to 20. The major reason is that the biological meaning of LFP polarity is not yet clearly understood (Einevoll et al, Nat Rev Neurosci, 2013). To simplify this complex issue, we consider the change in LFP magnitude during the delay period of our task to represent awareness-related activity, regardless of whether its actual value is positive or negative. Therefore, we first calculated the absolute value of the activity difference between aware and unaware trials at each individual recording site, then used Shepard's method (see Methods for detailed information) to calculate the activity at each vertex, and projected it onto the surface of the brain template as shown in Fig. 3B.

      We have added the description in the MS (lines 794-800).

      We tried adjusting the color scale to range from -20 to 20, as the reviewer suggested. However, the resulting topographic heatmap distinguished less clearly between brain regions with different strengths of awareness-related activity. Thus, we would like to keep our original way of analyzing and presenting these results.
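
      The two steps described above (taking the unsigned aware-minus-unaware difference at each site, then spreading site values over the surface with Shepard's method) can be sketched as follows. This is only a minimal illustration of inverse-distance weighting, not the authors' code: the function names, the distance exponent `power=2.0`, and the toy coordinates are assumptions, and the actual Methods may use a different exponent or a distance cutoff.

```python
import math

def abs_difference(aware_mean, unaware_mean):
    """Unsigned aware-minus-unaware difference at each time point for one site."""
    return [abs(a - u) for a, u in zip(aware_mean, unaware_mean)]

def shepard_interpolate(site_xyz, site_values, vertex, power=2.0):
    """Inverse-distance-weighted (Shepard) average of site values at one vertex.

    power=2.0 is an assumed exponent, chosen only for illustration.
    """
    num = 0.0
    den = 0.0
    for xyz, value in zip(site_xyz, site_values):
        d = math.dist(xyz, vertex)
        if d == 0.0:
            return value  # vertex coincides with a recording site
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Toy example: two hypothetical sites and one vertex midway between them.
sites = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
values = [4.0, 8.0]
midpoint_value = shepard_interpolate(sites, values, (1.0, 0.0, 0.0))
```

      Because the difference is taken in absolute value before interpolation, every mapped value is non-negative, which is why a one-sided color scale starting at 0 is the natural choice for this kind of map.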

      Figure 3B: Why choose seemingly arbitrary time points in this figure? What's the significance of 247 and 314 and 381ms (why not show 200, 250, 300, etc.)? Also, are these single time-points or averages within a broader time window around this time-point, e.g., 225-275ms for the 250ms plot?

      We thank the reviewer for this helpful comment. We are sorry for not clearly describing why we chose these 8 time points to demonstrate the spatiotemporal characteristics of awareness-related activity in Fig. 3B. To identify the awareness-related activity, we analyzed the activity difference between aware and unaware trials during the delay period (180-650 ms after visual stimulus onset). The whole dynamic process is presented in the SI as a video (video S1). Here, we simply sampled the activity at 8 time points (180 ms, 247 ms, 314 ms, etc.) that equally divide the 470 ms delay period.

      We have added the description in the MS (lines 213-215).
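
      As a quick check of the sampling described above, 8 equally spaced time points over the 180-650 ms window can be generated as below. The helper name and the rounding to whole milliseconds are assumptions made for illustration; the first four values reproduce the 180, 247, 314 and 381 ms mentioned in the text and in the reviewer's question.

```python
def evenly_spaced_times(start_ms, end_ms, n_points):
    """Return n_points equally spaced times from start_ms to end_ms (inclusive),
    rounded to the nearest millisecond."""
    step = (end_ms - start_ms) / (n_points - 1)
    return [round(start_ms + i * step) for i in range(n_points)]

# 8 snapshots across the analyzed delay-period window.
sample_times = evenly_spaced_times(180, 650, 8)
```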

      Figure 3D: It's not clear how this figure panel is related to the data shown in Fig3A. In Fig3A, the positive amplitude diffs all end at around 400ms, but in Fig3D, these diffs extend out to 600+ms. I suggest adding clarity about the conversion being used here.

      We thank the reviewer for this comment. We are sorry for not clearly describing how we analyzed the population activity (Fig. 3D) in the previous version of the manuscript. Since the biological meaning of LFP polarity is not yet clearly understood, to simplify this complex issue, we consider the change in LFP magnitude during the delay period of our task to be awareness-related activity, regardless of whether its actual value is positive or negative. Therefore, when analyzing the awareness-related population activity, we first calculated the absolute value of the activity difference between aware and unaware trials at each individual recording site, then pooled the data of the 43 recording sites and calculated the mean and standard error of the mean (SEM) (Fig. 3D). As can be seen in Fig. 3A, the activity difference between aware (red) and unaware (blue) trials lasts until, or beyond, the end of the delay period. Thus, the awareness-related population activity in Fig. 3D extends out to 600 ms.

      We have added the description in the MS (lines 769-777).
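
      The pooling step described above can be sketched as follows, assuming each site contributes one time course of unsigned aware-minus-unaware differences. The function name and the toy numbers are hypothetical, and the SEM here is the sample standard deviation across sites divided by the square root of the number of sites.

```python
import math
import statistics

def population_mean_sem(site_abs_diffs):
    """Mean and SEM across sites at each time point.

    site_abs_diffs: one list per recording site, each holding that site's
    |aware - unaware| difference over time (all lists the same length).
    """
    n_sites = len(site_abs_diffs)
    means, sems = [], []
    for values_at_t in zip(*site_abs_diffs):
        means.append(statistics.fmean(values_at_t))
        sems.append(statistics.stdev(values_at_t) / math.sqrt(n_sites))
    return means, sems

# Toy data: three hypothetical sites, two time points.
toy_sites = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
toy_mean, toy_sem = population_mean_sem(toy_sites)
```

      Taking absolute values before averaging means that sites with opposite polarity add to the population trace instead of cancelling, which is one reason the pooled curve can stay above zero longer than a single signed difference would suggest.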

      Figure 6D could be improved by making the time labels much bigger, perhaps putting them on the time axis on the bottom rather than in tiny text above each brain.

      We thank the reviewer for this comment. We have modified it accordingly.

      Page 18, line 480: "our results show that the prefrontal cortex still displays visual awareness-related activities even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" This is too strong of a statement. It's not at all clear whether confounding variables related to subjective reports (especially the cognition needed to hold in mind the Y/N decision about seeing the stimulus prior to the response cue) were eliminated with the design used here. In other places of the manuscript, the authors use "minimized" which is more accurate.

      We thank the reviewer for this comment. We have modified it accordingly.

      Page 19, section starting on line 508: The authors should consider citing the study by Vishne et al. (2023), which was just accepted for publication recently, but has been posted on bioRxiv for almost a year now: https://www.biorxiv.org/content/10.1101/2022.08.02.502469v1 . And on page 20, line 563, the authors claim that to the best of their knowledge, they were the first to detect "ignition" in PFC in human subjects. Consider revising this statement, now that you know about the Vishne et al. paper.

      We agree.

      Thank you for reminding us of these papers. We have cited this study and discussed it in the revised manuscript (lines 522-533). We agree that several iEEG studies have shown the early involvement of PFC in visual perception (Vishne et al. 2023; Khalaf et al. 2023; Kwon et al. 2021). However, in these studies, the authors did not compare neural activity between conscious and unconscious conditions, leaving open the possibility that the ERP and HFA were correlated with unconscious information processing rather than awareness-specific processing. In the present study, we compared the neural activity in PFC between conscious and unconscious trials, and found that PFC activity specifically correlated with conscious perception. As we mentioned in the previous version of the manuscript, there is one iEEG study (Gaillard et al. 2009) that reported awareness-specific activity in PFC. However, that awareness-related activity started more than 300 ms after the onset of the visual stimuli, about 100 ms later than the early awareness-related activity in our study. Nevertheless, following the reviewer’s comment, we modified our argument as follows in lines 621-623:

      ‘However, as discussed above, in contrast with previous studies, our study detected earlier awareness-specific ‘ignition’ in the human PFC, while minimizing the motor-related confounding.’

      Experimental task section of Methods: Were any strategies for learning the response cue matching task suggested to patients/subjects, and/or did any patients/subjects report which strategy they ended up using? For example, if I were a subject in this experiment, I would remember and mentally rehearse the rules: "YES+GREEN = RIGHT" and "YES+RED = LEFT". For trials in which I didn't see anything, I wouldn't need to hold 2 more rules in mind, as they can be inferred from the inverse of the YES rules (and it's much harder to hold 4 things in mind than 2). This extra inference needed to get to the NO+GREEN = LEFT and NO+RED = RIGHT rules would likely cause me to respond slightly slower to the NO trials compared to the YES trials, leading to saccadic RT effects in the same direction the authors found. More information about the task training and strategies used by patients/subjects would be helpful.

      We agree and have discussed this in lines 492-496.

      Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      The main problem with the interpretation of the data is that the authors have NOT used a so called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant, they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (that are present in this dataset) are taken care of by using this delayed response in this paper, which is a nice feature for that and is not the case for the above example set-up.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      We totally agree that no-report/no-cognition paradigms involve less cognition in post-perceptual processing than report paradigms. We designed the balanced response task to minimize the motor-related component of post-perceptual processing, even though this task does not eliminate all cognition from post-perceptual processing. Regarding the reviewer’s comment that our task cannot assess the involvement of PFC in the emergence of awareness, we have a different opinion. As we mentioned in the manuscript, the findings of early awareness-related activity (~200 ms) in PFC, which resembles the VAN activity in EEG studies, indicate an association of PFC with the emergence of visual awareness (phenomenal consciousness).

      The best solution at this point seems to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanth et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      The objective of the present study is to assess whether PFC is involved in the emergence of visual awareness (i.e., phenomenal consciousness). Interestingly, we found early awareness-related activity (~200 ms after visual stimulus onset) in PFC, including ERP, high gamma activity and phase synchronization, which indicates an association of PFC with the emergence of visual awareness. Therefore, we would like to keep the basic framing of the manuscript and revise it according to the reviewers’ comments.

      On the other hand, we totally agree with the reviewer’s argument that the report paradigm is more suitable for studying access consciousness. Indeed, we have found that the awareness-related activity in PFC can be separated into two subgroups, i.e., early activity with shorter latency (~200 ms after stimulus onset) and late activity with longer latency (> 350 ms after stimulus onset). In addition, the early activity declined to the baseline level within ~200 ms during the delay period, whereas the late activity lasted throughout the delay period and continued into the next stage of the task (the color change of the fixation point). Moreover, the early activity occurs primarily within the PFC contralateral to the visual stimulus, whereas the late activity occurs within both contralateral and ipsilateral PFC. While the early awareness-related activity resembles the VAN activity in EEG studies (associated with p-consciousness), the late awareness-related activity resembles the P3b activity (associated with a-consciousness). We are going to report these results in a separate paper soon.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      We thank the reviewer for this comment. We have performed additional analysis and added detailed discussion accordingly. In addition, we have extended the discussion to other relevant studies in the revised manuscript.

      In lines 528-549,

      ‘Although one iEEG study reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of visual stimuli, which was ~100 ms later than the early activity in our study. Also, due to the limited number of electrodes in PFC (2 patients with 19 recording sites mostly in mesiofrontal and peri-insular regions), their experiments were restricted while exploring the awareness-related activity in PFC. In the present study, the number of recording sites (245) were much more than previous study and covered more areas in PFC. Our results further show earlier awareness-related activity (~ 200 ms after visual stimuli onset), including ERP, HFA and PLV. These awareness-related activity in PFC occurred even earlier (~150 ms after stimulus onset) for the salient stimulus trials (Fig. 3A\D and Fig. 4A\D, HA condition).

      However, the proportions are much smaller than that reported by Gaillard et al, which peaked at ~60%. We think that one possibility for the difference may be due to the more sampled PFC subregions in present study and the uneven distribution of awareness-related activity in PFC. Meanwhile, we noticed that the peri-insula regions and middle frontal gyrus (MFG), which were similar with the regions reported by Gaillard et al, seemed to show more fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test such possibility and make comparison with the study of Gaillard et al. we calculated the proportion of awareness-related site in peri-insula and MFG regions. We found although the proportion of awareness-related site was larger in peri-insula and MFG than in other subregions, it was much lower than the report of Gaillard et al. One alternative possibility for the difference between these two studies might be due to the more complex task in Gaillard et al. Nevertheless, we think these new results would contribute to our understanding of the neural mechanism underlying conscious perception, especially for the role of PFC.’

      In lines 601-603:

      ‘The only human iEEG study reported that the phase synchronization of the beta band in the aware condition also occurred relatively late (> 300 ms) and mainly confined to posterior zones but not PFC.’

      As for the Granger causality analysis between PFC and the occipital lobe, since this study focused mainly on PFC and there were few recording sites in the occipital lobe, we would like to perform this analysis in later studies after we collect more data.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      We thank the reviewer for this constructive comment. We performed an analysis similar to that of Gaillard et al. and plotted the results in the figure below. As can be seen, the awareness-related sites started to emerge about 200 ms after visual stimulus onset according to both ERP and HG activity. The proportion of awareness-related sites reached its peak at ~14% (8% for HG) at 300-400 ms. However, the proportions are much smaller than that reported by Gaillard et al, which peaked at ~60%. We think that one possible reason for the difference is that more PFC subregions were sampled in the present study and that awareness-related activity is unevenly distributed in PFC. Meanwhile, we noticed that the peri-insula regions and middle frontal gyrus (MFG), which are similar to the regions reported by Gaillard et al, seemed to show a larger fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test this possibility and make a comparison with the study of Gaillard et al., we calculated the proportion of awareness-related sites in the peri-insula and MFG regions. We found that, although the proportion of awareness-related sites was larger in peri-insula and MFG than in other subregions, it was much lower than that reported by Gaillard et al. An alternative explanation for the difference between these two studies might be the more complex task in Gaillard et al.

      We have added this figure and discussion to the revised manuscript as a new result (Figure 4E & S2 and lines 537-549).
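
      The quantity plotted in this comparison, the percentage of recording sites that are awareness-related in each time bin, can be sketched as below. The sketch assumes that a per-site, per-bin significance decision has already been made by whatever statistical test the Methods specify; the function name and the toy flags are hypothetical.

```python
def percent_significant_sites(significance):
    """Percentage of sites flagged as awareness-related in each time bin.

    significance: one list per recording site, each holding one boolean per
    time bin (True if that site differs between aware and unaware trials).
    """
    n_sites = len(significance)
    return [100.0 * sum(flags) / n_sites for flags in zip(*significance)]

# Toy data: four hypothetical sites, three time bins.
toy_flags = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]
toy_percent = percent_significant_sites(toy_flags)
```

      Because the denominator is the total number of sites sampled, broad coverage that includes many non-responsive subregions lowers the percentage, which is consistent with the authors' point about sampling more PFC subregions than the earlier study.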

      Author response image 1.

      Percentage of awareness-related sites in ERP and HG analysis. n, number of recording sites in PFC.

      Author response image 2.

      Percentage of awareness-related sites in ERP and HG analysis at parsopercularis and middle frontal gyrus (MFG). n, number of recording sites.

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passingham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were in detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.

      We greatly appreciate the reviewer’s helpful and valuable comments. We did notice that the activity of PFC in the subjectively unaware condition (stimulus contrast near the perceptual threshold) is significantly higher than in the stimulus-absent condition. These results, obtained with sEEG recordings that have much higher spatial resolution than brain imaging and scalp EEG, support the findings of previous studies (citations). Since the neural correlates of unaware processing are an active and interesting topic, after careful consideration we would like to report these results in a separate paper rather than add them to the current manuscript, in order to avoid distraction.

      Following the reviewer’s comment about the objective detection performance of the stimuli in our task, we calculated the signal detection theoretic d’. The values of d’ in patients and healthy subjects are similar (1.81±0.27 in patients and 2.12±0.37 in healthy subjects). These results indicate that the objective detection performance of subjects in our task is well above chance level. Since our task measures only subjective awareness, we agree with the reviewer’s comment that our results should be interpreted as “PFC activation to stimuli indicated as subjectively unaware” rather than as activation to objectively unaware stimuli. We will emphasize this point in our next paper.

      We have added the d prime in the MS (lines 149-150).
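
      For readers unfamiliar with the measure, d' in a yes/no detection task is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is a generic illustration, not the authors' analysis code: the function name, the trial counts in the example, and the log-linear correction (adding 0.5 to each count so that rates of exactly 0 or 1 do not produce infinite z-scores) are all assumptions.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction.

    The +0.5 / +1.0 correction is an assumed convention for illustration.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: equal hit and false-alarm rates give d' = 0 (chance).
chance_level = d_prime(50, 50, 50, 50)
# Hypothetical counts: many hits and few false alarms give a clearly positive d'.
good_detection = d_prime(90, 10, 10, 90)
```

      A d' of 0 corresponds to chance-level detection, so group values around 2, as reported above, indicate that stimulus presence was detected well above chance even though individual near-threshold trials were reported as unseen.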

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.

      We thank the reviewer for the helpful comment. Regarding the comment “However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo”, if we understand correctly, the reviewer considers it unfair that the analysis of neural activity in PFC was done across subjects whereas the stimulus contrast analysis between NA and NU was done individually. This is not the case. In the neural activity analysis, the significant awareness-related sites were first identified in each individual subject (Fig. 3A and Fig 4A, and Methods), as in the analysis of stimulus contrast (see Methods). Only in the neural population activity analysis was the activity of awareness-related sites pooled for further analysis.

      To provide further evidence that the awareness-related activity in PFC is not highly correlated with stimulus contrast, we compared the activity differences between two pairs of conditions with different contrast differences: high-contrast aware (HA) versus NA (large contrast difference, ~14%), and NA versus NU (slight contrast difference, ~0.2%). The working hypothesis is that, if PFC activity is closely correlated with stimulus contrast, the activity difference between the HA and NA conditions should be much larger than that between the NA and NU conditions. To test this hypothesis, we analyzed data from two patients in whom the previous analysis showed a significant or near-significant difference in stimulus contrast between the NA and NU conditions (Author response image 1, below, patient #2 and 1). The results (Author response image 1) show that the averaged activity difference (0-650 ms after visual stimulus onset) between HA and NA was similar to that between NA and NU trials, even though the stimulus contrast difference was much larger between the HA and NA conditions than between the NA and NU conditions. These results indicate that the awareness-related activity in PFC cannot be explained solely by the contrast difference between the NA and NU conditions. Based on these results, we think that it is not necessary to perform the analysis suggested in the reviewer’s comment “A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions”. Another reason we did not perform this analysis is the limited number of trials in our dataset.

      Author response image 3.

      Relationship between stimulus contrast and PFC activity. The X axis represents the stimulus contrast difference between two paired conditions, i.e., aware versus unaware in near perceptual threshold conditions (NA – NU, red dots), and aware in the high-contrast condition versus aware in the near perceptual threshold condition (HA – NA, blue dots). The Y axis represents the activity difference between the paired stimulus conditions. The results show that the activity difference is similar between the two pairs of conditions regardless of the marked difference in contrast between them. These results indicate that the greater activity in NA trials than in NU trials (Fig. xx-xx) cannot be explained by the slight difference in stimulus contrast between NA and NU trials.
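
      The contrast-matching procedure proposed by the reviewer, which the authors chose not to perform, can be sketched as follows. This is only an illustration under stated assumptions: each trial is represented by a single contrast value, matching is greedy and without replacement, and the function name, tolerance, and example values are hypothetical.

```python
def match_trials_by_contrast(na_contrasts, nu_contrasts, tolerance=0.002):
    """Pair each NU trial with the closest unused NA trial.

    Returns a list of (na_index, nu_index) pairs whose contrast values
    differ by at most `tolerance`. Unmatched trials are dropped, so the
    two sub-selected conditions end up with closely matched contrast.
    """
    unused = set(range(len(na_contrasts)))
    pairs = []
    for nu_idx, nu_c in enumerate(nu_contrasts):
        best = None  # (absolute difference, NA index) of the best candidate
        for na_idx in unused:
            candidate = (abs(na_contrasts[na_idx] - nu_c), na_idx)
            if candidate[0] <= tolerance and (best is None or candidate < best):
                best = candidate
        if best is not None:
            unused.discard(best[1])  # each NA trial is used at most once
            pairs.append((best[1], nu_idx))
    return pairs


if __name__ == "__main__":
    na = [0.030, 0.040, 0.031]  # hypothetical contrasts of aware trials
    nu = [0.031, 0.030, 0.050]  # hypothetical contrasts of unaware trials
    print(match_trials_by_contrast(na, nu))
```

      Because unmatched trials are discarded, this kind of sub-selection reduces the trial count, which is the practical limitation the authors raise.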

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      We have removed Figure 7B because of the confusing decoding results on the diagonal.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example, a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      We agree and have modified the text accordingly (lines 492-507).

      ‘An alternative interpretation for RT difference between aware and unaware condition in our study, i.e., reflecting task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory).

      Another possibility is that the reaction time is strongly modulated by the confident level, which has been described in previous studies(Broggin et al., 2012; Marzi et al., 2006). However, in previous studies, the confident levels were usually induced by presenting stimulus with different physical property, such as spatial frequency, eccentricity and contrast. However, the dependence of visual process on the salience of visual stimulus confounds with the effect of visual awareness on the reaction time of responsive movements, which is hard to attribute the shorter reaction time in more salient condition purely to visual awareness. In contrast, we create a condition (near aware threshold) in the present study, in which the saliency (contrast) of visual stimulus is very similar in both aware and unaware conditions in order to eliminate the influence of stimulus saliency in reaction time. We think that the difference in reaction time in our study is mainly due to the modulation of awareness state, which was not reported previously.’

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a factor of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible this may give interesting insights in the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al refer to this issue as well in their paper, and this issue is sometimes discussed regarding to Global workspace theory. This would add novelty to the findings of the paper in my opinion.

      We greatly appreciate the reviewer’s helpful and valuable suggestions. We have performed the analysis accordingly. We find that the awareness-related ERP activation occurs first only in the contralateral PFC, with a latency of about 200 ms, and then in both the contralateral and ipsilateral PFC about 100 ms later. In addition, the magnitude of awareness-related activity is stronger in the contralateral PFC than in the ipsilateral PFC during the early phase (200-400 ms), after which the activity becomes similar between the two. Moreover, the awareness-related HG activity appears only in the contralateral PFC. These results show the spatiotemporal characteristics of visual awareness-related activity across the two hemispheres. We plan to report these results in a separate paper soon.
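
      The connectivity measure the reviewer refers to, the phase-locking value (PLV), can be sketched in a few lines. The sketch assumes that the instantaneous phases of two electrodes have already been extracted (for example with a Hilbert transform) and that each list holds one phase per trial at a single time point; the function name and inputs are illustrative and do not reproduce the authors’ pipeline.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV = | mean over trials of exp(i * (phase_a - phase_b)) |.

    Phases are in radians. The result lies between 0 (no consistent
    phase relationship) and 1 (identical phase difference on every trial).
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty phase lists of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


if __name__ == "__main__":
    # Hypothetical phases: electrode B lags electrode A by a fixed 0.5 rad
    a = [0.1, 1.3, 2.0, -0.7]
    b = [p - 0.5 for p in a]
    print(phase_locking_value(a, b))
```

      A lateralized comparison would compute this quantity separately for electrode pairs contralateral and ipsilateral to the stimulus.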

      Reviewer #3 (Recommendations For The Authors):

      Some of the font sizes in the figures are too small.

      We have modified accordingly.

      To me, the abbreviations are confusing, (NA/NU etc). I would try to come up with easier ones or just not use abbreviations.

      We have modified the text accordingly and tried to avoid abbreviations.

      The data/scripts availability statement states "available upon reasonable request". I would suggest that the authors make the data openly available when possible, and I believe eLife requires that as well.

      We thank the reviewer for the suggestion. Because several ongoing studies are based on this dataset, we would like to make our data openly available after completing these studies, provided that national policy imposes no restrictions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Many drugs have off-target effects on the gut microbiota but the downstream consequences for drug efficacy and side effect profiles remain unclear. Herein, Wang et al. use a mouse model of liver injury coupled to antibiotic and microbiota transplantation experiments. Their results suggest that metformin-induced shifts in gut microbial community structure and metabolite levels may contribute to drug efficacy. This study provides valuable mechanistic insights that could be dissected further in future studies, including efforts to identify which specific bacterial species, genes, and metabolites play a causal role in drug response. Importantly, although some pilot data from human subjects is shown, the clinical relevance of these findings for liver disease remain to be determined.

      Thank you for reviewing our manuscript. We appreciate your valuable feedback. We agree that the downstream consequences of off-target effects on the gut microbiota by various drugs remain unclear. Our study aimed to shed light on this aspect by utilizing a mouse model of liver injury and conducting antibiotic and microbiota transplantation experiments. Our findings suggest that shifts in the structure and metabolite levels of the gut microbial community induced by metformin play a role in the drug’s efficacy. We believe that these mechanistic insights provide a strong foundation for further investigations. Specifically, future studies could focus on identifying the specific bacterial species, genes, and metabolites that have a causal role in drug response. While we have included some pilot data from human subjects, we acknowledge that the clinical relevance of our findings in the context of liver disease still requires further determination. In fact, we focused on the alteration of microbiota and metabolism caused by metformin in human bodies, which could capture the characteristics of changes in a more composite clinical direction, elucidating the potential role of metformin. We appreciate your attention to this aspect and thank you again for your thoughtful review and valuable suggestions.

      The major strength of this work is its scope, including detailed mouse phenotyping, inter-disciplinary methods, and numerous complementary experiments. The antibiotic depletion and FMT experiments provide support for a role of the gut microbiota in this mouse model.

      A major limitation is the lack of studies narrowing down which microbes are responsible. Sequencing data is shown, but no follow-up studies are done with bacterial isolates or defined communities.

      We acknowledge the limitation of our study in not narrowing down the specific microbes responsible for the observed effects. In our view, metformin exerts its effects by modulating specific metabolic pathways of the microbial community. A previous study has shown that metformin can inhibit microbial folate metabolism, leading to longevity-promoting effects that are not attributed to a single colony or strain[1]. Similarly, the impact of metformin on amino acid metabolism in the microbial community appears to be widespread. While further investigations with bacterial isolates or defined communities are needed, our findings suggest that metformin's effects on microbial metabolism are complex and involve multiple members of the microbial community.

      The link to GABA is also somewhat tenuous. While it does match the phenotypic data, there are no targeted experiments in which GABA producing microbial communities/strains are compared to a control community/strain. As such, it seems difficult to know how much of the effects in this model are due to GABA vs. other metabolites.

      We agree with your point regarding the tenuous link to GABA in our study. Although GABA was the only amino acid that we observed to increase following metformin treatment, a finding that has not been reported previously, we acknowledge the need for targeted experiments comparing GABA-producing microbial communities/strains with control communities/strains. Previous literature suggests that metformin's modulation of the microbiota can vary significantly depending on the disease context, with different microbial populations exhibiting differential responses[2-4]. Given this complexity, we opted to study the overall microbial community response to metformin rather than focusing on specific strains. Additionally, our detection of key enzymes involved in GABA synthesis at the community level further supports our findings.

      My major recommendation would be to revise the title, abstract, and discussion to provide more qualification and to consider alternative interpretations.

      We appreciate your feedback and understand your concern regarding the need for more qualification and consideration of alternative interpretations. We would welcome any more specific and detailed suggestions for improving the clarity and qualification of our title and abstract. Furthermore, we have revised the discussion to enhance the scientific rigor and logical coherence of our study. If you have any specific recommendations or insights, we would be more than willing to make further revisions to address those concerns.

      Some key controls are also missing, which could be addressed by repeat experiments in the mouse model.

      We appreciate your suggestion to include additional key controls in the mouse model experiments. We have conducted repeat experiments to test the effect of antibiotics in the absence of metformin, to differentiate between the effects of the model itself and the interaction of metformin with antibiotics. As the liver injury indicators show, there were no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1).

      Author response image 1.

      Figure1 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      The antibiotic depletion experiment would be improved by testing the effect of antibiotics in the absence of metformin, to see if the effect is just driven by the model itself as opposed to an interaction between metformin and antibiotics.

      For the antibiotic depletion experiment, we administered antibiotics (Abx) to the model mice. The survival rate and liver function tests suggested that Abx had no additional effect on the liver, indicating that the effect is driven by the model itself rather than by an interaction between metformin and antibiotics (Figure 2).

      Author response image 2.

      Figure2 a: Survival rate between IR and IR + Abx group; b: Serum ALT level; c: Serum AST level.

      References

      [1] CABREIRO F, AU C, LEUNG K Y, et al. Metformin Retards Aging in C. elegans by Altering Microbial Folate and Methionine Metabolism [J]. Cell, 2013, 153(1): 228-39.

      [2] LIANG H, SONG H, ZHANG X, et al. Metformin attenuated sepsis-related liver injury by modulating gut microbiota [J]. Emerg Microbes Infect, 2022, 11(1): 815-28.

      [3] SUN L, XIE C, WANG G, et al. Gut microbiota and intestinal FXR mediate the clinical benefits of metformin [J]. Nat Med, 2018, 24(12): 1919-29.

      [4] ZHAO H Y, LYU Y J, ZHAI R Q, et al. Metformin Mitigates Sepsis-Related Neuroinflammation via Modulating Gut Microbiota and Metabolites [J]. Frontiers in Immunology, 2022, 13:797312.

      Reviewer #2 (Public Review):

      The authors examine the use of metformin in the treatment of hepatic ischemia/reperfusion injury (HIRI) and suggest the mechanism of action is mediated in part by the gut microbiota and changes in hepatic ferroptosis. While the concept is intriguing, the experimental approaches are inadequate to support these conclusions.

      The histological and imaging studies were considered a strength and reveal a significant impact of metformin post-HIRI.

      Thank you for reviewing our paper titled “Gut microbiota-derived gamma-aminobutyric acid from metformin treatment reduces hepatic ischemia/reperfusion injury through inhibiting ferroptosis”. We appreciate your insightful comments and suggestions, which have provided valuable guidance for improving the quality and credibility of our research. We agree with your assessment that the experimental approaches used in this study may have limitations in supporting the conclusions drawn, and we appreciate your recognition of the strength of our histological and imaging studies, which clearly demonstrate the impact of metformin post-HIRI.

      Weaknesses largely stem from the experimental design. First, use of the iron chelator DFO would be strengthened using the ferroptosis inhibitor, liproxstatin.

      Your suggestion to employ the ferroptosis inhibitor, liproxstatin, in addition to the iron chelator DFO is well-taken. Incorporating liproxstatin into our experimental setup provides a more comprehensive understanding of the involvement of hepatic ferroptosis in the mechanism of action of metformin. We therefore treated HIRI mice with liproxstatin and measured several core indicators of liver injury. As shown in Figure 3, liproxstatin reduced liver injury, restored the liver GSH level and inhibited Fe accumulation, suggesting that ferroptosis plays an important role in HIRI. We hope this modification will enhance the credibility of our conclusions.

      Author response image 3.

      Figure3 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level; d: Liver GSH level; e: Liver Fe level.

      Second, the impact of metformin on the microbiota is profound resulting in changes in bile acid, lipid, and glucose homeostasis. Throughout the manuscript no comparisons are made with metformin alone which would better capture the metformin-specific effects.

      Thank you for raising an important point regarding the impact of metformin on the microbiota and its potential effects on bile acid, lipid, and glucose homeostasis. It is well known that the effects of metformin on normal blood glucose and lipid metabolism are minimal. Metformin primarily exerts its effects in cases of impaired glucose tolerance, which is why it is widely used for non-diabetic conditions. Changes in bile acid metabolism and chronic cholesterol and lipid elevation are typically observed in chronic liver disease models. Since our study focuses on an acute model of HIRI, we did not specifically investigate these changes.

      Lastly, the absence of proper controls including germ free mice, metformin treated mice, FMT treated mice, etc make it difficult to understand the outcomes and to properly reproduce the findings in other labs.

      Lastly, we acknowledge your concern regarding the absence of proper controls, including germ-free mice, metformin-treated mice, and FMT-treated mice. We understand that these controls are essential for robustly interpreting and reproducing our findings. Therefore, we have added a batch of experiments for verification. As the results show, there were no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1). We hope the results of these controls address your valid point and provide a more comprehensive framework for understanding the outcomes.

      Author response image 4.

      Figure1 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      Overall, while the concept is interesting and has the potential to better understand the pleiotropic functions of metformin, the limitations with the experimental design and lack of key controls make it challenging to support the conclusions.

      We genuinely appreciate your constructive criticism and the time you have taken to evaluate our work. Your feedback has shed light on the limitations of our experimental design and the need for key controls, which we have addressed in the revised manuscript. If you have any further recommendations or concerns, we would be more than willing to incorporate them into our future work.

      Reviewer #3 (Public Review):

      The study presented in this paper explores the role of gut microbiota in the therapeutic effect of metformin on HIRI, as supported by fecal microbiota transplantation (FMT) experiments. Through high throughput sequencing and HPLC-MS/MS, the authors have successfully demonstrated that metformin administration leads to an increase in GABA-producing bacteria. Moreover, the study provides compelling evidence for the beneficial impact of GABA on HIRI.

      Thank you for your valuable feedback on our paper exploring the role of gut microbiota in the therapeutic effect of metformin on hepatic ischemia-reperfusion injury (HIRI). We appreciate your positive remarks and suggestions for improvement. In response to your comments, we have revised the manuscript accordingly. We have included additional details on the high throughput sequencing and HPLC-MS/MS methods used to analyze the gut microbiota and GABA levels. This should provide readers with a clearer understanding of our experimental approach and the evidence supporting our findings.

      Regarding your suggestion to further investigate the mechanisms underlying the beneficial impact of GABA on HIRI, we agree that this is an important direction for future research. We plan to conduct additional studies to explore the specific mechanisms by which GABA exerts its protective effects on HIRI. We have also added a discussion of potential therapeutic strategies targeting GABAergic pathways to the Discussion section.

      Thank you once again for your insightful comments. We believe that these revisions have strengthened the manuscript and improved its scientific rigor. We hope that you find the revised version to be satisfactory and look forward to your further feedback.

      Reviewer #1 (Recommendations For The Authors):

      The writing could be improved. Multiple typos are found throughout and there is an overuse of adverbs like "expectedly". You should let the reader decide what is or is not expected. Try to avoid terms like "confirmed" or "validated", which only applies if you knew the result a priori. Remove underscores in species names. The Results section is also very difficult to interpret given the lack of explanation of experimental design. For example, the human study is only briefly mentioned within a larger paragraph on mouse data, without any explanation as to the study design. Similar issues are true for the transcriptomics and amplicon sequencing - it would help the reader to explain what samples were processed, the timepoints, etc.

      Thank you for your valuable feedback on our manuscript entitled “Gut microbiota-derived gamma-aminobutyric acid from metformin treatment reduces hepatic ischemia/reperfusion injury through inhibiting ferroptosis”. We appreciate your constructive comments and insightful suggestions for improvement.

      We have carefully reviewed your comments and have made several revisions to enhance the clarity and readability of the manuscript. We have addressed the issue of multiple typos and have removed the overuse of adverbs, such as “expectedly,” to allow readers to draw their own conclusions from the results. Additionally, we have eliminated terms like “confirmed” or “validated” that may imply a priori knowledge of the results.

      We apologize for the lack of clarity regarding the experimental design in the Results section. We have now provided a more detailed explanation of the study design for the human study, transcriptomics, and amplicon sequencing experiments. This includes information on the samples processed, timepoints, and other relevant details, to aid readers in understanding the experimental procedures.

      In response to your comment about removing underscores in species names, we have revised the text accordingly to ensure consistency and accuracy in the species nomenclature used throughout the manuscript.

      Once again, we sincerely appreciate your valuable input, which has helped us improve the quality of our manuscript. We hope that the revised version now meets your expectations and look forward to any further feedback you may have.

      Thank you for your time and attention.

      Line 53 - prebiotics aren't "microbial agents"

      We apologize for this error, which we have corrected. (line 55: “Microbial agents, such as synbiotics and probiotics…”)

      Line 88 - sequencing doesn't "verify the critical role of gut microbiota"

      We apologize for this error, which we have corrected. (line 90: “In order to clarify the critical role of gut microbiota in the pleiotropic actions of metformin,22-24 fecal samples were collected from the mice to perform 16S rRNA sequencing.”)

      Line 92 - missing a citation for the "microbiota-gut-liver axis theory"

      We have corrected it in manuscript. (line 93: “Next, as the microbiota-gut-liver axis theory indicates,25 HIRI-induced dysfunction of the gut barrier may aggravate liver damage by disrupting the gut microbiota.”)

      Line 112 - it's very surprising to me that FMT led to lower alpha diversity, which seems impossible.

      We understand your surprise regarding the observed decrease in alpha diversity after FMT. Our findings indeed deviate from the commonly observed pattern of increased alpha diversity post-FMT. We have carefully re-examined our data and conducted additional analyses to ensure the accuracy of our results. After thorough investigation, we have identified a potential reason for this unexpected outcome, which we believe could shed light on this phenomenon. We hypothesize that the lower alpha diversity observed in our study might be attributed to the specific characteristics of the donor microbiota used for FMT. While the donor microbiota exhibited certain beneficial properties associated with the therapeutic effect on HIRI, it could have presented a limited diversity compared to the recipient’s original gut microbiota. This discrepancy in diversity could have contributed to the observed decrease in alpha diversity following FMT.

      To further support our hypothesis, we have included a discussion on this unexpected finding in the revised manuscript. We believe that this addition will provide a more comprehensive understanding of the results and help contextualize the observed decrease in alpha diversity following FMT.

      Line 117 - Antibiotics don't "identify the function of gut microbes." Need to specify which antibiotics were used and for how long.

      We have corrected it in manuscript. (line 119: “To further identify the function of gut microbes, experiments were designed, and combination treatment of antibiotics (1 mg/mL penicillin sulfate, 1 mg/mL neomycin sulfate, 1 mg/mL metronidazole and 0.16 mg/mL gentamicin) and metformin were employed for 1 week before IR treated.”)

      Line 120 - this experiment shows that the gut microbiota (or antibiotics more precisely) matters, not the "reshaped gut microbiota"

      We have corrected it in manuscript. (line 124: “The results confirmed that reshaped gut microbiota is critical for the effect of metformin against HIRI.”)

      Line 122 - need to reword this subheading and the concluding sentence. The main takeaway is that the FMT improved markers of ferroptosis, but no additional causal links are provided here.

      We have revised in manuscript. (line 125: “FMT alleviates HIRI-induced ferroptosis through reshaped fecal microbiota.”)

      Line 141 - need to explain what transcriptomics data was generated and how it was analyzed.

      We have revised in manuscript. (line 144: “To elucidate the molecular mechanisms through which pathway participates metformin-treated IR injury, we analysed gene expression profiles of each group mice. Transcriptome sequencing analysis revealed that 9697 genes were in common among four groups (Supplementary Figure 6). Therefore, we used these common genes for KEGG analysis, showing that similar mRNA changes between Met group and FMT group are mainly concentrated in the three top pathways: lipid metabolism, carbohydrate metabolism, and amino acid metabolism (Fig 4a).”)

      Line 150 - change to "16S rRNA gene sequencing". Typo: "mice microbes".

      We have revised in manuscript. (line 156: “Moreover, it was observed that the genus of Bacteroides had a significant increase based on the 16s rRNA gene sequencing of metformin-treated mice microbes.”)

      Line 152 - upregulated refers to gene expression, change to enriched.

      We have revised in manuscript. (line 171: “Detailedly, the species of Bacteroides containing Bacteroides thetaiotaomicron, Bacteroides uniformis, and Bacteroides salyersiae, were enriched in human gut after metformin administration (Fig. 4i).”)

      Line 159 - typo: "prokaryotes"

      We have revised in manuscript. (line 165: “In order to further identify the increased GABA originates from gut microbiota, two key enzymes of prokaryotes GABA synthesis, GAD and PAT, were detected on DNA level, finding that both of them are significantly increased in the feces from IR+Met and IR+FMT groups (Fig. 4h).”)

      Line 161 - the human study should be under a new sub-heading and provide more details.

      We have revised in manuscript. (line 168: “In order to clarify the specific effects of metformin on microbiota, given the big safety margin, healthy volunteers were recruited for a 1 week of daily oral 500mg dose of metformin trial. Fecal samples were collected before and after oral administration of metformin for metagenomic analysis.”)

      Line 197 - It's unclear why the current study conflicts with prior literature. Is it due to the disease model, the starting microbiota, something else? Please add more discussion.

      Thank you for bringing this important point to our attention, and we appreciate your valuable input. We agree that it is important to discuss the potential reasons for the discrepancy between our findings and prior literature on metformin-reshaped microbiota. In our study, we used a disease model of HIRI, which may have unique characteristics compared to other disease models. It is possible that the specific disease model influenced the response of the gut microbiota. Additionally, the starting microbiota of the recipients and the characteristics of the donor microbiota used for FMT could also play a role in the disparity. We have expanded the discussion section of our revised manuscript to further address these potential factors and their implications. We hope that this additional information will provide a more comprehensive explanation for the discrepancy between our study and prior literature.

      Figure 1a - change to Kaplan Meier not ANOVA. Specify the contrast - which groups are being compared?

      We have revised Figure 1a.

      Figure 1e, alpha diversity - relabel "sobs" with "observed OTUs". Change to 3 bars with error and add statistics.

      We have revised Figure 1e.

      Figure 1e, PCA - this should be a separate panel (1f). Color of big red circle doesn't match the points. Add PERMANOVA p-value/R2. Change to OTUs not genera. Better yet, use amplicon sequence variants from DADA2.

      We have revised Figure 1e.

      Figure 2a - Change to Kaplan Meier. Also, it's unclear if residual metformin could be in the donor samples.

      We have revised Figure 2a.

      Figure 2f, alpha diversity - relabel "sobs" with "observed OTUs". Change to 3 bars with error and add statistics.

      We have revised Figure 2f.

      Figure 2f, PCA - this should be a separate panel (2g). Color of big orange circle doesn't match the points. Add PERMANOVA p-value/R2. Change to OTUs not genera. Better yet, use amplicon sequence variants from DADA2.

      We have revised Figure 2f.

      Figure 4b - check units, shouldn't this be ng/mg (i.e. weight not volume).

      We have revised Figure 4b.

      Figure 4c,d - need more explanation in the legend and Results as to what is shown here.

      We have revised Figure 4c,d.

      Figure 4d - unclear why only Bacteroides are shown here or if the p-values are adjusted for multiple comparisons.

      Thank you for your comment regarding Figure 4d in our manuscript. We apologize for the confusion caused. Only Bacteroides is shown in Figure 4d because we specifically wanted to investigate the changes in Bacteroides abundance following metformin treatment.

      In the mouse experiments, we observed a significant increase in Bacteroides after metformin treatment. To investigate if a similar change occurs in healthy volunteers, we examined the levels of Bacteroides in fecal samples before and after oral administration of metformin. We found that the abundance of Bacteroides also increased in the human gut after metformin administration, consistent with the results from the animal experiments. Regarding the p-values, we apologize for not mentioning whether they were adjusted for multiple comparisons in the figure legend. In our revised manuscript, we have provided a clarification stating that the p-values were adjusted using the appropriate method. We appreciate your feedback and hope that this explanation clarifies the rationale behind Figure 4d. Thank you for your valuable input.
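
      For readers unfamiliar with multiple-comparison adjustment, the sketch below shows one common procedure, Benjamini-Hochberg. The response does not name the method that was used, so this choice is an assumption made for illustration only, and the p-values are hypothetical rather than data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for three taxa (NOT data from the study).
raw_pvalues = [0.01, 0.04, 0.03]
adjusted_pvalues = benjamini_hochberg(raw_pvalues)
print(adjusted_pvalues)
```

      An adjusted value is never smaller than its raw value, so a taxon that is significant after adjustment was also significant before it.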

      Reviewer #2 (Recommendations For The Authors):

      Below I've listed several suggestions to improve the paper.

      1. Controls - the authors should include metformin only treated mice, FMT only treated mice, etc. Additionally, germ free mice treated with metformin and HIRI would be helpful to better implicate the gut microbiome in these beneficial effects.

      Thank you for your suggestion regarding the inclusion of additional control groups in our study. We agree that including metformin only treated mice, FMT only treated mice, and germ-free mice treated with metformin and HIRI would provide valuable insights into the role of the gut microbiome in the observed beneficial effects.

      Therefore, we have added metformin-only, FMT-only and Abx-only treated mice as supplementary controls to better assess each treatment's specific contribution to the observed effects. As the results show, there were no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1).

      We appreciate your input and believe that the inclusion of these additional control groups will strengthen our study and provide a more comprehensive understanding of the role of the gut microbiome in the therapeutic effects observed.

      Author response image 5.

      Figure 1. a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      1. More thorough characterization of metabolite pools. Metformin is known to influence many pathways including bile acids and lipids. These important molecules should be measured as they likely play a key role in the observed protective effect. In fact, many of the key changes displayed in Figure 3H are involved in lipid metabolism.

      Thank you for your valuable feedback regarding the characterization of metabolite pools in our study. We appreciate your suggestion to measure the influence of metformin on bile acids and lipid metabolism, as they are crucial pathways that may play a significant role in the observed protective effect.

      Regarding bile acids, we agree that they are important in the context of metformin’s influence on metabolic pathways. However, it is important to note that the impact of metformin on bile acids appears to be more prominent in chronic liver disease models. In our acute model, the changes in bile acids were not as significant. Instead, our results primarily indicate a close association between lipid changes and hepatic ferroptosis. Metformin significantly modulates lipid metabolism, thereby alleviating liver ferroptosis.

      Additionally, we have conducted metagenomic sequencing on the gut microbiota of healthy volunteers before and after oral administration of metformin. While analyzing the data, we did not observe significant changes in key genes involved in regulating bile acid variations. This might be attributed to the healthy volunteers used in our study, where significant changes in bile acids were not induced.

      We appreciate your insightful comments and suggestions, which have shed light on the importance of characterizing bile acids and lipid metabolism in our study. While the impact of bile acids may be more evident in chronic liver disease models, our findings highlight the significant influence of metformin on lipid metabolism, closely related to hepatic ferroptosis. We will take your suggestions into account for future studies to further explore the role of bile acids and their regulation by metformin.

      1. Imaging of lipid ROS is not quantitative. The authors should conduct more standard assays with BODIPY 581/591 C11 using cell lysates.

      We appreciate your suggestion to conduct more standard assays using BODIPY 581/591 C11 with cell lysates.

      We would like to clarify that we did indeed utilize assays with BODIPY 581/591 C11 to detect and measure lipid ROS in our study. The detailed description of these assays can be found in the Methods section of our paper. We followed established protocols and guidelines to ensure accurate and reliable measurements of lipid ROS levels.

      We acknowledge that imaging techniques may have limitations in providing quantitative data. However, we employed BODIPY 581/591 C11 assays as a widely accepted and commonly used method to assess lipid ROS levels. This allowed us to obtain qualitative and semi-quantitative information on the changes in lipid ROS levels in response to metformin treatment.

      1. Liproxstatin may be a better drug choice or, at the very least, should be used for comparison with the DFO data.

      Thank you for your suggestion. We have taken your advice into consideration and conducted an evaluation of Liproxstatin as a ferroptosis inhibitor. Our findings indicate that Liproxstatin significantly improves HIRI (Figure C). We believe that incorporating Liproxstatin in our research will provide valuable insights and allow for a comprehensive comparison with the DFO data.

      Author response image 6.

      Figure 3. a: Liver MDA detection; b: Serum ALT level; c: Serum AST level; d: Liver GSH level; e: Liver Fe level.

      1. The rationale for how GABA was selected is not clear. I am surprised that there were not more significant metabolite changes. It might be better to show a volcano plot or heatmap of the significantly changed features.

      Thank you for raising an important question regarding the rationale for selecting GABA as the focus metabolite in our study. Initially, we also had concerns about the limited number of significant metabolite changes observed. However, through our comprehensive metabolomic profiling, we identified GABA as the most significantly altered metabolite following HIRI.

      It is worth noting that we specifically focused on the measurement of 22 essential amino acids in our analysis. While it is possible that changes in non-essential amino acids may have occurred, we did not examine them in this study. Nevertheless, we have since used additional methods to validate the upregulation of GABA levels, and the biological effects observed support the specific role of GABA in protecting against HIRI. Because GABA was the only significantly changed amino acid, a volcano plot would add little information, so we did not include one.

      We appreciate your valuable input and thank you for bringing up this important issue.

      1. The manuscript needs to be proofread and edited. There are a variety of typos and grammar issues throughout.

      Thank you for your feedback. We acknowledge that the manuscript requires proofreading and editing, as we have identified several typos and grammar issues. We will try to ensure that the necessary revisions are made to improve the overall quality of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      However, I have some major concerns for the manuscript.

      1. Line 26: 16S rRNA and metagenomic sequencing alone can't accurately confirm the improvement effect of GABA-producing bacteria on HIRI. In fact, transcriptome analysis, HPLC-MS/MS and other methods were also used in this paper, so the wording here is not appropriate.

      Thank you for pointing out the wording issue in line 26 of the manuscript. We apologize for any confusion caused. You are correct in stating that 16S rRNA and metagenomic sequencing alone may not accurately confirm the improvement effect of GABA-producing bacteria on HIRI. In our study, we employed a combination of methods, including transcriptome analysis, HPLC-MS/MS and, in particular, detection of the key bacterial GABA synthesis enzymes PAT and GAD, to comprehensively investigate the impact of GABA-producing bacteria on HIRI.

      We have revised the language in line 26 to reflect the broader range of methods used in our study to support the conclusions regarding the improvement effect of GABA-producing bacteria on HIRI.

      1. The Introduction section needs to add a description of the previous research on the association between HIRI and ferroptosis.

      Thank you for your suggestion regarding the inclusion of a description of the association between HIRI and ferroptosis in the Introduction section. We agree that this is an important aspect to address. However, upon further consideration, we have decided to move the discussion of ferroptosis and its potential role in HIRI to the Discussion section, as it aligns better with the logical flow of the manuscript. This allows us to discuss the potential implications and future directions in a more organized and coherent manner.

      1. Authors should provide quantified figure or table next to the results of western blot that are more convenient to understand.

      We have revised the manuscript (see sfigure 7).

      1. In this paper, FMT experiments are used to verify that metformin-remodeled gut microbiota can play a role in improving HIRI. The operation steps of FMT should be described more specifically in the Methods section.

      * What is the fecal donor information for FMT?

      * Line 272: Was the transplanted microbiota for the IR + FMT group put directly into the drinking water, as for the other treatment groups? Would such an operation affect the quality and quantification of the transplanted microbiota and lead to the loss of microbiota species? It is crucial for the authors to provide a clear and thorough clarification regarding these matters within the context of their FMT experiment.

      Thank you for your feedback regarding the need for a more detailed description of the fecal microbiota transplantation (FMT) procedure and clarification regarding the IR + FMT group in our manuscript. We appreciate your suggestions and we have taken them into consideration.

      In our study, the donor feces for FMT were obtained from mice that had been orally administered metformin. The fecal microbiota was collected and processed to remove any residual metformin before transplantation. Specifically, the microbiota for the IR + FMT group was administered by gavage, as stated in line 272. This method does not affect the quality or quantity of the transplanted microbiota, nor does it lead to a loss of microbiota species. We understand the importance of providing clear and thorough clarification regarding these matters. Therefore, we have included additional specific details of the FMT procedure in the revised version of the manuscript. We hope that this clarification addresses your concerns and provides a more comprehensive understanding of our FMT experiment.

      1. The presentation of transcriptomic analysis results in the manuscript is insufficiently comprehensive and specific, as they are solely depicted through Fig 4a. Relying solely on Fig 4a is inadequate to establish the definitive roles of the met group and FMT group in ferroptosis compared to other groups. Therefore, the authors should provide additional transcriptomic analysis results to ascertain the specific effects of the met group and FMT group in ferroptosis, as well as their comparison with other groups.

      Thank you for your feedback regarding the comprehensiveness of our transcriptomic analysis results in the manuscript. We understand your concerns and appreciate your suggestion. In our study, we have provided additional data beyond Fig 4a to support the specific effects of the met group and FMT group in ferroptosis, as well as their comparison with other groups. Specifically, in Figure 3, we have included Western blot (WB) and quantitative real-time polymerase chain reaction (qRT-PCR) data to confirm the involvement of ferroptosis in HIRI and the role of metformin in attenuating ferroptosis. Moreover, we have presented transcriptomic analysis results in Figure 3h, which includes a heatmap of genes related to lipid metabolism. These findings can strengthen our conclusions regarding the importance of ferroptosis in HIRI and the protective effects of metformin against ferroptosis. We hope that these data address your concerns and provide a more comprehensive understanding of our research findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides compelling evidence to explain how chemical variations within a set of kinase inhibitors drive the selection of specific Erk2 conformations. Conformational selection plays a critical role in targeting medically relevant kinases such as Erk2 and the findings reported here open new avenues for designing small molecule inhibitors that block the active site while also steering the population of the enzyme into active or inactive conformations. Since protein dynamics and conformational ensembles are essential for enzyme function, this work will be of broad interest to those working in drug development, signal transduction, and enzymology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: The authors set out to determine how chemical variation on kinase inhibitors determines the selection of Erk2 conformations and how inhibitor binding affects Erk2 structure and dynamics.

      Strengths: The study is beautifully presented both verbally and visually. The NMR experiments and the HDX experiments complement each other for the study of Erk2 solution dynamics. X-ray crystallography of Erk2 complexes with inhibitors shows small but distinct structural changes that support the proposed model for the impact of inhibitor binding.

      Weaknesses: A discussion of compound residence time for the different compounds and kinase constructs and how it could affect the very slow HDX rates might be helpful. For example, could any of the observed effects in Figure 4 be due to slow compound dissociation rather than slowed down kinase dynamics? What would be the implications?

      Response: The rate constants kon and koff were estimated for three inhibitors using surface plasmon resonance:

      Author response table 1.

      SPR estimates of Kd for selected inhibitors ranged from 0.03 to 3 nM. All HDX time courses involved prebinding of 20 µM inhibitor and 17 µM ERK2 for 30 min (predicted occupancy 99.9%), followed by deuteration time courses with 20 µM inhibitor and 1.7 µM ERK2. Estimated rates of dissociation were ~0.0003-0.007 s⁻¹ and rates of binding were 20-100 s⁻¹ for the inhibitors tested. Because the binding rates are faster than the intrinsic H-D exchange rate at pD 7 (~1 s⁻¹), we expect ligands to rebind and form the enzyme:ligand complex faster than the free enzyme undergoes exchange. Therefore, HDX rates should mostly reflect deuteration of the inhibitor-bound enzyme for all inhibitors.
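
      The predicted occupancy can be checked with the exact solution for 1:1 binding. The sketch below assumes simple single-site binding with no ligand depletion by other species; the function name `fraction_bound` is illustrative and does not come from the response.

```python
import math


def fraction_bound(protein_uM, ligand_uM, kd_uM):
    """Fraction of protein in complex for 1:1 binding (exact quadratic solution)."""
    total = protein_uM + ligand_uM + kd_uM
    complex_uM = (total - math.sqrt(total ** 2 - 4.0 * protein_uM * ligand_uM)) / 2.0
    return complex_uM / protein_uM


# Prebinding conditions quoted in the response: 17 uM ERK2, 20 uM inhibitor.
# Kd values bracket the reported 0.03-3 nM range (converted from nM to uM).
for kd_nM in (0.03, 3.0):
    occupancy = fraction_bound(17.0, 20.0, kd_nM * 1e-3)
    print(f"Kd = {kd_nM} nM -> occupancy = {occupancy:.4f}")
```

      Under these assumptions, even the weakest binder in the reported range (Kd = 3 nM) gives an occupancy of about 99.9%, in line with the value quoted above.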

      Reviewer #2 (Public Review):

      Erk2 is an essential element of the MAP kinase signaling cascade and directly controls cell proliferation, migration, and survival. Therefore, it is one of the most important drug targets for cancer therapy. The catalytic subunit of Erk2 has a bilobal architecture, with the small lobe harboring the nucleotide-binding pocket and the large lobe harboring the substrate-binding cleft. Several studies by the Ahn group revealed that the catalytic domain hops between (at least) two conformational states: active (R) and inactive (L), which exchange in the millisecond time scale based on the chemical shift mapping. The R state is a signature of the double phosphorylated Erk2 (2P-Erk2), while the L state has been associated with the unphosphorylated kinase (0P-Erk2). Interestingly, the X-ray structures reveal only minimal differences between these two states, a feature that led to the conclusion that active and inactive states are structurally similar but dynamically very different. The Ahn group also found that ATP-competitive inhibitors can steer the populations of Erk2 either toward the R or the L state, depending on their chemical nature. The latter opens up the possibility of modulating the activity of this kinase by changing the chemistry of the ATP-competitive inhibitor. To prove this point, the authors present a set of nineteen compounds with diverse chemical substituents. From their combined NMR and HDX-Mass Spec analyses, fourteen inhibitors drive the kinase toward the R state, while four compounds keep the kinase hopping between the R and L states. Based on these data, the authors rationalize the effects of these inhibitors and the importance of the nature of the substituents on the central scaffold to steer the kinase activity. While all these inhibitors target the ATP binding pocket, they display diverse structural and dynamic effects on the kinase, selecting a specific structural state. 
      Although the inhibited kinase is no longer able to phosphorylate substrates, it can initiate signaling events by functioning as a scaffold for other proteins. Therefore, by changing the chemistry of the inhibitors it may be possible to affect the MAP cascade in a predictable manner. This concept, recently introduced as proof of principle, finds here its significance and practical implications. The design of next-generation inhibitors must take these design principles into account. The research is well executed, and the data support the authors' conclusions.

      Reviewer #3 (Public Review):

      Summary: Anderson et al utilize an array of orthogonal techniques to highlight the importance of protein dynamics for the function and inhibition of the kinase ERK2. ERK2 is important for a large variety of biological functions.

      Strengths: This is a thorough and detailed study that uses a variety of techniques to identify critical molecular/chemical parameters that drive ERK2 in specific states.

      Weaknesses: No detailed rules were identified so that novel inhibitors could be designed. Nevertheless, the mode of action of these existing inhibitors is much better defined.

      Response: As recommended we added a sentence to the Discussion suggesting that inhibitors that perturb the β1-β2-β3 sheet in such a way that moves helix αC and αL16 away from the binding site might confer R-state selection. We view this as a preliminary model for predicting conformation selection in ERK2.

      Reviewer #1 (Recommendations For The Authors):

      Maybe the authors can comment on how the HDX timescale and the NMR timescale relate to each other and how such different timescales can report on the same event. In particular, the HDX timescale appears to be on the scale of minutes to tens of hours (e.g. 2P state). How would inhibitor dissociation and rebinding affect the observed HDX signal? Is it worth considering compound residence time for the different compounds/kinase states?

      Response: The HDX-MS and NMR experiments report on different processes; therefore, their timescales do not necessarily match. For native-state proteins at neutral pH, HDX-MS reports fluctuations that allow solvent exposure of backbone amide N-H groups, reflecting conformational mobility of the main chain. This is often modeled as a two-state interconversion between “closed” (HDX protected) and “open” (HDX accessible) states. Because the µs-ms timescale of main chain fluctuations is faster than the intrinsic rate of HDX (kexch, ~1 s⁻¹), the observed HDX rate (kobs) can be approximated by (kopen/kclosed) × kexch = Kop × kexch. Therefore, kobs can be considered a thermodynamic measurement that reflects Kop.
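
      The approximation above can be illustrated numerically. The sketch below compares the standard Linderstrøm-Lang expression, kobs = kopen × kexch / (kclosed + kexch), which assumes kopen is much smaller than kclosed, with its limit Kop × kexch. The rate values are hypothetical and were chosen only so that closing is much faster than intrinsic exchange; they are not measurements from the study.

```python
def k_obs_linderstrom_lang(k_open, k_close, k_exch):
    """Observed HDX rate for closed <-> open -> exchanged, valid when k_open << k_close."""
    return k_open * k_exch / (k_close + k_exch)


def k_obs_fast_closing_limit(k_open, k_close, k_exch):
    """Limit k_close >> k_exch: k_obs = K_op * k_exch, with K_op = k_open / k_close."""
    return (k_open / k_close) * k_exch


# Hypothetical rates in s^-1 (illustration only): closing on the ms timescale,
# intrinsic exchange at ~1 s^-1 as quoted in the response.
k_open, k_close, k_exch = 1.0, 1000.0, 1.0

full_rate = k_obs_linderstrom_lang(k_open, k_close, k_exch)
limit_rate = k_obs_fast_closing_limit(k_open, k_close, k_exch)
print(f"full expression: {full_rate:.6f} s^-1")
print(f"K_op * k_exch:   {limit_rate:.6f} s^-1")
```

      With closing a thousand times faster than intrinsic exchange, the two expressions differ by about 0.1%, which is why kobs can be read as a measure of the opening equilibrium rather than of any single kinetic step.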

      The [methyl 13C,1H] NMR CPMG experiment that we used to identify global exchange behavior in Xiao et al (PNAS, 2014) modeled the 2P-ERK2 apoenzyme by a two-state equilibrium (L⇌R) between methyl-ILV conformers, yielding rate constants kL→R = 240 s⁻¹ and kR→L = 60 s⁻¹. Some methyls had large enough chemical shift differences between L and R that they appeared as separate peaks in HMQC spectra that matched the L and R populations estimated by CPMG. In this study, the HMQC peaks shown in Figures 1, 6, and 9 are those that report shifts in L vs R populations and conformation selection for the R-state by VTX11e, BVD523 and triazolopyridine inhibitors.
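
      For a two-state equilibrium, the quoted rate constants fix the populations and the overall exchange rate. The short calculation below follows arithmetically from the two rate constants given above; the resulting numbers are derived here for illustration and are not quoted from Xiao et al.

```python
def two_state_populations(k_forward, k_reverse):
    """Equilibrium populations and exchange rate for A <-> B.

    k_forward is the A -> B rate constant, k_reverse the B -> A rate constant.
    Returns (population of B, population of A, k_ex).
    """
    k_ex = k_forward + k_reverse
    p_b = k_forward / k_ex
    return p_b, 1.0 - p_b, k_ex


# Rate constants quoted in the response for the 2P-ERK2 apoenzyme (s^-1).
k_L_to_R = 240.0
k_R_to_L = 60.0

p_R, p_L, k_ex = two_state_populations(k_L_to_R, k_R_to_L)
print(f"p_R = {p_R:.2f}, p_L = {p_L:.2f}, k_ex = {k_ex:.0f} s^-1")
```

      These rate constants correspond to an R:L ratio of 4:1 and an exchange rate of 300 s⁻¹, which places the interconversion on the millisecond timescale.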

      Where HDX and NMR agree is in their ability to report changes in populations of L and R in 2P-ERK2. This was first shown when both HDX and NMR measurements reported perturbations at the activation loop induced by inhibitors with differential selection for the R- vs L-states (Pegram et al. PNAS, 2019). CPMG measurements then confirmed that methyl probes in the activation loop are included in the global exchange process (Iverson et al., Biochemistry, 2020). Therefore, the HDX and NMR experiments reflect shifts in the equilibrium between L and R conformers, rather than motions with specific timescales.

      Reviewer #2 (Recommendations For The Authors):

      I believe the paper is suitable for the special issue of eLife dedicated to protein kinases after the authors address minor concerns/comments.

      a) Introduction, page 3: "[..] But within the ATP binding site, the conserved residues ...are largely overlapping." Do the authors mean that the residues are overlapping in the X-ray structures? If so, what is the rmsd among the X-ray structures?

      Response: The overlap between conserved residues K52, E69, D147, N152 and D165 in 2P- and 0P-ERK2 is presented in Fig. S1C, which shows an overlay between their apoenzyme crystal structures (PDBID: 2ERK, 5UMO). The RMSD of atoms in each residue are: K52 0.63 Å (9 atoms); E69 0.15 Å (9 atoms); D147 0.055 Å (8 atoms); D165 0.88 Å (8 atoms). As recommended, this information was added to the legend to Suppl. Fig. S1.

      b) Introduction, page 5: "[...] For example binding of VTX11 partially inhibits...[..]" Please provide a citation.

      Response: As recommended, we added a citation at the end of this sentence (Pegram et al. PNAS, 2019).

      c) Introduction, page 5: "[...] N-lobe deformities..." What do the authors mean by deformities? Are there frustrated conformations?

      Response: We used the term “deformities” to mean conformational differences, which may be but are not necessarily due to frustration. To avoid confusion, we removed the term “deformities” and replaced it with “conformational changes”.

      d) Supplementary Information. The authors report the chemical shift perturbations for several inhibitors. Does the extent of the chemical shift perturbation reflect the strength of the binding for each inhibitor? In other words, do the largest chemical shift perturbations correspond to the highest binding affinity?

      Response: The concentrations used in the NMR ligand binding experiments (150 µM ERK2, 180 µM inhibitor) allow 99.9+% complex formation over the 0.03 - 3 nM range of Ki for all inhibitors. Therefore, the chemical shifts report changes in electronic environment between bound and free enzyme. These can be ascribed to first or second sphere contacts with ligand or distal allosteric effects. But they are not likely to reflect differences in binding affinity.

      New Suppl. Fig. S3 now adds HMQC titrations of VTX11e and GDC0994 into 2P-ERK2, which confirm binding saturation based on the disappearance of free enzyme peaks.

      e) Do the authors have any evidence for the dynamic effects of the different inhibitors? Of course, a systematic analysis of the protein dynamics by NMR will require a significant amount of time and effort beyond this work. However, did the authors measure the effects of the inhibitors on the linewidths of the methyl groups distal from the binding site?

      Response: As recommended, we examined linewidths of selected peaks in the presence and absence of inhibitors. The results show no significant systematic differences between bound and free ERK2. Therefore, dynamic effects of different inhibitors are not indicated by the available data.

      f) The authors identified the b3-aC loop as a critical element for the internal network of interactions. Can this structural element be targeted by small molecules as well?

      Response: Yes, in fact the X-ray structures of 0P-ERK2 bound to the inhibitor, SCH772984, and 2P-ERK2 bound to the related compound, SCHCPD336, both show inhibitor occupying a pocket between strand β3 and helix αC, leading to disruption of β3-αC contacts (Chaikaud et al., NSMB 2014; Pegram et al., PNAS 2019). To the extent that β3-αC contacts are important for conformation selection to the R-state, this may explain why SCH772984 favors the L-state. We revised the Discussion to add this point.

      g) The authors should mention a recent paper suggesting that it is possible to control substrate-binding affinity by changing the nature of the ATP-binding inhibitors (DOI: 10.1126/sciadv.abo0696).

      Response: As recommended, we added this point and citation to the Discussion.

      Reviewer #3 (Recommendations For The Authors):

      3.1. The manuscript is well written, but very long and sometimes repetitive. Some parts of the introduction are repeated in the result section and parts of the result section are repeated in the discussion. It will be easy to shorten the work to make it easier to read.

      Response: As recommended we streamlined the Discussion to remove some of the repetitive elements, while trying to retain the main conclusions and rationale for readers who are not well versed in kinase structure.

      3.2. Only specific residues are shown for the NMR spectra figures - while this is helpful to understand the concept, full spectra need to be shown to allow for direct comparison of the data quality (i.e. in supplemental material). If statements are made that measurements are done under full saturation - it should be shown that saturation is achieved in the measurements. All relaxation data should be made available - similar to CSPs.

      Response: As recommended, new Suppl. Figs. S2 and S9 were added to show the full spectra of each inhibitor complex analyzed by NMR. New Suppl. Fig. S3 now adds titrations of 2P-ERK2 with VTX11e and GDC0994. The results confirm binding saturation based on the disappearance of free enzyme peaks.

      3.3. No validation report was provided, nor a PDB number - so it is unclear if the crystal structures have been submitted - they need to be submitted in order to also access an mtz file, which is critical to understanding the quality of the structure (especially the ligand). This makes it difficult to assess the quality of the structures.

      Response: Table S1 has been revised to show data collection and refinement parameters for PDBID: 8U8K (2P-ERK2:Inh#8, Fig. 8C) and 8U8J (2P-ERK2:Inh#16, Fig. 8D). RCSB validation reports are attached, and the PDB depositions have been approved and will be released upon VOR assignment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Goetz et al. takes a new perspective on sensory information processing in cells. In contrast to previous studies, which have used population data to build a response distribution and which estimate sensory information at about 1 bit, this work defines sensory information at the single cell level. To do so, the authors take two approaches. First, they estimate single cells' response distributions to various input levels from time-series data directly. Second, they infer these single-cell response distributions from the population data by assuming a biochemical model and extracting the cells' parameters with a maximum-entropy approach. In either case, they find, for two experimental examples, that single-cell sensory information is much higher than 1 bit, and that the reduction to 1 bit at the population level is due to the fact that cells' response functions are so different from each other. Finally, the authors identify examples of measurable cell properties that do or do not correlate with single-cell sensory information.

      The work brings an important and distinct new insight to a research direction that generated strong interest about a decade ago: measuring sensory information in cells and understanding why it is so low. The manuscript is clear, the results are compelling, and the conclusions are well supported by the findings. Several contributions should be of interest to the quantitative biology community (e.g., the demonstration that single cells' sensory information is considerably larger than previously implied, and the approach of inferring single-cell data from population data with the help of a model and a maximum-entropy assumption).

      We thank the reviewer for the excellent summary of our research.

      Reviewer #2 (Public Review):

      In this paper the authors present an existing information theoretic framework to assess the ability of single cells to encode external signals sensed through membrane receptors.

      The main point is to distinguish actual noise in the signaling pathway from cell-cell variability, which could be due to differences in their phenotypic state, and to formalize this difference using information theory.

      After correcting for this cellular variability, the authors find that cells may encode more information than one would estimate from ignoring it, which is expected. The authors show this using simple models of different complexities, and also by analyzing an imaging dataset of the IGF/FoxO pathway.

      The implications of the work are limited because the analysed data is not rich enough to draw clear conclusions. Specifically,

      • the authors do not distinguish what could be methodological noise inherent to microscopy techniques (segmentation etc), and actual intrinsic cell state. It's not clear that cell-cell variability in the analyzed dataset is not just a constant offset or normalization factor. Other authors (e.g. Gregor et al Cell 130, 153-164) have re-centered and re-normalized their data before further analysis, which is more or less equivalent to the idea of the conditional information in the sense that it aims to correct for this experimental noise.

      We thank the reviewer for the comment. However, we do not believe our analysis is a consequence of normalization artifacts. Prior to modeling the single cell data, we removed well-dependent background fluorescence. This should take care of technical variation related to overall offsets in the data. We agree with the reviewer that background subtraction may not fully account for technical variability. For example, some of the cell-to-cell variability may potentially be ascribed to issues such as incorrect segmentation. Unfortunately, however, attempting to remove this technical variability through cell-specific normalization as suggested by the reviewer [1] will diminish to a very large extent the true biological effects related to extensivity (cell size, total protein abundance). We note that these effects are a direct function of cell state variables (see, for example, Cohen-Saidon et al. [2], who use cell-state specific normalization to improve signaling fidelity). Therefore, an increase in mutual information after normalization does not only reflect removal of technical noise but also accounts for the effect of cell state variables.

      Nonetheless, as the reviewer suggested, we performed a cell-specific normalization wherein the mean nuclear FoxO levels in each cell (in the absence of IGF) were normalized to one. Then, for each ligand concentration, we collated the FoxO response across all cells and computed the channel capacity corresponding to the cell-state agnostic mutual information I_CSA. As expected, I_CSA increases from ∼0.9 bits to ∼1.3 bits when cell-specific normalization is performed (Author response image 1). However, this value is significantly lower than the average of ∼1.95 bits for the cell-state specific mutual information ⟨I_Cee⟩. Finally, we note that the cell-specific normalization does not change the calculations of channel capacity at the single cell level, as these calculations do not depend on linear transformations of the data (centering and normalization). Therefore, we do not think that our analysis of experimental data suffers from artifacts related to microscopy.
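
      The comparison between pooled and single-cell channel capacities can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline: the Gaussian response model, the parameter ranges, the four input levels, and all function names are assumptions chosen for illustration, and the Blahut-Arimoto iteration is simply one standard way to estimate channel capacity. Each synthetic cell responds to the input with its own offset and gain; averaging the cells' response distributions mimics the cell-state agnostic calculation.

```python
import math
import random

def gaussian_row(mean, sigma, grid):
    """Discretized Gaussian p(x | u) on a fixed grid, truncated at 8 sigma."""
    w = [math.exp(-0.5 * ((x - mean) / sigma) ** 2) if abs(x - mean) <= 8 * sigma else 0.0
         for x in grid]
    z = sum(w)
    return [wi / z for wi in w]

def output_marginal(q, channel):
    """p(x) = sum_u q(u) p(x | u)."""
    return [sum(q[u] * channel[u][x] for u in range(len(q)))
            for x in range(len(channel[0]))]

def mutual_information_bits(q, channel):
    """I(u; x) in bits for input weights q and rows channel[u][x] = p(x | u)."""
    p_x = output_marginal(q, channel)
    return sum(q[u] * p * math.log2(p / p_x[x])
               for u, row in enumerate(channel)
               for x, p in enumerate(row) if p > 0.0 and q[u] > 0.0)

def channel_capacity_bits(channel, n_iter=100):
    """Blahut-Arimoto iteration: maximize I(u; x) over the input distribution."""
    q = [1.0 / len(channel)] * len(channel)
    for _ in range(n_iter):
        p_x = output_marginal(q, channel)
        # Kullback-Leibler divergence D(p(x|u) || p(x)) in nats, per input level
        d = [sum(p * math.log(p / p_x[x]) for x, p in enumerate(row) if p > 0.0)
             for row in channel]
        w = [qu * math.exp(du) for qu, du in zip(q, d)]
        z = sum(w)
        q = [wi / z for wi in w]
    return mutual_information_bits(q, channel)

rng = random.Random(0)
grid = [-2.0 + 0.05 * i for i in range(261)]   # response axis (arbitrary units)
inputs = [0.0, 1.0, 2.0, 3.0]                  # hypothetical ligand levels
sigma = 0.15                                   # assumed intrinsic noise

# Hypothetical cell states: (offset, gain) pairs that differ from cell to cell
cells = [(rng.uniform(0.0, 4.0), rng.uniform(1.0, 2.0)) for _ in range(20)]
channels = [[gaussian_row(offset + gain * u, sigma, grid) for u in inputs]
            for offset, gain in cells]

# Cell-state specific capacities, averaged over the population
single_capacities = [channel_capacity_bits(ch) for ch in channels]
mean_single_capacity = sum(single_capacities) / len(single_capacities)

# Cell-state agnostic capacity: pool the response distributions first
pooled = [[sum(ch[u][x] for ch in channels) / len(channels) for x in range(len(grid))]
          for u in range(len(inputs))]
pooled_capacity = channel_capacity_bits(pooled)

print(f"mean single-cell capacity: {mean_single_capacity:.2f} bits")
print(f"pooled capacity: {pooled_capacity:.2f} bits")
```

      In this toy setting the pooled capacity falls below the population average of the single-cell capacities because the cells' response curves overlap once they are mixed, even though each cell on its own separates the inputs almost perfectly.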

      Author response image 1. Left: nuclear FoxO response averaged over all cells in the population across different ligand concentrations. Right: nuclear FoxO response was first normalized at the single cell level and then averaged over all cells in the population across different ligand concentrations.

      • in the experiment, each condition is shown only once and sequentially. This means that the reproducibility of the response upon repeated exposures in a single cell was not tested, casting doubt on the estimate of the response fidelity (estimated as the variance over time in a single response).

      The reviewer raises an excellent question about persistence of cell states. To verify that cell states are indeed conserved at the time scale of the experiment, we reanalyzed data generated by Gross et al. [3] wherein cells were perturbed with IGF (37.5 pM), followed by a washout which allowed the cells to reach pre-stimulation nuclear FoxO levels, followed by a re-perturbation with the same amount of IGF. Nuclear FoxO response was measured at the single cell level after 90 minutes of IGF exposure both times. Since the response x to the same input u was measured twice in the same cell (x1 and x2), we could evaluate the intrinsic variability in response at the single cell level. We then compared this intrinsic variability to the extrinsic cell-state dependent variability in the population.

      To do so, we computed for each cell the difference between the two responses, δ = x1 - x2. Author response image 2 shows the histogram p(δ) as computed from the data (pink) and the same computed from the model that was trained on the single cell data (blue). We also computed p(δ0), the distribution of the difference between the responses of two different cells, both from the data and from the model.

      As we see in Author response image 2, the distribution p(δ) is significantly narrower than p(δ0), suggesting that intracellular variability is significantly smaller than across-population variability and that cells’ responses to the same stimulus are quite conserved, especially when compared to responses in randomly picked pairs of cells. This shows that cell states and the corresponding response to extracellular perturbations are conserved, at least at the time scale of the experiment. Therefore, our estimates of cell-to-cell variability in signaling fidelity are stable and reliable. We have now incorporated this discussion in the manuscript (lines 275-281).
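
      The logic of comparing p(δ) with p(δ0) can be sketched on synthetic data. The numbers below are invented for illustration and are not taken from the Gross et al. dataset or from the authors' model; the Gaussian form of both the cell-state spread and the intrinsic noise, and all variable names, are assumptions.

```python
import random
import statistics

rng = random.Random(1)
n_cells = 2000
state_sd = 0.30      # assumed spread of cell-state dependent mean responses
intrinsic_sd = 0.05  # assumed within-cell variability between repeated stimulations

# Each synthetic cell has its own mean response, fixed across both stimulations
cell_means = [rng.gauss(1.0, state_sd) for _ in range(n_cells)]
x1 = [m + rng.gauss(0.0, intrinsic_sd) for m in cell_means]  # first stimulation
x2 = [m + rng.gauss(0.0, intrinsic_sd) for m in cell_means]  # second stimulation

# delta: same cell, two stimulations
delta = [a - b for a, b in zip(x1, x2)]

def other_cell(i):
    """Pick a random cell index different from i."""
    j = rng.randrange(n_cells - 1)
    return j + 1 if j >= i else j

# delta0: first response of cell i minus second response of a different cell
delta0 = [x1[i] - x2[other_cell(i)] for i in range(n_cells)]

sd_delta = statistics.pstdev(delta)
sd_delta0 = statistics.pstdev(delta0)
print(f"sd(delta) = {sd_delta:.3f}, sd(delta0) = {sd_delta0:.3f}")
```

      When cell states persist between stimulations, the cell-specific mean cancels in δ but not in δ0, so the within-cell distribution is the narrower of the two.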

      Author response image 2. Left: Cells were treated with 37.5 pM of IGF for 90 minutes, washed out for 120 minutes and again treated with 37.5 pM of IGF. Nuclear FoxO was measured during the treatment and the washout. The distributions on the left show the difference in FoxO levels in single cells after the two 90-minute IGF stimulations (pink: data, blue: model). Right: Distribution of the difference in FoxO levels in two randomly picked cells after 90 minutes of exposure to 37.5 pM IGF.

      • another dataset on the EGF/EGFR pathway is analyzed, but no conclusion can be drawn from it because single-cell information cannot be directly estimated from it. The authors instead use a maximum-entropy Ansatz, which cannot be validated for lack of data.

      We thank the reviewer for this comment. We agree with the reviewer that we have not verified our predictions for the EGF/EGFR pathway. That study was meant to show the potential generality of our analysis. We look forward to validating our predictions for the EGF/EGFR pathway in future studies.

      Reviewer #3 (Public Review):

      Goetz, Akl and Dixit investigated the heterogeneity in the fidelity of sensing the environment by individual cells in a population using computational modeling and analysis of experimental data for two important and well-studied mammalian signaling pathways: (insulin-like growth factor) IGF/FoxO and (epidermal growth factor) EGF/EGFR mammalian pathways. They quantified this heterogeneity using the conditional mutual information between the input (e.g., level of IGF) and output (e.g., level of FoxO in the nucleus), conditioned on the "state" variables which characterize the signaling pathway (such as abundances of key proteins, reaction rates, etc.) First, using a toy stochastic model of a receptor-ligand system - which constitutes the first step of both signaling pathways - they constructed the population average of the mutual information conditioned on the number of receptors and maximized over the input distribution and showed that it is always greater than or equal to the usual or "cell state agnostic" channel capacity. They constructed the probability distribution of cell state dependent mutual information for the two pathways, demonstrating agreement with experimental data in the case of the IGF/FoxO pathway using previously published data. Finally, for the IGF/FoxO pathway, they found the joint distribution of the cell state dependent mutual information and two experimentally accessible state variables: the response range of FoxO and total nuclear FoxO level prior to IGF stimulation. In both cases, the data approximately follow the contour lines of the joint distribution. Interestingly, high nuclear FoxO levels, and therefore lower associated noise in the number of output readout molecules, is not correlated with higher cell state dependent mutual information, as one might expect.
      This paper contributes to the vibrant body of work on information theoretic characterization of biochemical signaling pathways, using the distribution of cell state dependent mutual information as a metric to highlight the importance of heterogeneity in cell populations. The authors suggest that this metric can be used to infer "bottlenecks" in information transfer in signaling networks, where certain cell state variables have a lower joint distribution with the cell state dependent mutual information.

      The utility of a metric based on the conditional mutual information to quantify fidelity of sensing and its heterogeneity (distribution) in a cell population is supported in the comparison with data. Some aspects of the analysis and claims in the main body of the paper and SI need to be clarified and extended.

      1. The authors use their previously published (Ref. 32) maximum-entropy based method to extract the probability distribution of cell state variables, which is needed to construct their main result, namely p_CeeMI (I). The salient features of their method, and how it compares with other similar methods of parameter inference should be summarized in the section with this title. In SI 3.3, the Lagrangian, L, and Rm should be defined.

      We thank the reviewer for the comment and apologize for the omission. We have now rewritten the manuscript to include references to previous reviews of works that infer probability distributions of cell state variables [4] (lines 156-168). Notably, as we argued in our previous work [5], no current method can efficiently estimate the joint distribution over parameters that is consistent with measured single cell data and models of signaling networks. Therefore, we could not use multiple approaches to infer parameter distributions. We have now expanded our discussion of the method in the supplementary information sections.

      1. Throughout the text, the authors refer to "low" and "high" values of the channel capacity. For example, a value of 1-1.5 bits is claimed to be "low". The authors need to clarify the context in which this value is low: In some physically realistic cases, the signaling network may need to simply distinguish between the presence or absence of a ligand, in which case this value would not be low.

      We agree with the reviewer that small values of channel capacities might be sufficient for cells to carry out some tasks, in which case a low channel capacity does not necessarily indicate a network not performing its task. Indeed, how much information is needed for a specific task is a related but distinct question from how much information is provided through a signaling network. Both questions are essential to understand a cell's signaling behavior, with the former being far less easy to answer in a way which is generalizable. In contrast, the latter can be quantitatively answered using the analysis presented in our manuscript.

      1. Related to (2), the authors should comment on why in Fig. 3A, I_Cee=3. Importantly, where does the fact that the network is able to distinguish between 2^3 ligand levels come from? Is this related to the choice (and binning) of the input ligand distribution (described in the SI)?

      We thank the reviewer for the comment. The network can distinguish between all inputs used in the in silico experiment precisely because the noise at the cellular level is small enough that there is negligible overlap between single cell response distributions. Indeed, the mutual information will not increase with the number of equally spaced inputs in a sub-linear manner, especially when the input number is very high.
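
      The ceiling on the mutual information can be written as a short worked bound. This is a standard information-theoretic inequality rather than a calculation taken from the manuscript; N denotes the number of discrete input levels, and the value N = 8 is an inference from the 3-bit figure quoted by the reviewer, not a number stated in the response.

```latex
I(u; x) \;=\; H(u) - H(u \mid x) \;\le\; H(u) \;\le\; \log_2 N,
\qquad
N = 2^{3} = 8 \;\Rightarrow\; I_{\max} = \log_2 8 = 3 \text{ bits}.
```

      The bound is reached when the inputs are equally weighted and the response distributions for different inputs do not overlap, so that H(u | x) = 0.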

      1. The authors should justify the choice of the gamma distribution in a number of cases (eg. distribution of ligand, distribution cell state parameters, such as number of receptors, receptor degradation rate, etc.).

      We thank the reviewer for the comment. We note that previous works on protein abundances and gene expression levels (e.g., see [6]) have reported distributions with positive skews that can be fit well with gamma distributions or log-normal distributions. Moreover, many stochastic models of protein abundance levels and signaling networks are also known to result in abundances that are distributed according to a negative binomial distribution, the discrete counterpart of the gamma distribution. Therefore, we chose gamma distributions in our study. We have now clarified this point in the Supplementary Information. At the same time, the gamma distribution only serves as a regularization for the finite data and, in principle, our analysis and conclusions do not depend on the choice of a gamma distribution for abundances of proteins, ligands, and cell parameters.

      1. Referring to SI Section 2, it is stated that the probability of the response (receptor binding occupancy) conditioned on the input ligand concentration and number of receptors is a Poisson distribution. Indeed this is nicely demonstrated in Fig. S2. Therefore it is the coefficient of variation (std/mean) that decreases with increasing R0, not the noise (which is strictly the standard deviation) as stated in the paper.

      We thank the reviewer for the comment. We have now corrected our text.
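
      The reviewer's distinction between noise and coefficient of variation follows directly from the Poisson form. In the sketch below, B is the number of ligand-bound receptors, L the ligand concentration, and R_0 the receptor number; the assumption that the Poisson mean λ grows in proportion to R_0 at fixed L is ours, made for illustration, and is not quoted from the SI.

```latex
B \mid L, R_0 \sim \mathrm{Poisson}(\lambda), \qquad
\sigma_B = \sqrt{\lambda}, \qquad
\mathrm{CV}_B = \frac{\sigma_B}{\lambda} = \frac{1}{\sqrt{\lambda}},
\qquad \lambda \propto R_0 \;\Rightarrow\;
\sigma_B \propto \sqrt{R_0}, \quad \mathrm{CV}_B \propto \frac{1}{\sqrt{R_0}} .
```

      Under that assumption the standard deviation increases with R_0 while the coefficient of variation decreases, which is the point the reviewer makes.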

      1. In addition to explicitly stating what the input (IGF level) and the output (nuclear GFP-tagged FoxO level) are, it would be helpful if it is also stated what is the vector of state variables, theta, corresponding to the schematic diagram in Fig. 2C.

      We thank the reviewer for the comment. We have now corrected our text in the supplementary material as well as the main text (Figure 2 caption).

      1. Related to Fig. 2C, the statement in the caption: "Phosphorylated Akt leads to phosphorylation of FoxO which effectively shuttles it out of the nucleus." needs clarification: From the figure, it appears that pFoxO does not cross the nuclear membrane, in which case it would be less confusing to say that phosphorylation prevents reentry of FoxO into the nucleus.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption).

      1. The explanations for Fig. 2D, E and insets are sparse and therefore not clear. The authors should expand on what is meant by model and experimental I(theta). What is CC input dose? Also in Fig. 2E, the overlap between the blue and pink histograms means that the value of the blue histogram for the final bin - and therefore agreement or lack thereof with the experimental result - is not visible. Also, the significance of the values 3.25 bits and 3 bits in these plots should be discussed in connection with the input distributions.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption and lines 249-251).

      1. While the joint distribution of the cell state dependent mutual information and various biochemical parameters is given in Fig. S7, there is no explanation of what these results mean, either in the SI or main text. Related to this, while a central claim of the work is that establishing this joint distribution will allow determination of cell state variables that differentiate between high and low fidelity sensing, this claim would be stronger with more discussion of Figs. 3 and S7. The related central claim that cell state dependent mutual information leads to higher fidelity sensing at the population level would be made stronger if it can be demonstrated that in the limit of rapidly varying cell state variables, the I_CSA is retrieved.

      We thank the reviewer for this excellent comment. We have now added more discussion about interpreting the correlation between cell state variables and cell-state specific mutual information (lines 294-306). We also appreciate the suggestion about a toy model calculation to show that dynamics of cell state variables affects cell state specific mutual information. We have now performed a simple calculation to show how dynamics of cell state variables affects cells’ sensing ability (lines 325-363). Specifically, we constructed a model of a receptor binding to the ligand wherein the receptor levels themselves changed over time through a slow process of gene expression (Author response image 3, main text Figure 4). In this model, the timescales of fluctuations of ligand-free receptors on the cell surface can be tuned by speeding up/slowing down the degradation rate of the corresponding mRNA while keeping the total amount of steady state mRNA constant. As shown in Author response image 3, the dependence of cell-specific mutual information on cell state variable diminishes when the time scale of change of cell state variables is fast.

      Author response image 3. Cell state dynamics governs cell state conditioned mutual information. A. In a simple stochastic model, receptor mRNA is produced at a constant rate from the DNA and then translated into ligand-free receptors. The number of ligand-bound receptors after a short exposure to ligands is considered the output. B. A schematic showing dynamics of receptor numbers when mRNA dynamics are slower compared to signaling time scales. C. Conditioning on receptor numbers leads to differing abilities in sensing the environment when the time scale of mRNA dynamics τ is slow. In contrast, when the mRNA dynamics are fast (large τ⁻¹), conditioning on cell state variables does not lead to differences in sensing abilities.

      Reviewer #1 (Recommendations For The Authors):

      My major concerns are mainly conceptual, as described below. With proper attention to these concerns, I feel that this manuscript could be a good candidate for the eLife community.

      Major concerns:

      1. The manuscript convincingly demonstrates that cells are good sensors after all, and that heterogeneity makes their input-output functions different from each other. This raises the question of what happens downstream of sensing. For single-celled organisms, where it may be natural to define behavioral consequences at the single-cell level, it may very well be relevant that single-cell information is high, even if cells respond differently to the environment. But for cells in multicellular organisms, like those studied here, I imagine that most behavioral consequences of sensing occur at the multicellular level. Thus, many cells' responses are combined into a larger response. Because their responses are different, their high-information individual responses may combine into a low-information collective response. In fact, one could argue that a decent indicator of the fidelity of this collective response is indeed the population-level information measure estimated in previous works. Thus, a fundamental question that the authors must address is: what is the ultimate utility of reliable, but heterogeneous, responses for a multicellular system? This question has an important bearing for the relevance of their findings.

      We thank the reviewer for this thought-provoking comment. We agree that the fidelity with which cells sense their environment, especially those in multicellular organisms, may not always need to be very high. We speculate that when the biological function of a collection of cells can be expressed as an average over the response of individual cells, high-information but heterogeneous cells can be considered equivalent to low-information homogeneous cells. An example of such a function is population differentiation to maintain relative proportions of different cell types in a tissue or producing a certain amount of extracellular enzyme.

      In contrast, we believe that when the biological function involves collective action, spatial patterning, or temporal memory, the difference between reliable but heterogeneous population and unreliable homogeneous population will become significant. We plan to explore this topic in future studies.

      1. The authors demonstrate that the agreement is good between their inference approach and the direct estimation of response distributions from single-cell time series data. In fact, the agreement is so good that it raises the question of why one would need the inference approach at all. Is it because single-cell time series data is not always available? Is that why the authors used it for one example and not the other? The validation is an asset, but I imagine that the inference approach is complicated and may make assumptions that are not always true. Thus, its utility and appropriate use must be clarified.

      We thank the reviewer for the comment. As the reviewer correctly pointed out, live cell imaging data is not always available and has limited scope. Specifically, optical resolution limits measurements of multiple targets. Moreover, typical live cell measurements measure total abundance or localization and not post-translational modifications (phosphorylation, methylation, etc.), which are crucial to signaling dynamics. The most readily available single cell data, such as those measured using single cell RNA sequencing, immunofluorescence, or flow cytometry, are necessarily snapshots. Therefore, computational models that can connect underlying signaling networks to snapshot data become essential when imputing single cell trajectories. In addition, the modeling also allows us to identify network parameters that correlate most strongly with cellular heterogeneity. We have now clarified this point in the manuscript (lines 366-380).

      Minor comments:

      1. I would point out that the maximum values in the single-cell mutual information distributions (Fig 2D and E) correspond to log2 of the number of input levels, corresponding to perfect distinguishability of each of the equally-weighted input states. It is clear that many of the mutual information values cluster toward this maximum, and it would help readers to point out why.

      We thank the reviewer for the comment. We have now included a discussion about the skew in the distribution in the text (lines 251-260).

      1. Line 216 references Fig 2C for the EGF/EGFR pathway, but Fig 2C shows the FoxO pathway. In fact, I did not see a schematic of the EGF/EGFR pathway. It may be helpful to include one, and for completeness perhaps also one for the toy model, and organize the figures accordingly.

      We thank the reviewer for the comment. We did not include three separate schematics because the schematics of the EGF/EGFR model and the toy model are subsets of the schematic of the IGF/FoxO model. We have now clarified this point in the manuscript (Figure 2 caption).

      Reviewer #2 (Recommendations For The Authors):

      • the simple model of Fig. 2A would gain from a small cartoon explaining the model and its parameters.

      We thank the reviewer for the comment. We did not include a schematic for the toy model as it is a subset of the schematic of the IGF/FoxO model. The schematic of the toy model is included in the supplementary information.

      • L should be called u, and B should be called x, to be consistent with the rest of the notations in the paper.

      We have decided to keep the notation originally presented in the manuscript.

      • legend of 2E and D should be clarified. "CC input dose" is cryptic. The x axis is the input dose, the y axis is its distribution at the argmax of I. CC is the max of I, not its argmax. Likewise "I" in the legend for the colors should not be used to describe the insets, which are input distributions.

      We have now changed this in the manuscript.

      • the data analysis of the IGF/FoxO pathway should be explained in the main text, not the SI. Otherwise it's impossible to understand how one arrives at, or how to interpret, figure 2E, which is central to the paper. For instance the fact that p(x|u,theta) is assumed to be Gaussian, and how the variance and mean are estimated from the actual data is very important to understand the significance of the results.

      While we have added more details in the manuscript in various places, for the sake of brevity and clarity, we have decided to keep the details of the calculations in the supplementary materials.

      • there's no Method's section. Most of the paper's theoretical work is hidden in the SI, while it should be described in the methods.

      We thank the reviewer for the comment. However, we believe that adding a methods section will break the narrative of the paper. The methods are described in the supplementary materials in sufficient detail to reproduce our results. Additionally, we provide a link to the GitHub page that has all scripts related to the manuscript.

      PS: please submit a PDF of the SI for review, so that people can read it on any platform (as opposed to a word document, especially with equations)

      We have now done this.

      Reviewer #3 (Recommendations For The Authors):

      1. Subplots in Fig. 1, inset in Fig. 3 are not legible due to small font.

      We have now increased the font.

      1. Mean absolute error in Fig. S5 and relative error in related text should be clarified.

      We have now clarified this in the manuscript.

      1. Acronyms (MACO, MERIDIAN) should be defined.

      We have now made these changes.

      References

      1. Gregor T, Tank DW, Wieschaus EF, Bialek W. Probing the limits to positional information. Cell. 2007;130(1):153-64. doi: 10.1016/j.cell.2007.05.025. PubMed PMID: WOS:000248587000018.

      2. Cohen-Saidon C, Cohen AA, Sigal A, Liron Y, Alon U. Dynamics and Variability of ERK2 Response to EGF in Individual Living Cells. Mol Cell. 2009;36(5):885-93. doi: 10.1016/j.molcel.2009.11.025. PubMed PMID: WOS:000272965400020.

      3. Gross SM, Dane MA, Bucher E, Heiser LM. Individual Cells Can Resolve Variations in Stimulus Intensity along the IGF-PI3K-AKT Signaling Axis. Cell Syst. 2019;9(6):580-8 e4.

      4. Loos C, Hasenauer J. Mathematical modeling of variability in intracellular signaling. Current Opinion in Systems Biology. 2019;16:17-24.

      5. Dixit PD, Lyashenko E, Niepel M, Vitkup D. Maximum Entropy Framework for Predictive Inference of Cell Population Heterogeneity and Responses in Signaling Networks. Cell Syst. 2020;10(2):204-12 e8.

      6. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, Xie XS. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533-8. doi: 10.1126/science.1188308. PubMed PMID: 20671182; PMCID: PMC2922915.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The proposed study provides an innovative framework for the identification of muscle synergies taking into account their task relevance. State-of-the-art techniques for extracting muscle interactions use unsupervised machine-learning algorithms applied to the envelopes of the electromyographic signals without taking into account the information related to the task being performed. In this work, the authors suggest including the task parameters in extracting muscle synergies using a network information framework previously proposed. This allows the identification of muscle interactions that are relevant, irrelevant, or redundant to the parameters of the task executed.

      The proposed framework is a powerful tool to understand and identify muscle interactions for specific task parameters and it may be used to improve man-machine interfaces for the control of prostheses and robotic exoskeletons.

      With respect to the network information framework recently published, this work added an important part to estimate the relevance of specific muscle interactions to the parameters of the task executed. However, the authors should better explain what is the added value of this contribution with respect to the previous one, also in terms of computational methods.

      It is not clear how the well-known phenomenon of cross-talk during the recording of electromyographic muscle activity may affect the performance of the proposed technique and how it may bias the overall outcomes of the framework.

      We thank reviewer 1 for their useful commentary on this manuscript.

      Reviewer #2 (Public Review):

      This paper is an attempt to extend or augment muscle synergy and motor primitive ideas with task measures. The authors' idea is to use information metrics (mutual information, co-information) in 'synergy' constraint creation that includes task information directly. By using task related information and muscle information sources and then sparsification, the methods construct task relevant network communities among muscles, together with task redundant communities, and task irrelevant communities. This process of creating network communities may then constrain and help to guide subsequent synergy identification using the authors' published sNM3F algorithm to detect spatial and temporal synergies.

      The revised paper is much clearer and examples are helpful in various ways. However, figure 2 as presented does not convincingly show why task muscle mutual information helps in separating synergies, though it is helpful in defining the various network communities used in the toy example.

      The impact of the information theoretic constraints developed as network communities on subsequent synergy separation are posited to be benign and to improve over other methods (e.g., NNMF). However, not fully addressed are the possible impacts of the methods on compositionality links with physiological bases, and the possibility remains of the methods sometimes instead leading to modules that represent more descriptive ML frameworks that may not support physiological work easily. Accordingly, there is a caveat. This is recognized and acknowledged by the authors in their rebuttal of the prior review. It will remain for other work to explore this issue, likely through testing on detailed high degree of freedom artificial neuromechanical models and tasks. This possible issue with the strategy here likely needs to be fully acknowledged in the paper.

      The approach of the methods seeks to identify task relevant coordinative couplings. This is a meta problem for more classical synergy analyses. Classical analyses seek compositional elements stable across tasks. These elements may then be explored in causal experiments and generative simulations of coupling and control strategies. However, task-based understanding of synergy roles and functional uses is significant and is clearly likely to be aided by methods in this study.

      Information based separation has been used in muscle synergy analyses using infomax ICA, which is information based at core. Though linear mixing of sources is assumed in ICA, minimized mutual information among source (synergy) drives is the basis of the separation and detects low variance synergy contributions (e.g., see Yang, Logan, Giszter, 2019). In the work in this paper, instead, mutual information approaches are used to cluster muscles and task features into network communities preceding the SNM3F algorithm use for separation, rather than using minimized information in separation. This contrast of an accretive or agglomerative mutual information strategy here used to cluster into networks, versus a minimizing mutual information source separation used in infomax ICA epitomizes a key difference in approach here.

      Physiological causal testing of synergy ideas is neglected in the literature reviews in the paper. Although these are only in animal work (Hart and Giszter, 2010; Takei and Seki, 2017), the clear connection of muscle synergy analysis choices to physiology is important, and eventually these issues need to be better managed and understood in relation to the new methods proposed here, even if not in this paper.

      Analyses of synergies using the methods the paper has proposed will likely be very much dependent on the number and quality of task variables included and how these are managed, and the impacts of these on the ensuing sparsification and network communities used prior to SNM3F. The authors acknowledge this in their response. This caveat should likely be made very explicit in the paper.

      It would be useful in the future to explore the approach described with a range of simulated data to better understand the caveats, and optimizations for best practices in this approach.

      A key component of the reviewers’ arguments here is their reductionist view of muscle synergies vs the emergentist view presented in our work here. In the reductionist lens, muscle groupings are the units (‘building blocks’) of coordinated movement and thus the space of intermuscular interactions is of particular interest for understanding movement construction. On the other hand, the emergentist view suggests that muscle groupings emerge from interactions between constituent parts (as quantified here using information theory, synergistic information is the information found when both activities are observed together). This is in line with recent work in the field showing modular control at the intramuscular level, exemplifying a scale-free phenomenon. Nonetheless, we consider these approaches to muscle synergy research as complementary and beneficial for the field overall going forward.
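
      The notion of information that is "found when both activities are observed together" can be illustrated with a toy calculation. The sketch below is an illustration only and is not the decomposition used in the framework: it assumes binary variables, a hypothetical task value equal to the XOR of two hypothetical muscle states, and it uses the simple difference I(A,B;T) - I(A;T) - I(B;T) as a stand-in for a synergy measure. All names are invented for the example.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between the two elements of each (a, b) pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (count / n) * log2((count / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), count in joint.items()
    )


# Hypothetical toy data: two binary "muscle" states and a task value that
# depends only on their combination (XOR), with all four states equally likely.
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

mi_a = mutual_information([(a, t) for a, _, t in samples])          # muscle A alone
mi_b = mutual_information([(b, t) for _, b, t in samples])          # muscle B alone
mi_ab = mutual_information([((a, b), t) for a, b, t in samples])    # both together

# Information about the task that is only available from the joint observation.
joint_excess = mi_ab - mi_a - mi_b

print(f"I(A;T)={mi_a:.3f}  I(B;T)={mi_b:.3f}  I(A,B;T)={mi_ab:.3f}  excess={joint_excess:.3f}")
```

      In this toy case neither variable alone carries any information about the task, while the pair determines it completely, which is the extreme form of a synergistic interaction.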

      Reviewer #3 (Public Review):

      In this study, the authors developed and tested a novel framework for extracting muscle synergies. The approach aims at removing some limitations and constraints typical of previous approaches used in the field. In particular, the authors propose a mathematical formulation that removes constraints of linearity and couples the synergies to their motor outcome, supporting the concept of functional synergies and distinguishing the task-related performance related to each synergy. While some concepts behind this work were already introduced in recent work in the field, the methodology provided here encapsulates all these features in an original formulation providing a step forward with respect to the currently available algorithms. The authors also successfully demonstrated the applicability of their method to previously available datasets of multi-joint movements.

      Preliminary results positively support the scientific soundness of the presented approach and its potential. The added values of the method should be documented more in future work to understand how the presented formulation relates to previous approaches and what novel insights can be achieved in practical scenarios and confirm/exploit the potential of the theoretical findings.

      In their revision, the authors have implemented major revisions and improved their paper. The work was already of good quality and now it has improved further. The authors were able to successfully:

      • improve the clarity of the writing (e.g.: better explaining the rationale and the aims of the paper);

      • extend the clarification of some of the key novel concepts introduced in their work, like the redundant synergies;

      • show a scenario in which their approach might be useful for increasing the understanding of motor control in patients with respect to traditional algorithms such as NMF. In particular, their example illustrates why considering the task space is a fundamental step forward when extracting muscle synergies, improving the practical and physiological interpretation of the results.

      We thank reviewer 3 for their constructive commentary on this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 3 should report the distances between reaching points in panel A and the actual length distances of the walking paths in panel C.

      The caption of fig.3 concerning the experimental setup of the datasets analysed has been updated with the following for dataset 1: “(A) Dataset 1 consisted of participants executing table-top point-to-point reaching movements (40cm distance from starting point P0) across four targets in forward (P1-P4) and backwards (P5-P8) directions at both fast and slow speeds (40 repetitions per task) [25]. The muscles recorded included the finger extensors (FE), brachioradialis (BR), biceps brachii (BI), medial-triceps (TM), lateral-triceps (TL), anterior deltoid (AD), posterior deltoid (PD), pectoralis major (PE), latissimus dorsi (LD) of the right, reaching arm.”. For dataset 3, to the best of the authors’ knowledge, this information was not given in the original paper.

      Figure 4, what is the unit of the data shown?

      The unit of bits is now mentioned in the toy example figure caption and in the caption of fig.5.

      Figure 4, the characteristics of the interactions are not fully clear, and the graphical representation should be improved.

      We have made steps to improve the clarity of the figures presented.

      For dataset 3, τ was the movement kinematics, but it is not specified how the task parameters were formulated. Did the authors use the data from all 32 kinematic markers, 4 IMUs, and force plates? If yes, it should be specified why all these signals were used. For sure, there will be signals included that are not relevant to the specific task. Did the authors select specific signals based on their relevance to the task (e.g., ankle kinematics)?

      We have now clarified this in the text as follows: “For datasets 1 and 2, we determine the MI between vectors with respect to several discrete task parameters representing specific task attributes (e.g. reaching direction, speed etc.), while for dataset 3 we determined the task-relevant and -irrelevant muscle couplings in an unassuming way by quantifying them with respect to all available kinematic, dynamic and inertial measurement unit (IMU) features.”

      How did the authors ensure that crosstalk did not affect their analysis, particularly between, e.g., finger extensors and brachioradialis and posterior deltoid and anterior deltoid (dataset 1)?

      We have addressed this point in the previous round of reviews and made an explicit statement regarding cross-talk in the discussion section: “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [66], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [19,51].”

      It would be informative to add some examples of not trivial/obvious task-related synergistic muscle combinations that have been extracted in the three datasets. Most of the examples reported in the manuscript are well-known biomechanically and quite intuitive, so they do not improve our understanding of synergistic muscle control in humans.

      Our framework improves our understanding of synergistic motor control by enabling the formal quantification of synergistic muscle interactions, a capability not present among current approaches. Regarding the implications of this advance in terms of concrete examples, we have further clarified our examples presented in the results section, for example:

      “Across datasets, many of the muscle networks could be characterised by the transmission of complementary task information between functionally specialised muscle groups, many of which were identified among the task-redundant representations (Fig.9-10 and Supp. Fig.2). The most obvious example of this is the S3 synergist muscle network of dataset 2 (Fig.11), which captures the complementary interaction between task-redundant submodules identified previously (S3 (Fig.9)).”

      The description shows how our framework can extract the cross-module interactions that align with the higher-level objectives of the system, here the synergistic connectivity between the upper and lower body modules. Current approaches can only capture redundant and task-irrelevant interactions. Thus our framework provides additional insight into movement control.

      The number of participants in dataset 2 is very limited and should be increased.

      We appreciate the reviewer's comment and would like to point out that for dataset 2 our aim was to increase the number of muscles (30), tasks (72) and trials for each task (30), which produced a very large dataset for each participant. This came at the expense of a low number of participants; however, all our statistical analyses here can be performed at the single-participant level. Furthermore, dataset 3 includes 25 participants, which enables us to demonstrate the reliability of the findings across participants.

      Reviewer #2 (Recommendations For The Authors):

      I believe it is important in the future to explore the approach proposed with a range of simulation data and neuromechanical models, to explore the issues I have raised and that you have acknowledged, though I agree it is likely out of scope for the paper here.

      We agree with the reviewer that this would be valuable future work and indeed plan to do this in our future research.

      The Github code for this paper should likely include the various data sets used in the paper and figures, appropriately anonymized, in order to allow the data to be explored and analyses replicated and package demonstrated to be exercised fully by a new user.

      We thank the reviewer for this suggestion. Dataset 3 is already available online at https://doi.org/10.1016/j.jbiomech.2021.110320. We will also make the other 2 datasets publicly available on our lab website very soon. Until then, as stated in the manuscript, we will make them available to anyone upon reasonable request.

      Reviewer #3 (Recommendations For The Authors):

      I have the following open points to suggest to the authors:

      First, I recommend improving the quality of the figures: in the pdf version I downloaded, some writings are impossible to read.

      We fully agree with the reviewer and note that in the pdf version of the paper, the figures are a lot worse than in the submitted Word document. Nevertheless, we will make further improvements on the figures as requested.

      Even though the manuscript has improved, I still feel that some points were not addressed or were only partially addressed. In particular:

      • The proposed comparison with NMF helps understanding why incorporating the task space is useful (and I fully agree with the authors about this point as the main reason to propose their contribution). However, the comparison does not help the reader to understand whether the synergies incorporating the task space are biased by the introduction of the task variables.

      This question can be also reformulated as: are muscle synergies modified when task space variables are incorporated? Is the "weight" on task coefficients affecting the composition of muscle synergies? If so, the added interpretational power is achieved at the cost of losing the information regarding the neural substrate of synergies? I understand this point is not immediate to show, but it would increase the quality of the work.

      • Reference to previous approaches that aimed at including task variables into synergy extraction are still missing in the paper. Even though it is not required to provide quantitative comparisons with other available approaches, there are at most 2-3 available algorithms in the literature (kinematics-EMG; force-EMG), that should not be neglected in this work. What did previous approaches achieve? What was improved with this approach? What was not improved?

      Previous attempts of extracting synergies with non-linear approaches could also be described more.

      In the latest version of the manuscript, we have referenced both the mixed NMF and autoencoders based algorithms. In both the introduction and discussion section of the manuscript, we also specify that our framework quantifies and decomposes muscle interactions in a novel way that cannot be done by other current approaches. In the results section we use examples from 3 different datasets to make this point clear, providing intuition on the use cases of our framework.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to review.

      We thank the editors and reviewers for their time in assessing our manuscript. We changed the title to remove the word “all” because we realized that was hyperbolic. Corrections in response to review are in blue text throughout the manuscript document (other minor corrections are not highlighted).

      eLife assessment

      This study presents valuable insights into the evolution of the gasdermin family, making a strong case that a GSDMA-like gasdermin was already present in early land vertebrates and was activated by caspase-1 cleavage. Convincing biochemical evidence is provided that extant avian, reptile, and amphibian GSDMA proteins can still be activated by caspase-1 and upon cleavage induce pyroptosis-like cell death - at least in human cell lines. The caspase-1 cleavage site is only lost in mammals, which use the more recently evolved GSDMD as a caspase-1 cleavable pyroptosis inducer. The presented work will be of considerable interest to scientists working on the evolution of cell death pathways, or on cell death regulation in non-mammalian vertebrates.

      We thank the editor for their time in evaluating our manuscript. We agree with the eLife assessment and with the comments of the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors start out by doing a time-calibrated gene/species tree analysis of the animal gasdermin family, resulting in a dendrogram showing the relationship of the individual gasdermin subfamilies and suggesting a series of gene duplication events (and gene losses) that lead to the gasdermin distribution in extant species. They observe that the GSDMA proteins from birds, reptiles, and amphibians do not form a clade with the mammalian GSDMAs and notice that the non-mammalian GSDMA proteins share a conserved caspase-1 cleavage motif at the predicted activation site. The authors provide several series of experiments showing that the non-mammalian GSDMA proteins can indeed be activated by caspase-1 and that this activation leads to cell death (in human cells). They also investigate the role of the caspase-1 recognition tetrapeptide for cleavage by caspase-1 and for the pathogen-derived protease SpeB.

      We thank the reviewer for their time in evaluating our manuscript.

      Strengths:

      The evolutionary analysis performed in this manuscript appears to use a broader data basis than what has been used in other published work. An interesting result of this analysis is the suggestion that GSDMA is evolutionarily older than the main mammalian pyroptotic GSDMD, and that birds, reptiles, and amphibians lack GSDMD but use GSDMA for the same purpose. The consequence that bird GSDMA should be activated by an inflammatory caspase (=caspase1) is convincingly supported by the experiments provided in the manuscript.

      We thank the reviewer for their assessment of the manuscript.

      Weaknesses:

      1. As a non-expert in phylogenetic tree reconstruction, I find the tree resulting from the authors' analysis surprising (in particular the polyphyly of GSDMA) and at odds with several other published trees of this family. The differences might be due to differences in the data being used or due to the tree construction method, but no explanation for this discrepancy is provided.

      We agree, and we have modified the text to add more context to explain why our analysis generated a different topology: “In comparison to previously published studies, we used different methods to construct our gasdermin phylogenetic tree, with the result that our tree has a different topology. The topology of our tree is likely to be affected by our increased sampling of gasdermin sequences; we included 1,256 gasdermin sequences in comparison to 300 or 97 sequences used in prior studies. Prior studies used maximum likelihood tree building techniques, whereas we used a more computationally intensive Bayesian method using BEAST with strict molecular clocks that allows us to provide divergence time estimates, which we calibrated using mammal fossil estimated ages. We think that this substantially increased sampling paired with time calibration allows us to produce a more accurate phylogeny of the gasdermin protein family.”

      To explain and further support our method in a more technical manner: in our phylogenetic tree, non-mammal GSDMAs are paralogous to mammal GSDMAs, whereas others have found that non-mammal GSDMAs are orthologous to mammal GSDMAs. We obtained moderate support for the non-mammal GSDMA placement, with a Bayesian posterior of 0.42 and maximum likelihood bootstrap support of 0.96. Angosto-Bazarra et al. have for their placement a Bayesian posterior of 0.66 and maximum likelihood bootstrap support of 0.98. These are good results, but they arise from significantly fewer sequences than are included in our tree. However, in Fig S2 of Angosto-Bazarra et al. the support drops to 0.08. That the posteriors in both are not 1 indicates the presence of phylogenetic conflicts (i.e., a significant fraction of alternative trees), which means that the tree of our study or that of Angosto-Bazarra et al. could be incorrect. That said, our tree is supported by biological evidence, and our dataset is substantially larger. To better characterize this node, further sampling with even more species would be required. We exhausted the available sequences at the time our tree was generated.

      Differences between our study and previous studies:

      Author response table 1.

      2. While the cleavability of bird/reptile GSDMA by caspase-1 is well-supported by several experiments, the role of this cleavage for pyroptotic cell killing is addressed more superficially. One cell viability assay upon overexpression of GSDMA-NTD in human HEK293 cells is shown and one micrograph shows pyroptotic morphology upon expression in HeLa cells. It is not clear why these experiments were limited to human cells…

      We did include one more experiment in human cells, which is Figure 4B, in which we express full-length chicken GSDMA with dimerizable caspase-1, and show that LDH release requires the cleavage site aspartate, D244. That said, we agree that our use of only human cell lines is a weakness of the paper. We thought that the best way to definitively show the interaction of caspase-1 and GSDMA was to perform experiments in chicken macrophages. Therefore, we generated a custom-raised anti-chicken-GSDMA antibody. Unfortunately, the quality of the antibody was insufficient to detect endogenous GSDMA in chicken bone marrow-derived macrophages. Off-target binding prevented the observation of chicken GSDMA bands. We added a section to the discussion acknowledging the need for further studies: “In future studies, the association of bird/amphibian/reptile GSDMA and caspase-1 should be confirmed in native cells from each of these animals.”

      …and why two different cell types were used for the two complementary results.

      In the paper we used 293T cells and HeLa cells as generic cell types that have distinct benefits. In general, we used 293T/17 cells for experiments where high transfection efficiency was most critical, as it is simple to achieve 90% or higher transfection efficiency in this line. However, 293T/17s have poor spreading in culture and thus are not as useful for morphologic studies. 293T/17 cells do display pyroptotic ballooning upon gasdermin activation; however, the images are less pronounced in comparison to other cell types that have more distinct morphology. Therefore, we used HeLa cells for the microscopy experiments because they are more adherent and larger than 293T/17s, which makes for easier visualization of pyroptotic ballooning. We have added the following statement to the text to make our rationale for the use of different cell lines more apparent: “In these experiments, 293T/17s were used for their high transfection efficiency, and HeLas were used for microscopy studies for their larger size and improved adherence.”

      3. The introduction mentions as a motivation for this work our lack of knowledge of how human GSDMA is activated. This is indeed an interesting and pressing question, but it is not really addressed in the manuscript. This is particularly true if one accepts the authors' dendrogram results that the bird and mammalian GSDMA families do not form a clade.

      As a consequence, the significance of this finding is mostly limited to birds and reptiles.

      Our aspirations were to discover hidden facets of mammal GSDMA by using a molecular evolutionary analysis of bird/amphibian/reptile GSDMA. Although we did not learn the identity of a host protease that activates mammalian GSDMA, we serendipitously discovered the evolutionary history of the association of caspase-1 with the gasdermin family. We think this manuscript provides an important and interesting advance in the field to reveal the process of evolution at work in the gasdermin family, and that the association of caspase-1 with a gasdermin to cause pyroptosis is an unbroken pairing throughout evolution. It is surprising to us that the specific gasdermin partner has changed over time.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the molecular evolution of members of the gasdermin (GSDM) family. By adding the evolutionary time axis of animals, they created a new molecular phylogenetic tree different from previous ones. The analyzed result verified that non-mammalian GSDMAs and mammalian GSDMAs have diverged into completely different and separate clades. Furthermore, by biochemical analyses, the authors demonstrated non-mammalian GSDMA proteins are cleaved by the host-encoded caspase-1. They also showed mammalian GSDMAs have lost the cleavage site recognized by caspase-1. Instead, the authors proposed that the newly appeared GSDMD is now cleaved by caspase-1.

      We thank the reviewer for their time in evaluating our manuscript.

      Through this study, we have been able to understand the changes in the molecular evolution of GSDMs, and by presenting the cleavage of GSDMAs through biochemical experiments, we have become able to grasp the comprehensive picture of this family of molecules. However, there are some parts where explanations are insufficient, so supplementary explanations and experiments seem to be necessary.

      Strengths:

      It has a strong impact in advancing ideas into the study of pyroptotic cell death and even inflammatory responses involving caspase-1.

      We thank the reviewer for the critical consideration of the phylogeny presented.

      Weaknesses:

      Based on the position of mammalian GSDMA shown in the molecular phylogenetic tree (Figure 1), it may be difficult to completely agree with the authors' explanation of the evolution of GSDMA.

      1. In Figure 1, mammalian GSDMA and mammalian GSDMD diverge into two clades; before that, the GSDMA/D group separates from mammalian GSDMC; before that, from GSDMB; and earlier still, from non-mammalian GSDMA. In a molecular phylogenetic tree, it is impossible for GSDMA to appear a second time during evolution. Mammalian GSDMAs are clearly paralogous to non-mammalian GSDMAs in the figure. If they were bona fide orthologs, the mammalian GSDMA group should form a sub-clade within the non-mammalian GSDMA clade. It would be better to describe the plausibility of this divergence in the molecular evolution of mammalian GSDMA in the Discussion section.

      We appreciate the reviewer’s careful consideration of our phylogeny. We agree that we did not make this clear enough in the discussion. Indeed, this is a confusing point, and is a critical concept in the paper. This is among our most important findings, so we have added a line addressing this finding to the abstract. We think about these concepts starting from the oldest common ancestor of a group, and then think about how genes duplicate over time. The discussion now begins with the following:

      We discovered that GSDMA in amphibians, birds, and reptiles is a paralog of mammal GSDMA. Surprisingly, the GSDMA genes in both the amphibian/reptile/bird and mammal groups appear in the exact same locus. Therefore, this GSDMA gene was present in the common ancestor of all these animals. In mammals, this GSDMA duplicated to form GSDMB and GSDMC. Finally, a new gene duplicate, GSDMD, arose in a different chromosomal location. Then this GSDMD gene became a superior target for caspase-1 after developing the exosite. Once GSDMD had evolved, we speculate that the mammalian GSDMA became a pseudogene that was available to evolve a new function. This new function included a new promoter to express mammalian GSDMA primarily in the skin, and perhaps acquisition of a new host protease that has yet to be discovered.

      In further support of the topology of our Bayesian tree in Figure 1, we also performed a maximum likelihood analysis, which also placed the GSDMA genes into similarly distinct clades (Figure 1-S3). Finally, we have biological evidence to support this reasoning, where caspase-1 cleaves non-mammal GSDMAs and also mammal GSDMD (and no longer can cleave mammal GSDMA).

      2. Regarding (1), it is recommended that the authors reconsider the validity of the estimated divergence dates by focusing on mammalian species divergence, because validating this estimation requires rechecking the molecular phylogenetic tree, including the alignment.

      Our reconstructed evolution of gasdermins is consistent with the mammal tree of life. We constrained Bayesian estimation of divergences using soft calibrations from mammal fossil estimated ages. We have included the fossil calibration of mammalian gasdermins to the results section and to our methods.

      3. If GSDMB and/or GSDMC, which lie between non-mammalian GSDMA and mammalian GSDMD in the molecular phylogenetic tree, were cleaved by caspase-1, the story of this study would become clearer. The authors should test that possibility.

      It is known that mammal GSDMB and GSDMC cannot be activated by caspase-1. We propose that GSDMA was cleaved by caspase-1 only in extinct mammals that had not yet associated GSDMD with caspase-1. Such an extinct mammal could have encoded a GSDMA cleaved by caspase-1, a GSDMB cleaved by granzyme A, and a GSDMC cleaved by caspase-8. Later, the GSDMA gene was again duplicated to form GSDMD. After GSDMD was targeted by caspase-1, GSDMA was free to gain its current function in barrier tissues.

      Reviewer #1 (Recommendations For The Authors):

      As a non-expert on phylogenetic tree construction, I found the "time-calibrated maximum clade credibility coalescent tree" hard to digest. I would have liked to see an explanation of how this method is different from what has been used before and why the authors consider it to be better. This is particularly important when considering that the resulting tree shown in Figure 1 is quite different from other published trees of the same family (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8742441 where the GSDMA family appears monophyletic).

      Please see response to Reviewer 1 weaknesses above. Also, we have moved the text “time-calibrated maximum clade credibility coalescent tree” to the figure legend.

      In the bioinformatical analysis of the conserved caspase-1 cleavage motif in bird GSDMA sequences, I would recommend also addressing the residue behind the cleavage site Asp, as this position has an unusually high conservation (mostly Gly) in bird GSDMA.

      This is a great observation. We suspect that this may reflect a need for flexibility in the secondary structure to allow the cleavage site to enter the enzymatic pocket of the caspase. This residue is also similarly enriched in mammal GSDMD, which is also cleaved by caspase-1. We also note high conservation of a P2' proline residue in birds with the FASD tetrapeptide, which could also be important for displaying the tetrapeptide to the caspase.

      This comment prompted us to search the literature for evidence of these residues in caspase-1 substrate preference studies. Remarkably, a P1' glycine and P2' proline are among the most enriched residues in human caspase-1 targets. This supports our hypothesis that caspase-1 cleaves GSDMA in non-mammals. We added the following to the results section: “Additionally, the P1' residue in amphibian, bird and reptile GSDMA was often a glycine, and the P2' residue was often a proline, especially in birds with FASD/FVSD tetrapeptides (Fig. 2B). A small P1' residue is preferred by all caspases. By using a peptide library, glycine has been determined to be the optimal P1' residue for caspase-1 and caspase-4. Further, in a review of the natural substrates of caspase-1, glycine was the second most common P1' residue, and proline was the most common P2' residue. These preferences were not observed for caspase-9.”
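
      For readers less familiar with the P4-P1 / P1'-P2' nomenclature, the positions can be made concrete with a short script. This is an illustrative sketch only and is not the analysis performed for the manuscript: the function name is hypothetical, the example sequence is invented rather than taken from any real GSDMA, and the only assumptions drawn from the text are the FASD/FVSD tetrapeptides and the convention that cleavage occurs after the P1 aspartate.

```python
def annotate_cleavage_sites(sequence, tetrapeptides=("FASD", "FVSD")):
    """Locate each P4-P1 tetrapeptide and report the two residues after it.

    Cleavage is assumed to occur directly after the P1 aspartate, so the
    next residue is P1' and the one after that is P2'. Matches too close
    to the C-terminus to have both residues are skipped.
    """
    sites = []
    for motif in tetrapeptides:
        start = sequence.find(motif)
        while start != -1:
            p1_index = start + len(motif) - 1  # 0-based index of the P1 Asp
            if p1_index + 2 < len(sequence):
                sites.append(
                    {
                        "tetrapeptide": motif,
                        "p1_position": p1_index + 1,  # 1-based residue number
                        "p1_prime": sequence[p1_index + 1],
                        "p2_prime": sequence[p1_index + 2],
                    }
                )
            start = sequence.find(motif, start + 1)
    return sorted(sites, key=lambda site: site["p1_position"])


# Invented toy sequence, used only to show the indexing.
toy_sequence = "MSLFASDGPTK"
toy_sites = annotate_cleavage_sites(toy_sequence)
for site in toy_sites:
    print(site)
```

      Running the same function over an alignment of real sequences would give the per-position residue counts that underlie a conservation plot, but the choice of sequences and motifs would need to follow the manuscript's own methods.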

      Finally, I would like the authors to at least explain why the cell viability assays were done in 293T cells while the micrographs were done in HeLa cells. Why not show both experiments for both cell types?

      In the paper we used 293T cells and HeLa cells as generic cell types that have distinct benefits. In general, we used 293T/17 cells for experiments where high transfection efficiency was most critical, as it is simple to achieve 90% or higher transfection efficiency in this line. However, 293T cells have poor spreading in culture and thus are not as useful for morphologic studies. 293T/17 cells do display pyroptotic ballooning upon gasdermin activation; however, the images are less pronounced in comparison to other cell types that have more distinct morphology. Therefore, we used HeLa cells for the microscopy experiments because they are more adherent and larger than 293T/17s, which makes for easier visualization of pyroptotic ballooning. We have added the following statement to the text to make our rationale for the use of different cell lines more apparent: “In these experiments, 293T/17s were used for their high transfection efficiency, and HeLas were used for microscopy studies for their larger size and improved adherence.”

      There are a number of minor points related to language and presentation:

      • the expressions "pathogens contaminate the cytosol", "mammals can encode..", "an outsized effect" are unusual and might be rephrased.

      We changed these to:

      “manipulate the host cell, sometimes contaminating the cytosol with pathogen associated molecular patterns, or disrupting aspects of normal cell physiology”,

      “Only mammals encode GSDMC and GSDMD alongside the other four gasdermins.”,

      and

      “greater effect”

      • in line 87 the abbreviation "GSDMEc" is first used without explanation (of the "c").

      This is an important distinction, as GSDMEc proteins were only recently uncovered. To remedy this, we have added the following text following line 87: “This gasdermin was recently identified as an ortholog of GSDMA. It was called GSDMEc, following the nomenclature of other duplications of GSDME in bony fish that have been named GSDMEa and GSDMEb.”

      • line 89 grammar problem.

      Corrected

      • line 186ff the sentence "We believe..." does not appear to make sense.

      We revised the text to make this clear, changing the text to now read “We hypothesized that activating pyroptosis using separate gasdermins for caspase-1 and caspase-3 is a useful adaptation and allows for fine-tuning of these separate pathways. In mammals, this separation depends on the activation of GSDMD by caspase-1 and the activation of GSDME by caspase-3.”

      • many figures use pictures rather than text to represent species groups. These pictures are not always intuitive. As an example, in Figure 6 the 'snake' represents amphibians. After reading the text, I understand that these should probably be the caecilian amphibians, but not every reader might know what these critters look like. In Figure 7, I have no idea what the black blob (2nd image from top) is supposed to be.

      In crafting the manuscript, we found the use of text to denote the various species to be cumbersome. The species silhouettes are a standard graphical depiction used in evolutionary biology, which we think aids the readability of the figures. For example, in a paper cited in our manuscript, these same silhouettes were used to depict the evolution of GSDMs (https://doi.org/10.3389/fcell.2022.952015 Figure 1A, Figure 3D, Figure 4G). However, we agree that many readers will not know that caecilians are legless amphibians that resemble snakes in their body morphology, but are not close to snakes by phylogeny. We think it is important to use an image of a caecilian amphibian because the more iconic amphibians (frogs, salamanders) do not encode GSDMA. To increase clarity, we have mentioned the morphology of caecilians in the legends of Figure 2, Figure 6, and Figure 7, where caecilian amphibians are first introduced.

      In Figure 2: “Note, that caecilians morphologically are similar to snakes in their lack of legs and elongated body, however, this is an example of convergent evolution as caecilians are amphibians and are thus more closely related to frogs and salamanders than snakes.”

      In Figure 6: “M. unicolor is an amphibian despite sharing morphological similarity to a snake.”

      In Figure 7: “In caecilian amphibians, which are morphologically similar to snakes, birds, and reptiles, GSDMA is cleaved by caspase-1.”

      The black blob is the mollusk Lingula anatina, which unfortunately has an indistinct silhouette. To clarify this, we have added text to label the images in Figure 7.

      Reviewer #2 (Recommendations For The Authors):

      1. Line 214, in "(Fig. 3-S2) Human and mouse ..", it is necessary to type a period.

      2. Line 238, in the subtitle, GSMA should be amended to GSDMA.

      These have both been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful, critical, and insightful evaluation of our manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The preprint by Laganowsky and co-workers describes the use of mutant cycles to dissect the thermodynamic profile of specific lipid recognition by the ABC transporter MsbA. The authors use native mass spectrometry with a variable temperature source to monitor lipid binding to the native protein dimer solubilized in detergent. Analysis of the peak intensities (that is, relative abundance) of 1-3 bound lipids as a function of solution temperature and lipid concentration yields temperature-dependent Kds. The authors use these to then generate van't Hoff plots, from which they calculate the enthalpy and entropy contributions to binding of one, two, and in some cases, three lipids to MsbA.

      The authors then employ mutant cycles, in which basic residues involved in headgroup binding are mutated to alanine. By comparing the thermodynamic signatures of single and double (and in one instance triple) mutants, they aim to identify cooperativity between the different positions. They furthermore use inward and outward locking conditions which should control access to the different binding sites determined previously.

      The main conclusion is that lipid binding to MsbA is driven mainly by energetically favorable entropy increase upon binding, which stems from the release of ordered water molecules that normally coordinate the basic residues, which helps to overcome the enthalpic barrier of lipid binding. The authors also report an increase in lipid binding at higher temperatures which they attribute to a non-uniform heat capacity of the protein. Although they find that most residue pairs display some degree of cooperativity, particularly between the inner and outer lipid binding sites, they do not provide a structural interpretation of these results.

      Strengths:

      The use of double mutant cycles and mass spectrometry to dissect lipid binding is novel and interesting. For example, the observation that mutating a basic residue in the inner and one in the outer binding site abolishes lipid binding to a greater extent than the individual mutations is highly informative even without having to break it down into thermodynamic terms (see "weaknesses" section). In this sense, the method and data reported here opens new avenues for the structure/activity relationship of MsbA. The "mutant cycle" approach is in principle widely applicable to other membrane proteins with complex lipid interactions.

      Weaknesses:

      The use of double mutant cycles to dissect binding energies is well-established, and has, as the authors point out, been employed in combination with mass spectrometry to study protein-protein interactions. Its application to extract thermodynamic parameters is robust in cases where a single binding event is monitored, e.g. the formation of a complex with well-defined stoichiometry, where dissociation constants can be determined with high confidence. It is, however, complicated significantly by the fact that for MsbA-lipid interactions, we are not looking at a single binding event, but a stochastic distribution of lipids across different sites. Even if the protein is locked in a specific conformation, the observation of a single lipid adduct does not guarantee that the one lipid is always bound to a specific site. In some of the complexes detected by MS, the lipid is likely bound somewhere else. Lipid binding Kds from mass spectrometry, although helpful in some instances as a proxy for global binding affinities, should therefore be taken with a grain of salt.

      We agree with the reviewer that, while we measure lipid binding (as a mass shift), we do not know the binding location(s). Given this issue, we have added a passage on this important point to the discussion section and elaborate more broadly on the problem in the context of membrane protein-lipid interactions. Tackling this issue represents a frontier challenge for the field.

      The authors analyze the difference in binding upon mutating binding sites (ddG etc). Here, another complicating factor comes into play, the fact that mutation of a binding site (which the authors show reduces lipid binding) may instead allow the lipid to bind to a lower-affinity site elsewhere. Unfortunately, the authors do not specify the protein concentration, but assuming it is in the single-digit micromolar range, as common for native MS experiments, lipid and protein concentrations are almost equal for most of the data points, resulting in competition between binding sites for free lipids. As a rule of thumb, for Kd measurements, the concentration of the constant component, the protein, should be far below the Kd, to avoid working in the "titration" regime rather than the "binding" regime (see Jarmoskaite et al, eLife 2020). I cannot determine whether this is the case here. The way I understand the double mutant cycle approach, reliable Kd measurements are required to accurately determine dH and TdS, so I would encourage the authors to confirm their Kd values using complementary methods before in-depth interpretations of the thermodynamic components.

      The reviewer references an article in eLife by Jarmoskaite and co-workers describing “titration” vs “binding” regimes. Below we paste a snippet from this article:

      Author response image 1.

      Equation 4a is an expression for the fraction of protein bound to ligand, which holds universally: if we know the concentrations of the molecules at equilibrium (including those unbound or free), then we can obtain the equilibrium constant at a given temperature. Jarmoskaite et al. note that in practice (using traditional biophysical approaches) one cannot readily distinguish protein that is free from protein that is bound to ligand (see highlighted part above). While this assumption is the basis of their eLife assessment, it does NOT apply to native mass spectrometry data. It is important to realize that the mole fraction (or concentration) of the apo and each lipid-bound state, i.e., [P], [PL], [PL2], …, [PLn+1], can readily be obtained directly from the deconvoluted mass spectrum. This is unlike other biophysical methods, which are ensemble measurements that report the amount of heat or the fraction of total ligand bound to protein. Since we can discern each lipid-bound state, including the free protein and free ligand concentrations, the equilibrium binding constants can be directly calculated, and the protein and ligand concentration becomes irrelevant. In principle, equilibrium constants for protein-lipid interactions can be calculated from one mass spectrum. To increase transparency, we have updated the results section to highlight the important difference of the native MS approach compared to less robust traditional approaches that are riddled with underlying issues/assumptions.
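      The calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name, the example mole fractions, and the concentrations are all hypothetical, and the free ligand concentration is estimated here by mass balance from the total protein and ligand concentrations, which is an assumption of this sketch rather than something spelled out in the response.

```python
def sequential_kds(mole_fractions, protein_total, ligand_total):
    """Return sequential Kd values from the mole fractions of P, PL, PL2, ...

    mole_fractions[i] is the fraction of protein carrying i lipids, as read
    from a deconvoluted mass spectrum. Both concentrations must share one
    unit (for example micromolar); the Kd values come back in that unit.
    """
    total = sum(mole_fractions)
    fractions = [f / total for f in mole_fractions]  # normalise to 1

    # Ligand bound to protein, summed over all lipid-bound states
    bound = protein_total * sum(i * f for i, f in enumerate(fractions))
    ligand_free = ligand_total - bound  # mass balance (assumption)
    if ligand_free <= 0:
        raise ValueError("bound ligand exceeds total ligand")

    # Kd_i = [PL_(i-1)][L] / [PL_i]; the protein total cancels in the ratio
    return [fractions[i - 1] * ligand_free / fractions[i]
            for i in range(1, len(fractions))]


# Hypothetical spectrum: apo protein plus one, two, and three bound lipids
kds = sequential_kds([0.5, 0.3, 0.15, 0.05], protein_total=1.0, ligand_total=2.0)
print([round(kd, 3) for kd in kds])
```

      Each Kd depends only on the ratio of neighbouring peaks and on the free ligand concentration, which is why a single spectrum is in principle enough.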

      We appreciate the reviewer’s suggestion of using complementary methods to confirm Kd values. In our previous report [1], we determined binding thermodynamics for soluble protein-ligand interactions using native MS, surface plasmon resonance (SPR), and isothermal titration calorimetry (ITC) and found that the techniques yield similar binding constants and thermodynamic parameters. Because soluble proteins have well-defined ligand binding, carrying out a complementary study was rather straightforward. We have also shown consistent findings for native MS and SPR of a membrane protein interacting with a soluble, regulatory protein [2]. However, membrane proteins can bind the first few lipids very specifically and, with the addition of more lipid, bind even more lipids that represent rather weak binding. Thus, traditional approaches would report on the ensemble of lipids bound to membrane proteins: specific lipid binding sites (such as the inner and outer LPS binding sites in MsbA) are saturable, but additional binding will also be observed, i.e., the system does not behave like a traditional soluble protein-ligand interaction. In the past we have used a fluorescent-lipid competition binding assay [3] to corroborate native MS results for Kir3.2, which showed a direct correlation. The disadvantage of this complementary approach is the use of a non-natural, fluorescently modified lipid. Unfortunately, there is no commercial source for a fluorophore-modified KDL.

      It is somewhat counterintuitive that for many double mutants, and the triple mutant, the entropic component becomes more favorable compared to the WT protein. If the increase in entropy upon lipid binding comes from the release of ordered water molecules around the basic residues (a reasonable assumption) why does this apply even more in proteins where several basic residues have been changed to alanine, which coordinate far fewer water molecules?

      There are many factors that contribute to the change in entropy of the system, beyond solvation entropy, and deciphering the entropic contributions of the various components remains a challenging task. We have revised the manuscript to emphasize that solvation is one component of the entropic term and other components are likely at play.

      The authors could devote more attention to the fact that they use detergent micelles as a vehicle for lipid binding studies. To a limited extent, detergents compete with lipids for binding, and are present in extreme excess over the lipid. The micelle likely changes its behavior in response to temperature changes. For example, the packing around the protein loosens up upon heating, which may increase the chance for lipids to bind. In this case, the increase in binding at higher temperatures may not be related to a change in heat capacity. This question could be addressed by MD simulations, if it's not already in the literature.

      The detergent and its concentration are consistent for all the different MsbA proteins in this study. In fact, we observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant, trapped or not) all display unique temperature-dependent responses. The reviewer’s suggestion that increasing temperature loosens the packing of detergent and thereby promotes lipid binding is clearly NOT that simple. If detergent were significantly influencing lipid binding (as suggested by the reviewer), then increasing its concentration should impact lipid binding. In a previous study, we found no difference in membrane protein-lipid thermodynamics even when the concentration of detergent was increased five-fold [1]. We repeated similar experiments for MsbA and find that the increased detergent concentration does not impact the abundances of lipid-bound states. Author response image 2 shows MsbA in the presence of lipid in 2x CMC (panels a and b) and 10x CMC (panels c and d). As you will see, no appreciable difference in the lipid-bound signal is observed.

      Author response image 2.

      We applaud the suggestion of MD simulation. However, it is far beyond the scope of this paper and it is not clear what would really be learned.

      Reviewer #2 (Public Review):

      Summary:

      This is a solid study that dissects the thermodynamics of lipopolysaccharide (LPS) transporter MsbA and LPS. Native ESI-MS and the novel strategies developed by the authors were employed to quantify the affinities of LPS-MsbA interactions and its temperature dependence. Here, the equilibrium of lipid-protein interactions occurs in the micellar phase. The double-/triple-mutant cycle analysis and van't Hoff analysis allowed a full thermodynamic description of the lipid-protein interactions and the analysis of thermodynamic coupling between LPS binding sites. The most notable result would be that LPS-MsbA interaction is largely driven by entropy involving the negative heat capacity, a signature of the solvent reorganization effect (here authors attribute the solvent effect to "water" reorganization). The entropy driven lipid binding has been previously reported by the same authors for Kir1,2-PIP2 interactions.

      Strengths:

      1. This is overall a very thorough and rigorous study providing the detailed thermodynamic principles of LPS-MsbA interaction.

      2. The double and triple-mutant cycle approaches are newly applied to lipid-protein interactions, enabling detailed thermodynamics between LPS binding sites.

      3. The entropy-driven protein-lipid interaction is surprising. The binding seems to be mainly mediated by the electrostatic interaction between the positively charged residues on the protein and the negatively charged or polar headgroup of LPS, which could be thought of as "enthalpic" (making of a strong bond relative to that with solvent).

      Weaknesses:

      1. This study is a good contribution to the field, but it was difficult to find novel biological insights or methodological novelty from this study.

      1a. Thermodynamic analysis of lipid-protein interactions, an example of entropy-driven lipid-protein interactions, and the cooperativity between lipid binding sites have been reported by the authors' group. Also, cooperativity between binding sites in general has been reported in numerous studies of biomolecular interactions.

      We thank the reviewer for highlighting our previous work. Of course, a single study does not establish a pattern, such as entropy-driven lipid-protein interactions.

      While we agree with the reviewer that cooperativity in biomolecular interactions has been established for many soluble protein systems, by no means do we have a detailed understanding of membrane protein-lipid interactions. This work is an important contribution that extends classical work on soluble protein systems to more challenging membrane protein systems and their interactions with lipids.

      1b. It is not clear how this study provides new insights into the understanding of LPS transport mechanisms. Probably, authors could strengthen the Discussion by providing biological insights-how the residue coupling.

      The thermodynamics provides deeper insight into the chemical principles that drive specific membrane protein-lipid interactions. We have revised the discussion to highlight the importance of thermodynamics and the contribution of individual residues to KDL binding. We also note that the inner and outer LPS binding sites appear to be coupled, which is a new finding.

      2. One to three LPS molecules bind to MsbA, but it is unclear whether bound KDL occupies inner or outer cavities, or both and how a specific mutation affects the affinity of specific LPS (i.e., to inner or to outer cavities). Based on the known structures, the maximal number of LPS is three. It is possible that the inner and outer cavities have different LPS affinities. Also, there can be multiple one-LPS-bound states, two-LPS-bound states if LPS strictly binds to the binding sites indicated by the structures. This aspect is beyond the scope of this study and difficult to address, but without this information, it seems hard to tell what is going on in the system.

      In our response above, we note that lipids will bind to membrane proteins at specific site(s) and weaker sites, often described as non-annular lipids. The revision includes this discussion point.

      3. If a single mutation is introduced to the inner cavity, its effect will be "doubled" because the inner cavity is shared by two identical subunits. This effect needs to be clarified in the result section.

      Great point. In addition, an outer mutant will also impact not one but both outer binding sites. The revised manuscript makes note of this point.

      4. In the result section, "Mutant cycle analysis of KDL binding to vanadate-trapped MsbA.":

      4a. It seems necessary to show the mass spectra for MsbA-ADP-vanadate complex as well as its lipid bound forms.

      In the original submission, the mass spectra of vanadate trapped MsbA with KDL binding was provided in Supplementary Figures 10 and 11.

      4b. The rationale of this section (i.e., what mechanistic insights can be obtained from this study) is unclear. For example, it is not sure what meaningful information can be obtained from a single type (ADP/vanadate) of the bound state regarding the ATP-driven function of MsbA.

      MsbA is dynamic and populates different conformations. Trapping with vanadate locks the transporter in an outward-facing state with the NBDs interacting. This provides the opportunity to characterize binding to the exterior site. We revised the manuscript to note this point.

      Reviewer #3 (Public Review):

      Summary:

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, and enthalpies were calculated using nonlinear van't Hoff fitting. Changes in Gibbs free energies were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis, the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Strengths:

      This is an extensive high quality native MS dataset that provides unique opportunities to gain insights into the thermodynamic parameters underlying lipid A binding. In addition, it provides coupling energies between mutations introduced into MsbA, that are implicated in lipid A binding.

      Weaknesses:

      The data all rely on the accuracy of determining KD values for lipid binding to MsbA. For the weaker binding sites, the range of lipid concentrations probed was in fact too low to generate highly accurate data. Another weakness is a lack of clear evidence as to which KD values belong to which of the possible lipid A binding sites.

      See our detailed response to reviewer 1 regarding Kd determination using native MS compared to other techniques. We chose to focus on the first three lipid binding events and adjusted the concentrations accordingly to titrate these three. As noted above, the Kd values can be determined from one mass spectrum. For rigor, we include different titration points and fit a sequential binding model to the data; the fits are shown in the supplement and are quite reasonable.

      Regarding multiple lipids binding to different site(s), we have been able to distinguish high-affinity vs low-affinity PIP binding to Kir3.2 in a previous study [4]. This was apparent from the mole fraction curves for some lipid-bound states not returning to zero. We agree binding to multiple sites can be an issue. However, other techniques report on the ensemble of binding and, hence, no real useful information is obtained. Native MS enables one step in the right direction by dissecting the different lipid-bound states. Future work will need to further address this forefront question in the field, a point we now make in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      Experiments/analysis: In short, there should be a proof of principle experiment that the thermodynamic constants determined by MS are accurate. Once that is done, the authors can add a more engaging structural interpretation of the results from the mutant cycles (which they seem to consciously avoid in the present manuscript?). How are cooperative residues coupled? Why?

      See our detailed response to reviewer 1 above.

      The manuscript is well-written, but Figures 3-5 are somewhat repetitive and require a lot of time to understand. Schematics of the main findings in each figure would help the uninitiated reader.

      We agree the illustrations are complex but there is rich data being shown.

      Figure 2 C contains an x-axis label error.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      1. Lines 128-129: "Like other mutant cycle studies, we assume the single- and double-mutations do not disrupt binding at specific sites on MsbA."

      This statement is obscure and needs to be clarified. Does this mean that the mutations still allow binding of KDL, or the mutations do not disrupt the conformational integrity of the binding sites?

      This statement has been removed.

      2. Lines 137-139: "More specifically, R78 coordinates one of the characteristic phosphoglucosamine (P-GlcN) substituents of KDL whereas K299 interacts with a carboxylic acid group in the headgroup of KDL."

      Two identical subunits form a dimer interface that forms an LPS binding site. Thus, a single mutation on the inner cavity will disrupt two binding sites on LPS. One R78 to P-GlcN and the other to a sugar backbone. Also, one K299 interacts with a carboxylic acid group in the headgroup and the other to an unknown (not clear in the figure).

      As also noted above, mutation of the outer site will impact both outer sites. We have made note of this caveat.

      3. Lines 171-172: "leading to an increase in ΔG by ~4 kJ/mol (Fig. 2d)"

      Relative to what?

      Corrected.

      4. Lines 172-173: "Mutant cycle analysis indicates a coupling energy (ΔΔGint) of 1.7 ± 0.4 kJ/mol that contributes to the stability of KDL-MsbA complex."

      The sign of DDG (DDH,DDS)_int is a bit confusing. I recommend that authors define the meaning of negative or positive sign of DDG_int (DDH,DDS) at this point. Here, a positive sign means favorable cooperation between the two mutated residues. Sometimes, researchers designate a positive cooperativity as a negative sign.

      The literature on mutant cycles does not appear to follow a consensus on the sign. We have revised the manuscript to note that a positive sign means favorable cooperation, following the formalism recently described by Horovitz, Sharon, and co-workers [5].
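      To make the sign question concrete, a double mutant cycle can be sketched as below. This is an illustration under stated assumptions, not the formalism of the manuscript or of reference [5]: it assumes the input values are binding free energies (more negative means tighter binding) and defines the coupling term as the effect of mutation 1 in the wild-type background minus its effect in the background already carrying mutation 2, so that a positive value corresponds to favorable cooperation. All numbers are hypothetical. With the opposite ordering, or with dissociation free energies, the sign flips.

```python
def coupling(wt, m1, m2, m1m2):
    """Coupling term of a double mutant cycle.

    Effect of mutation 1 in the wild-type background minus its effect in
    the background that already carries mutation 2 (assumed convention).
    """
    return (m1 - wt) - (m1m2 - m2)


# Hypothetical binding free energies and enthalpies in kJ/mol
dG = {"wt": -35.0, "m1": -31.0, "m2": -32.0, "m1m2": -30.0}
dH = {"wt": 20.0, "m1": 28.0, "m2": 25.0, "m1m2": 27.0}

# Entropic term T*dS for each protein from dG = dH - T*dS
TdS = {name: dH[name] - dG[name] for name in dG}

ddG_int = coupling(dG["wt"], dG["m1"], dG["m2"], dG["m1m2"])
ddH_int = coupling(dH["wt"], dH["m1"], dH["m2"], dH["m1m2"])
TddS_int = coupling(TdS["wt"], TdS["m1"], TdS["m2"], TdS["m1m2"])

print(round(ddG_int, 2), round(ddH_int, 2), round(TddS_int, 2))
```

      Because the cycle is linear in its inputs, the coupling free energy equals the coupling enthalpy minus the coupling entropic term, so a small coupling free energy can hide large, mutually compensating enthalpic and entropic contributions.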

      5. Lines 182-185: "Enthalpy and entropy for KDL binding MsbA R188A was largely similar to the wild-type protein (Fig 3a). However, the R243A mutation resulted in an increase in entropy, compensated for by an increase in positive enthalpy (Fig 3a)."

      The thermodynamic parameters for R243A mutation change in a similar manner to WT and R188A. It is R238A, not R243A, whose DH-DS interplay shows a distinct pattern from WT. Please, reword this sentence.

      The sentence has been revised.

      6. Lines 252-253: Solvation of polar groups in aqueous solvent has been ascribed to positive heat capacities whereas negative for apolar solvation.

      This statement is not precise. More precisely, the collapse of apolar molecules from their solvated state leads to the negative "change" in heat capacity.

      The sentence has been corrected.

      7. Line 262-267: "These hydrophilic patches will be highly solvated, which will be desolvated upon binding lipids contributing favorably to entropy. In the case of MsbA, the selected lysine and arginine residues (based alpha carbon position) are separated by about 9 to 18 Å (PDB 8DMM). This distance could result in overlap of solvation shells that collectively contribute to the positive coupling enthalpy observed for MsbA-KDL interactions."

      This statement is too speculative without presenting the degree of solvation of the residues targeted for mutation. More quantitative arguments seem to be needed.

      We have removed the speculative statement.

      Reviewer #3 (Recommendations For The Authors):

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, and enthalpies were calculated using nonlinear van't Hoff fitting.

      Changes in Gibbs free energies were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Correction: in the case of linear van’t Hoff plots, dH and dS were determined directly from the plot. For the nonlinear form of the van’t Hoff equation, which does not include an entropy fitting parameter, we back-calculated dS using dH and dG at a given temperature.
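      The two routes to dS mentioned in this response can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, the Ka values are synthetic numbers generated from assumed dH and dS, and association constants are taken relative to a 1 M standard state. The linear fit uses ln Ka = -dH/(RT) + dS/R; the back-calculation uses dG = -RT ln Ka and dS = (dH - dG)/T at a single temperature.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def linear_vant_hoff(temps_k, kas):
    """Least-squares fit of ln(Ka) against 1/T; returns (dH, dS)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in kas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # slope = -dH/R and intercept = dS/R
    return -slope * R, intercept * R


def entropy_from_dh_dg(dh, ka, temp_k):
    """Back-calculate dS at one temperature from dH and dG = -RT ln(Ka)."""
    dg = -R * temp_k * math.log(ka)
    return (dh - dg) / temp_k


# Synthetic data from assumed values (not taken from the manuscript)
true_dh, true_ds = 20.0, 0.18  # kJ/mol and kJ/(mol K)
temps = [288.15, 293.15, 298.15, 303.15, 308.15]
kas = [math.exp(-true_dh / (R * t) + true_ds / R) for t in temps]

dh_fit, ds_fit = linear_vant_hoff(temps, kas)
ds_back = entropy_from_dh_dg(dh_fit, kas[2], temps[2])
print(round(dh_fit, 3), round(ds_fit, 4), round(ds_back, 4))
```

      For a linear plot the two routes agree. For a curved plot, where dH itself depends on temperature, only the back-calculation at a stated temperature applies.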

      The authors then included single, double and triple mutants of residues known based on cryo-EM and X-ray structures to interact with Lipid A either in the large inward-facing cavity or at a secondary binding site accessible at the surface of outward-facing MsbA, and determined the thermodynamic parameters of these mutants alone and combined to gain access to coupling energies of pairwise interactions. This method has its roots in studying pair-wise interactions of protein-protein interfaces, generally known as thermodynamic mutant cycle analysis.

      Having the main expertise in ABC transporter structure-function, I will judge the paper mostly from the standpoint of what I can learn as a transporter expert from this study and whether the insights are of value for researchers with average biophysical knowledge.

      My overall impression of the manuscript is that, while it contains a wealth of experimental data using the innovative and unique method of native mass spectrometry, it is hard to understand what one can learn from this analysis beyond their interesting key finding that entropy plays an important role in lipid binding (but only at certain temperatures). In particular, the lessons learned from the coupling energy analysis of the introduced mutations is hard to grasp/digest for me with regards to what I can learn from these numbers (other than learning that there are such coupling effects).

      We agree the thermodynamic data is rich. Often a ddGint of zero is reported as having no coupling/significance, but here the value is due to compensating ddH and -TddS terms. In our view, this work forms the foundation of additional studies to better understand the coupling energetic terms, beyond ddGint.

      In some instances, the text/figure legends are a bit unclear or contain some typos; but this part can easily be handled in a revision. The discussion is well written and embeds the main findings in the (still rather limited) literature on thermodynamic analyses of lipid binding of membrane proteins.

      Major points

      1. The authors may have clarified the following point in a previous paper; but at least in this paper, it is unclear to me how they purified MsbA without lipid A. The reason I am asking is that in our experience, if one purifies MsbA expressed from E. coli with standard detergents (e.g. beta-DDM) one will find a perfect density for Lipid A when determining an inward-facing structure by cryo-EM. According to the Methods, MsbA is purified initially in DDM, and rebuffered to C10E5 during size exclusion chromatography. When looking at Fig. 2b, the authors state (or assume?) that if no lipid A is added, MsbA has 0 % lipid A bound.

      We have previously reported details of MsbA sample prep and optimization [6]. The revised manuscript makes note of this previous work and refers the reader to the publication. Yes, we see no appreciable signal for lipid A bound to MsbA (see Fig 2b).

      We also note that samples of MsbA prepared using DDM are highly heterogeneous and contaminated by a battery of small molecules (that we suspect are co-purified lipids). These contaminants will inadvertently impact biochemical studies.

      2. A second topic where further clarification is in my view needed is the question of the conformations that were probed and the lipid binding sites. If I get the experimental rationale correctly, most of the data were determined in the absence of nucleotides, and only a small subset (Fig. 5) of data were determined in the presence of ATP-vanadate. However, structural evidence for the cytosolic lipid A binding site has been only determined for outward-facing MsbA (PDB: 8DMM), but has thus far not been seen in any of the inward-facing cryo-EM structures of MsbA, including recent well-resolved cryo-EM structures showing excellent density for the lipid A bound to the inward-facing cavity (PDB: 7PH2). Further, there is only one lipid A molecule that can be accommodated by the inward-facing cavity, whereas (owing to the symmetry of the homodimer) two lipid A can be bound sideways to outward-facing MsbA. Now, my understanding problem is why one does see up to three lipid A molecules bound to inward-facing apo MsbA, e.g. Fig. 2b and elsewhere. Where are they expected to bind? And what is the evidence supporting these additional binding sites?

      See our detailed response to reviewer 1. If we add more lipid, we see more lipid binding to MsbA, like every other membrane protein we have studied. This data clearly indicates that there are more KDL binding site(s) – deciphering the affinity of these site(s) represents a problem on the horizon.

      A further question is which lipid A binding sites are present in vanadate-trapped MsbA. Here, there are two identical binding sites (at the surface of each MsbA molecule), and it is therefore surprising to see that the affinities for the first and the second binding site are so different (see e.g. Supplementary Fig. 13).

      Great point. A logical explanation (described for other biochemical systems) is that the two exterior LPS binding sites display negative cooperativity, i.e., binding at one site weakens the affinity at the other site.
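
      The negative-cooperativity argument can be illustrated with a minimal sketch. It assumes a textbook two-site model that is not taken from the manuscript: for two identical, independent sites with microscopic dissociation constant k, statistics alone give macroscopic constants Kd1 = k/2 and Kd2 = 2k, so a Kd2/Kd1 ratio above 4 points to negative cooperativity. The function names and the interaction factor `alpha` are hypothetical.

```python
import math


def macroscopic_kds(k_micro: float, alpha: float = 1.0) -> tuple:
    """Macroscopic dissociation constants for two identical binding sites.

    k_micro: microscopic Kd of a single isolated site (any concentration unit).
    alpha:   hypothetical interaction factor for the second binding step
             (1 = independent sites, > 1 = negative cooperativity).
    """
    kd1 = k_micro / 2.0          # two free sites are available to the first ligand
    kd2 = 2.0 * k_micro * alpha  # one free site left, two bound ligands can leave
    return kd1, kd2


def cooperativity(kd1: float, kd2: float) -> str:
    """Classify cooperativity from the ratio of macroscopic Kd values."""
    ratio = kd2 / kd1
    if math.isclose(ratio, 4.0, rel_tol=1e-9):
        return "independent"
    return "negative" if ratio > 4.0 else "positive"
```

      With illustrative values, `cooperativity(*macroscopic_kds(1.0, alpha=5.0))` classifies the pair of sites as negatively cooperative, whereas `alpha=1.0` reproduces the purely statistical factor of 4.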

      Finally, what is the evidence that in vanadate-trapped MsbA, all molecules have closed NBDs and thus assume the outward-facing conformation? It is not uncommon that vanadate trapping leads to NBD closure only in a subfraction of all transporters (hence not in 100 % of them).

      Yes, the native mass spectrum shows no appreciable signal for MsbA not trapped with vanadate/ADP. In our previous cryoEM study [6], using the vanadate-trapped transporter, we did not observe particles with NBDs dissociated in space. Regarding samples from other labs, a native mass spectrum could shed light on the population of untrapped protein; however, most studies use SDS-PAGE for quality control of their purified samples. This technology is not sufficient to address underlying biochemical issues.

      We do have a new report in preparation describing a new discovery regarding trapping efficiency of MsbA.

      1. The key parameter underlying the entire thermodynamic analysis of wt and mutant MsbA is the dissociation/association constant, which is used to calculate the Gibbs free energy and, via van't Hoff analysis, the enthalpy. Entropy is not determined directly, but indirectly from these two numbers, both of which depend on the measurement quality of the dissociation/association constant. Now, when looking at the fitted curves as shown in Figure 2b (and in the supplement), determination of the dissociation constant for KDL1 (blue curves) looks reasonable and the determined KDs are within the range of measured points. However, for KDL2 (red) and even more so KDL3 (yellow), the determined KD values (Supplementary Table 5) are typically higher than the highest KDL concentration used in the assay (1.5 uM). For this reason, and despite the fact that error bars of the fits look reasonably small, I still have doubts about the reliability of these KD values for KDL2 and KDL3.

      Hence, the surprisingly strong changes of enthalpy/entropy values for different mutants/temperatures may have their origin in incorrectly determined KD values.

      The increase in binding affinity of subsequent lipid binding events is consistent with many reports from our group [1, 2, 4, 6-9] and that of Prof. Robinson [10, 11] on this topic. As noted above, we indeed observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant), trapped or not, all display unique temperature-dependent responses. If, as the reviewer suggests, the Kd values were incorrectly or randomly determined, then none of the binding data should follow the van’t Hoff equations. This is simply not the case: the error bars and fits are reasonable. Backing up even further, looking at the raw native mass spectra (see Supplementary Figures 1-3 and 10-11), one can see the different temperature dependence of lipid binding.
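
      For readers less familiar with the analysis, the linear van’t Hoff relation, ln K_A = -ΔH/(RT) + ΔS/R, can be sketched as below. This is a minimal illustration and not the fitting code used in the study: it handles only the linear case (the convex and concave curves mentioned above need an additional heat-capacity term), and the function names are hypothetical.

```python
import math

R = 8.314462618  # gas constant in J mol^-1 K^-1


def vant_hoff_fit(temps_k, kds_molar):
    """Least-squares fit of ln(K_A) against 1/T.

    temps_k:   temperatures in kelvin.
    kds_molar: dissociation constants in mol/L measured at those temperatures.
    Returns (delta_h, delta_s) in J/mol and J/(mol K).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(1.0 / kd) for kd in kds_molar]  # ln K_A = -ln K_D
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -delta_h / R
    intercept = y_mean - slope * x_mean  # equals  delta_s / R
    return -slope * R, intercept * R


def delta_g(kd_molar, temp_k):
    """Gibbs free energy of binding, delta_g = R * T * ln(K_D), in J/mol."""
    return R * temp_k * math.log(kd_molar)
```

      Because ΔS is obtained from the intercept and ΔG from K_D directly, both inherit the uncertainty of the fitted K_D values, which is the dependence the reviewer points to.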

      Minor points

      1. Lines 116-131: this section reads as an extended introduction/aims, and does not contain any results.

      This section has been moved to introduction.

      1. Lines 137-139: suggested to check whether these interactions are also present in recently determined cryo-EM structures determined at fairly high resolution (PDB: 7PH2)

      The interactions of MsbA and LPS (bound at the interior site) are comparable for PDB 7PH2 and 6BPL.

      1. Lines 144-146: suggested to elaborate in more detail on the fitting procedure here, as the KD values determined in this way are the foundation of all quantitative assessments.

      Details of data analysis and the fitting procedure are provided in methods.

      1. Figure legend, Fig. 2: Technically, MsbA was solubilized and purified in DDM and detergent exchange was done on SEC to C10E5.

      Corrected.

      1. Figure legend, Fig. 4: description in a) on deconvoluted mass spec data is incorrect. Letter below needs to be adjusted accordingly.

      Corrected.

      1. Figure legend, Fig. 5: suggested to mention in Figure legend title that here we look at ADP-vanadate trapped MsbA.

      Corrected.

      References

      1. Cong, X., et al., Determining Membrane Protein–Lipid Binding Thermodynamics Using Native Mass Spectrometry. Journal of the American Chemical Society, 2016. 138(13): p. 4346-4349.

      2. Cong, X., et al., Allosteric modulation of protein-protein interactions by individual lipid binding events. Nat Commun, 2017. 8(1): p. 2203.

      3. Qiao, P., et al., Insight into the Selectivity of Kir3.2 toward Phosphatidylinositides. Biochemistry, 2020. 59(22): p. 2089-2099.

      4. Qiao, P., et al., Entropy in the Molecular Recognition of Membrane Protein-Lipid Interactions. J Phys Chem Lett, 2021. 12(51): p. 12218-12224.

      5. Sokolovski, M., et al., Measuring inter-protein pairwise interaction energies from a single native mass spectrum by double-mutant cycle analysis. Nat Commun, 2017. 8(1): p. 212.

      6. Lyu, J., et al., Structural basis for lipid and copper regulation of the ABC transporter MsbA. Nat Commun, 2022. 13(1): p. 7291.

      7. Patrick, J.W., et al., Allostery revealed within lipid binding events to membrane proteins. Proc Natl Acad Sci U S A, 2018. 115(12): p. 2976-2981.

      8. Schrecke, S., et al., Selective regulation of human TRAAK channels by biologically active phospholipids. Nature Chemical Biology, 2021. 17(1): p. 89-95.

      9. Zhu, Y., et al., Cupric Ions Selectively Modulate TRAAK-Phosphatidylserine Interactions. J Am Chem Soc, 2022. 144(16): p. 7048-7053.

      10. Tang, H., et al., The solute carrier SPNS2 recruits PI(4,5)P(2) to synergistically regulate transport of sphingosine-1-phosphate. Mol Cell, 2023. 83(15): p. 2739-2752 e5.

      11. Yen, H.Y., et al., PtdIns(4,5)P(2) stabilizes active states of GPCRs and enhances selectivity of G-protein coupling. Nature, 2018. 559(7714): p. 423-427.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Butkovic et al. perform a genome-wide association (GWA) study on Arabidopsis thaliana inoculated with the natural pathogen turnip mosaic virus (TuMV) in laboratory conditions, with the aim to identify genetic associations with virus infection-related parameters. For this purpose, they use a large panel of A. thaliana inbred lines and two strains of TuMV, one naïve and one pre-adapted through experimental evolution. A strong association is found between a region in chromosome 2 (1.5 Mb) and the risk of systemic necrosis upon viral infection, although the causative gene remains to be pinpointed.

      This project is a remarkable tour de force, but the conclusions that can be reached from the results obtained are unfortunately underwhelming. Some aspects of the work could be clarified, and presentation modified, to help the reader.

      (Recommendations For The Authors):

      • It is important to note that viral accumulation and symptom development do not necessarily correlate, and that only the former is a proxy for "virus performance". These concepts need to be clear throughout the text, so as not to mislead the reader.

      This is now explained better in lines 118-120; "virus performance" has been removed.

      • Sadly, only indirect measures of the viral infection (symptoms) are used, and not viral accumulation. It is important to note that viral accumulation and symptom development do not necessarily correlate and that only the former is a proxy for "virus performance". These concepts need to be clear throughout the text, so as not to mislead the reader. The mention of "virus performance" in line 143 is therefore not appropriate, nor is the reference to viral replication and movement in the Discussion section.

      "Virus performance" was removed. Also, the reference to viral replication and movement in the Discussion section has been removed.

      Now we mention: “We did not measure viral accumulation, but note this is significantly correlated with intensity of symptoms within the Col-0 line (Corrêa et al. 2020), although it is not clear if this correlation occurs in all lines.”

      • Since symptoms are at the center of the screen, images representing the different scores in the arbitrary scales should ideally be shown.

      Different Arabidopsis lines would look different, and this could mislead a reader not familiar with the lines. To represent the criteria we used to establish the symptoms, we believe that a schematic representation is clearer to interpret. Here are some pictures of different lines showing varying symptoms:

      Author response image 1.

      • Statistical analyses could be added to the figures, to ease interpretation of the data presented.

      Statistical analysis can be found in methods. We prefer to keep the figure legend as short as possible.

      • The authors could include a table with the summary of the phenotypes measured in the panel of screened lines (mean values, range across the panel, heritability, etc.).

      These data are plotted in Fig. 1. We believe that repeating this information in tabular form would not contribute to the main message of the work. Phenotype data and the code to reproduce Figure 1 are available on GitHub (as stated in Data Availability); anyone interested can freely explore the phenotypes of the screened lines.

      • The definition of the association peak found in chromosome 2 could be explained further: is the whole region (1.5 Mb) in linkage disequilibrium? How many genes are found within this interval, and how were the five strong candidates the authors mention in line 161 selected? It is also not clear which are these 5 candidates, apart from AT2G14080 and DRP3B - and among those in Table 1 (which, by the way, is cited only in the Discussion and not in the Results section)? Why were AT2G14080 and DRP3B in particular chosen?

      We have replaced Table 1 with an updated Table S1 listing all genes found within the range of significant SNPs for each peak. We now highlight a subset of these genes as candidate genes if they have functions related to disease resistance or defence, and mention them explicitly in the text (lines 173-179). We have explicitly described how this table was constructed in the methods (lines 525-538).

      • Concerning the validation of the association found in chromosome 2 (line 169 and onward): the two approaches followed cannot be considered independent validations; wouldn't using independent accessions, or an independent population (generated by the cross between two parental lines, showing contrasting phenotypes, for example) have been more convincing?

      We aim to compare the hypothesis that the association is due to a causal locus to the null hypothesis that the observed association is a fluke due to, for example, the small number of lines showing necrosis. If this null hypothesis is true then we would not expect to see the association if we run the experiment again using the same lines. An alternative hypothesis is that the genotype at the QTL and disease phenotypes are not directly causally linked, but are both correlated with some other factor, such as another QTL, or maternal effects. We agree that an independent sample would be required to exclude the latter hypothesis, but argue that the former is the more pertinent. We have edited the text to be explicit about the hypothesis we are testing, and altered the language to shift the focus from ‘validation’ to ‘confirming the robustness’ of the association (line 182).

      • Regarding the identification of the transposon element in the genomic region of AT2G14080: is the complementation of the knock-out mutant with the two alleles (presence/absence of the transposon) possible to confirm its potential role in the observed phenotype?

      This could be feasible but we cannot do it as none of the researchers can continue this project.

      • On the comparison between naïve and evolved viral strains: is the evolved TuMV more virulent in those accessions closer to Col-0?

      This is not something we have looked at but would certainly be an interesting follow-up investigation.

      • The Copia-element polymorphism is identified in an intron; the potential functional consequences of this insertion could be discussed. In the example the authors provide, the transposable element is inserted into the protein-coding sequence instead.

      We now state explicitly that such insertions are expected to influence expression; beyond that we can only speculate. We have removed the reference to the insertion in the coding sequence.

      • The authors state in line 398 that "susceptibility is unquestionably deleterious" - is this really the case? Are the authors considering susceptibility as the capacity to be infected, or to develop symptoms? Viral infections in nature are frequently asymptomatic, and plant viruses can confer tolerance to other stresses.

      We have toned down the expression and clarified our wording: “Given that potyvirus outbreaks are common in nature (Pagán et al., 2010) and susceptibility to symptomatic infection can be deleterious”.

      Additional minor comments:

      • In Table 1, Wu et al., 2018 should refer to DRP2A and 2B, not 3B.

      We have removed Table 1 altogether.

      • Line 126: a 23% increase in symptom severity is mentioned, but how is this calculated, considering that severity is measured in four different categories?

      This is the change in mean severity of symptoms between the two categories.

      • Figure 1F: "...symptoms"

      Fixed.

      • Line 179: "...suggesting an antiviral role..."

      Changed.

      • Lines 288-300: This paragraph does not fit into the narrative and could be omitted.

      It has been removed and some of the information moved to the last paragraph of the Introduction, where the two TuMV variants are presented.

      • Lines 335-337: The rationale here is unclear since DRP2B will also be in the background - wouldn't DRP2B and 3B be functionally redundant in the viral infection?

      Our results suggest that DRP3B is redundant with DRP2B for the ancestral virus but not for the evolved viral strain. We speculate that the evolved viral isolate may have acquired the capacity to recruit DRP3B for its replication and hence produces fewer symptoms when the plant protein is missing.

      We have spotted a mistake that may have added to the confusion. Originally the text said “In contrast, loss of function of DRP3B decreased symptoms relative to those in Col-0 in response to the ancestral, but not the evolved virus”. The correct statement is “In contrast, loss of function of DRP3B decreased symptoms relative to those in Col-0 in response to the evolved, but not the ancestral virus.”

      Reviewer #2 (Public Review):

      The manuscript presents a valuable investigation of genetic associations related to plant resistance against the turnip mosaic virus (TuMV) using Arabidopsis thaliana as a model. The study infects over 1,000 A. thaliana inbred lines with both ancestral and evolved TuMV and assesses four disease-related traits: infectivity, disease progress, symptom severity, and necrosis. The findings reveal that plants infected with the evolved TuMV strain generally exhibited more severe disease symptoms than those infected with the ancestral strain. However, there was considerable variation among plant lines, highlighting the complexity of plant-virus interactions.

      A major genetic locus on chromosome 2 was identified, strongly associated with symptom severity and necrosis. This region contained several candidate genes involved in plant defense against viruses. The study also identified additional genetic loci associated with necrosis, some common to both viral isolates and others specific to individual isolates. Structural variations, including transposable element insertions, were observed in the genomic region linked to disease traits.

      Surprisingly, the minor allele associated with increased disease symptoms was geographically widespread among the studied plant lines, contrary to typical expectations of natural selection limiting the spread of deleterious alleles. Overall, this research provides valuable insights into the genetic basis of plant responses to TuMV, highlighting the complexity of these interactions and suggesting potential avenues for improving crop resilience against viral infections.

      Overall, the manuscript is well-written, and the data are generally high-quality. The study is generally well-executed and contributes to our understanding of plant-virus interactions. I suggest that the authors consider the following points in future versions of this manuscript:

      1. Major allele and minor allele definition: When these two concepts are mentioned in the figure, there is no clear definition of the two words in the text. Especially for major alleles, there is no clear definition in the whole text. It is recommended that the author further elaborate on these two concepts so that readers can more easily understand the text and figures.

      We agree that the distinction between major/minor alleles and major/minor associations in our previous manuscript may have been confusing. In the current manuscript we now define the minor allele at a locus as the less-common allele in the population (line 167). We have removed references to major/minor associations, and instead refer to strong/weak associations.
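
      The definition can be stated operationally. The sketch below is a hypothetical helper, not code from our analysis pipeline; it assumes one allele call per inbred line at a biallelic locus and returns the less common allele with its frequency.

```python
from collections import Counter


def minor_allele(calls):
    """Return (allele, frequency) of the less common allele at a biallelic locus.

    calls: one allele call per inbred line, e.g. ["A", "A", "T", ...].
    """
    counts = Counter(calls)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct alleles")
    allele, n = min(counts.items(), key=lambda item: item[1])
    return allele, n / len(calls)
```

      For ten hypothetical lines of which nine carry "A" and one carries "T", the helper reports "T" as the minor allele at a frequency of 0.1.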

      1. Possible confusion caused by three words (Major locus / Major association and major allele): Because there is no explanation of the major allele in the text, it may cause readers to be confused with these two places in the text when trying to interpret the meaning of major allele: major locus (line 149)/ the major association with disease phenotypes (line 183).

      See our response to the previous comment.

      1. Discussion: The authors could provide a more detailed discussion of how the research findings might inform crop protection strategies or breeding programs.

      We would prefer to refrain from speculating about future applications in breeding programs.

      (Recommendations For The Authors):

      1. Stacked bar chart for the Fig 1F. It is recommended that the author use the form of a stacked bar chart to display the results of Fig 1F. On the one hand, it can fit in with the format of Fig 1D/E/G, on the other hand, it can also display the content more clearly.

      We think the results are easier to interpret without the stacked bar chart.

      1. Language Clarity: While there are no apparent spelling errors, some sentences could be rewritten for greater clarity, especially when explaining the results in Figure 1 and Figure 2.

      We have reviewed these sections and attempted to improve clarity where that seemed appropriate.

      There are some possibilities to explore in the future. For example: clarity of mechanisms for the future. While the study identifies genetic associations, it lacks an in-depth exploration of the underlying molecular mechanisms. Elaborating on the mechanistic aspects would enhance the scientific rigor and practical applicability of the findings.

      Yes, digging into the molecular mechanisms is an ongoing task, and the results will be published elsewhere. It is beyond the scope of this already dense manuscript.

      Reviewer #3 (Public Review):

      Summary of Work

      This paper conducts the largest GWAS study of A. thaliana in response to a viral infection. The paper identifies a 1.5 MB region in the chromosome associated with disease, including SNPs, structural variation, and transposon insertions. Studies further validate the association experimentally with a separate experimental infection procedure with several lines and specific T-DNA mutants. Finally, the paper presents a geographic analysis of the minor disease allele and the major association. The major take-home message of the paper is that structural variants and not only SNPs are important changes associated with disease susceptibility. The manuscript also makes a strong case for negative frequency-dependent selection maintaining a disease susceptibility locus at low frequency.

      Strengths and Weaknesses

      A major strength of this manuscript is the large sample sizes, careful experimental design, and rigor in the follow-up experiments. For instance, mentioning non-infected controls and using methods to determine if geographic locus associations were due to chance. The strong result of a GWAS-detected locus is impressive given the complex interaction between plant genotypes and strains noted in the results. In addition to the follow-up experiments, the geographic analysis added important context and broadened the scope of the study beyond typical lab-based GWAS studies. I find very few weaknesses in this manuscript.

      Support of Conclusions

      The support for the conclusions is exceptional. This is due to the massive amount of evidence for each statement and also due to the careful consideration of alternative explanations for the data.

      Significance of Work

      This manuscript will be of great significance in plant disease research, both for its findings and its experimental approach. The study has very important implications for genetic associations with disease beyond plants.

      (Recommendations For The Authors):

      Line 41 - Rephrase, not clear "being the magnitude and sign of the difference dependent on the degree of adaptation of the viral isolate to A. thaliana."

      Now it reads: “When inoculated with TuMV, loss-of-function mutant plants of this gene exhibited different symptoms than wild-type plants, where the scale of the difference and the direction of change between the symptomatology of mutant and wild-type plants depends on the degree of adaptation of the viral isolate to A. thaliana.”

      Line 236 - typo should read: "and 21-fold"

      Changed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      In this manuscript, Xie et al report the development of SCA-seq, a multiOME mapping method that can obtain chromatin accessibility, methylation, and 3D genome information at the same time. This method is highly relevant to a few previously reported long read sequencing technologies. Specifically, NanoNome, SMAC-seq, and Fiber-seq have been reported to use m6A or GpC methyltransferase accessibility to map open chromatin, or open chromatin together with CpG methylation; Pore-C and MC-3C have been reported to use long read sequencing to map multiplex chromatin interactions, or together with CpG methylation. Therefore, as a combination of NanoNome/SMAC-seq/Fiber-seq and Pore-C/MC-3C, SCA-seq is one step forward. The authors tested SCA-seq in 293T cells and performed benchmark analyses testing the performance of SCA-seq in generating each data module (open chromatin and 3D genome). The QC metrics appear to be good and the methods, data and analyses broadly support the claims. However, there are some concerns regarding data analysis and conclusions, and some important information seems to be missing.

      1. The chromatin accessibility tracks from SCA-seq seem to be noisy, with higher background than DNase-seq and ATAC-seq (Fig. 2f, Fig. 4a and Fig. S5). Also, SCA-seq is much less sensitive than both DNase-seq and ATAC-seq (Figs. 2a and 2b). This and other limitations of SCA-seq (high background, high sequencing cost, requirement of specific equipment, etc) need to be carefully discussed.

      We thank the reviewer for the important comment about the noisy GpC methylation signal in SCA-seq. We acknowledge that the SCA-seq signal presented in Fig. 2f, Fig. 4a, and Fig. S5 in our first draft was indeed noisy, as we presented the raw 1D genomic signal. In this revision, we have taken steps to reduce the noise in the GpC methylation signal by identifying the accessible regions on each segment of every single molecule. For each segment, we performed a sliding window analysis (50 bp window sliding by a 10 bp step) with a binomial test to identify accessible windows that significantly deviate from the background GpC methylation ratio. Overlapping accessible windows (p < 0.05 in the binomial test and containing at least two GpC sites) on a single fragment are merged into an accessible region. We then retain the GpC methylation signal inside the accessible regions to reduce the background noise (Sfig 5ab). The details of the noise filtering steps are described in the Methods section (page 22 lines 13-23).
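
      The filtering steps above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the released SCA-seq code: the test is taken to be one-sided (more methylated GpC sites than expected from the background ratio), window coordinates are half-open and relative to the segment start, and all function and parameter names are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def accessible_regions(gpc_calls, seg_len, bg_ratio,
                       window=50, step=10, alpha=0.05, min_sites=2):
    """Call accessible regions on one segment of a single molecule.

    gpc_calls: list of (position, is_methylated) tuples, is_methylated in {0, 1}.
    seg_len:   segment length in bp.
    bg_ratio:  background GpC methylation ratio used as the null probability.
    Returns merged (start, end) intervals of significant windows.
    """
    hits = []
    for start in range(0, max(seg_len - window, 0) + 1, step):
        end = start + window
        states = [m for pos, m in gpc_calls if start <= pos < end]
        n, k = len(states), sum(states)
        # Windows with too few GpC sites are not tested at all.
        if n >= min_sites and binom_sf(k, n, bg_ratio) < alpha:
            hits.append((start, end))

    merged = []
    for start, end in hits:  # hits are already sorted by start
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

      Only GpC calls that fall inside the returned intervals would then be kept for the 1D signal track.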

      Visually, we can observe from the updated example view of the 1D signal track that the noise is dramatically reduced in the filtered SCA-seq GpC methylation signal compared to the raw signal (Sfig5c). The cleaned SCA-seq GpC methylation 1D signals were also updated (Fig2f and Fig4a). We observed an increase in the TSS enrichment score, which is a commonly used metric for assessing the signal-to-noise ratio in ATAC-seq data quality control. Specifically, the TSS enrichment score increased to 2.74 when using the filtered signal, compared to 1.93 when using the raw signal (Sfig5d). After noise filtering, 80% of SCA-seq 1D peaks overlap with peaks called by ATAC-seq and/or DNase-seq (Fig2ab), compared to 74% from the raw signal in the first draft.

      We thank the reviewer for raising the concern about the sequencing cost and the requirement for specific equipment. The sequencing cost is approximately 1300 USD per sample to sequence a human sample at 30X depth and obtain a saturated GpC methylation signal (Sfig4d) as well as a loop signal similar to that of NGS-based Hi-C (Fig3gh). Considering that SCA-seq simultaneously provides higher-order chromatin structure and chromatin accessibility at single-molecule resolution, we believe the cost is acceptable. However, it is worth noting that SCA-seq requires a regular Oxford Nanopore sequencer with an R9.4.1 chip, which is currently available but might be discontinued by Oxford Nanopore in the future. We have addressed all these concerns in the discussion section.

      1. In Fig. 2f, many smaller peaks are present besides the major peaks. Are they caused by baseline DNA methylation? How many of the small methylation signals are called peaks? In Fig. 4a, it seems that the authors define many more enhancers from SCA-seq data than what will be defined from ATAC-seq or DHS. Are those additional enhancers false positives? Also, it is difficult to distinguish the gray "inaccessible segments" from the light purple "accessible segments".

      We thank the reviewer for bringing up these concerns.

      Regarding the smaller peaks in the 1D genomic GpC methylation signal, we have addressed this issue by implementing noise filtering in this revision; the small peaks on the 1D tracks are greatly reduced (Fig2f, Sfig5c). It is important to note that SCA-seq generates accessibility signals specifically on ligation junctions, which differs from the one-dimensional (1D) signals obtained through ATAC-seq or DNase-seq. The presence of remaining small peaks in the SCA-seq data can be attributed to the varied sequencing depth, which is influenced by the enriched spatial interactions occurring in regions of the genome that are enriched with ligation junctions. In general, the SCA-seq 1D peaks are well correlated with the high-confidence peaks from the 1D tracks of ATAC-seq and DNase-seq (Fig2b).

      We apologize for the lack of clarity in our enhancer annotation. The enhancer regions were obtained from The Ensembl Regulatory Build (PMID: 25887522). We have now included this information in the method section (page 24 line 16).

      We thank the reviewer for pointing out this visualization problem. The color scheme has been revised, with purple now representing the inaccessible segments and yellow representing the accessible segments.

      1. For 3D genome analysis, it is important to provide information about data yield from SCA-seq. With 30X sequencing depth, how many contacts are obtained (with long-read sequencing, this should be the number of ligation junctions)? How is the number compared to Hi-C.

      We thank the reviewer for raising this crucial point about the sequencing yield, which we had missed. We have now included this information in the revised result section (page 11, lines 11-14).

      We checked the public data of a successful HEK293T Hi-C run (PMID: 34400762). The Hi-C experiment produced 699,464,541 reads (105 Gb), from which we obtained 388,031,859 contacts.

      From 100 Gb of HEK293T SCA-seq data, we obtained 81,229,369 ligation junctions and 378,848,187 virtual pairwise contacts (3.8M pairwise contacts per Gb). The number of virtual pairwise contacts per Gb in SCA-seq is similar to that of PORE-C (PMID: 35637420).
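
      The yield figure for SCA-seq follows from simple arithmetic, sketched below. The sketch also assumes the usual long-read convention, which is not spelled out above, that a concatemer with n segments is decomposed into all n(n-1)/2 segment pairs to give its virtual pairwise contacts; the helper names are hypothetical.

```python
def contacts_per_gb(n_contacts: int, gigabases: float) -> float:
    """Contacts obtained per gigabase of sequencing."""
    return n_contacts / gigabases


def virtual_pairwise_contacts(n_segments: int) -> int:
    """Number of segment pairs in one concatemer (n choose 2)."""
    return n_segments * (n_segments - 1) // 2


# SCA-seq numbers quoted in the response above.
sca_per_gb = contacts_per_gb(378_848_187, 100)
sca_millions_per_gb = round(sca_per_gb / 1e6, 1)  # in millions per Gb
```

      Under this convention a concatemer with four segments has three ligation junctions but contributes six virtual pairwise contacts, which is why the pairwise count exceeds the junction count.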

      1. Fig 3j. Because SCA-seq only do GpC methylation, the capability to detect the footprint at individual CTCF peaks depends on the density of GpC nearby. Have the authors taken GpC density into account when defining CTCF sites with or without footprint?

      We thank the reviewer for raising the concern about GpC site density at CTCF sites. We would like to highlight that Battaglia et al. have demonstrated the feasibility of identifying transcription factor binding events using GpC labeling (PMID: 36195755). In our study, we implemented a high-resolution sliding window approach to enhance the sensitivity of CTCF binding detection. We took GpC density into account by performing a sliding window (50 bp window, 10 bp step) binomial test on every single molecule overlapping a CTCF site to call accessible regions. The detailed steps for calling accessible regions are described in our response to the first comment. Based on the pattern in Fig3j, we identify a CTCF footprint if accessible regions are called near the CTCF site (at least 20 bp away from the center of the CTCF site) but not on the CTCF site itself.
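
      The footprint rule can be written down as a small predicate. This is a sketch under assumptions that go beyond the text: "on the CTCF site" is read as an accessible region covering the site center, intervals are half-open, and the outer limit `max_offset` for what counts as nearby is a hypothetical parameter.

```python
def has_ctcf_footprint(regions, ctcf_center, min_offset=20, max_offset=100):
    """Decide whether one molecule shows a CTCF footprint.

    regions:     accessible (start, end) intervals called on the molecule.
    ctcf_center: coordinate of the CTCF site center.
    min_offset:  minimum distance (bp) between the center and an accessible region.
    max_offset:  hypothetical outer limit (bp) for a region to count as nearby.
    """
    # An accessible call covering the center means the site itself is open.
    if any(start <= ctcf_center < end for start, end in regions):
        return False
    for start, end in regions:
        if start > ctcf_center:
            dist = start - ctcf_center      # region lies downstream of the center
        else:
            dist = ctcf_center - (end - 1)  # region lies upstream of the center
        if min_offset <= dist <= max_offset:
            return True
    return False
```

      A molecule with an accessible region starting 30 bp from the center and nothing over the center itself would be scored as carrying a footprint.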

      To ensure that the GpC site density is sufficient for a binomial test in each sliding window around CTCF sites genome-wide, we examined the number of GpC sites in each window. Our analysis revealed that GpC sites are evenly distributed, and over 87% of the windows contain at least 2 GpC sites, which qualifies them for a binomial test (Author response image 1). This indicates that we are able to detect the CTCF footprint at most of the CTCF sites, taking the GpC density into consideration.

      Author response image 1.

      Genome-wide GpC site density in CTCF site-centered regions. The distribution of the number of GpC sites (y-axis) in each 50 bp sliding window region (x-axis) is presented as violin plots.

      1. This study only performs higher resolution chromatin interaction analysis based on individual read concatenates. It is unclear to me if the data have enough depth to perform loop analysis with Hi-C pipelines.

      We thank the reviewer for highlighting this important concern about the depth of data for performing loop analysis. We performed aggregate peak analysis for SCA-seq and Hi-C side by side using the hiccups function in Juicer (v1.9.9) (PMID: 27467249). We acknowledge that the level of loop signal enrichment is relatively weaker (one-fold less) in SCA-seq compared to Hi-C (Fig3h). This difference can be attributed to the lower sequencing yield per Gb in SCA-seq, which resulted in 4.93M pairwise contacts per Gb, compared to the 7M contacts per Gb in Hi-C. Despite this discrepancy, we were still able to observe a clear genome-wide loop enrichment pattern in SCA-seq (Fig3gh).

      1. It appears that SCA-seq is of low efficiency in detecting chromatin interactions. As shown in Fig. S7a, 65.4% of sequenced reads contained only one restriction enzyme (RE) fragment/segment (with no genomic contact), which is much higher than that reported in published PORE-C methods. In addition, Fig. S7g is very confusing and in conflict with Fig. S7a. For example, in Fig. S7g, 21.4% and 22.2% of SCA-seq concatemers contain one and two segments, whereas the numbers are 65.4% and 14.7% in Fig. S7a, respectively. Please explain.

      We apologize for the confusion in Fig. S7a and Fig. S7g.

      Fig. S7a was intended to illustrate the cardinality count of concatemers with only chr7 segments included, representing the intra-chromosome cardinality instead of the genome-wide cardinality. We have revised Fig. S7a and its corresponding figure legend to clarify that the figure describes segments of intra-chromosome interactions.

      On the other hand, Fig. S7g shows the concatemers including both intra-chromosome and inter-chromosome segments, which explains the differences in the percentages of different cardinality ranges compared to Fig. S7a. Moreover, the percentages reported in Fig. S7g are similar to what is typically reported in PORE-C methods when considering both intra- and inter-chromosome interactions.

      To provide a comprehensive view of the genome-wide concatemer cardinality distribution, we have also included a histogram in Fig3k, which demonstrates the detailed distribution of cardinality for genome-wide concatemers.

      1. I disagree with the rationale of the entire Fig. S9. Biologically there is no evidence that chromatin accessibility will change due to genome interactions (the opposite is more likely), therefore the definition of "expected chromatin accessibility" is hard to believe. If the authors truly believe this is possible, they will need to test their hypothesis by deleting cohesin and check if the chromatin accessibility driven by "power center" are truly abolished. The math in Fig. S9 is also confusing. Firstly, the dimension of the contact matrix in Fig. S9 appears to be wrong, it should have 8 rows. Secondly, I don't understand why the interaction matrix is not symmetric. Third, if I understand correctly the diagonal of the matrix should be all 1, it is also hard to understand why the matrix only has 1, 0 or -1. It appears that the authors assume that the observed accessibility is a simple sum of the expected accessibility of all its interacting regions; this is wrong. In my opinion, the whole Fig. S9 should be deleted unless the authors can make sense of it and ideally also provide more evidence.

      I apologize for any confusion caused by the rationale and figures in Fig. S9. The purpose of the hypothesis presented in the figure is to explore the potential relationship between chromatin accessibility and genome interactions. While there is currently no direct biological evidence supporting this hypothesis, it is a possibility that warrants further investigation.

      Regarding the suggestion to delete Fig. S9 unless more evidence is provided, it is important to note that this paper primarily focuses on the methodology and theoretical framework. Experimental validation of the hypothesis falls outside the scope of this particular study.

      We have made corrections to the schematic matrix in Fig. S9 to accurately represent the dimensions and symmetry. The numbers in the matrix represent mean accessible values of the contacts. Specifically, accessible-accessible contacts are represented by 2, accessible-inaccessible contacts are represented by 0, and inaccessible-inaccessible contacts are represented by -2.
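
      The 2 / 0 / -2 encoding above can be reproduced with a small sketch. It assumes, purely for illustration, that each segment carries a state of +1 (accessible) or -1 (inaccessible) and that the value of a contact is the sum of the two states; the response states only the resulting values, so this construction and the function name are hypothetical.

```python
def contact_value_matrix(states: list[int]) -> list[list[int]]:
    """Build a pairwise contact matrix from per-segment accessibility states.

    states: +1 for an accessible segment, -1 for an inaccessible one
    (assumed encoding). Entry [i][j] is states[i] + states[j], giving
    2 (A-A), 0 (A-I) or -2 (I-I).
    """
    return [[si + sj for sj in states] for si in states]


# Three segments: accessible, inaccessible, accessible
matrix = contact_value_matrix([1, -1, 1])
for row in matrix:
    print(row)
```

      Because the value depends only on the unordered pair of states, a matrix built this way is symmetric by construction.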

      Minor concerns:

      1. The authors may want to clearly demonstrate the specificity and sensitivity of the ATAC part and the efficiency of the Hi-C part of SCA-seq.

      We appreciate the reviewer’s suggestion to demonstrate the specificity and sensitivity of the ATAC-seq part and the efficiency of the Hi-C part in SCA-seq.

      We considered the non-peak region genomic bins shared by ATAC-seq and DNase-seq as true negatives and the overlapping peaks of ATAC-seq and DNase-seq as true positives. Based on these criteria, the specificity of SCA-seq 1D peaks is calculated as TN / N, where TN represents the number of true negatives (89107) and N represents the sum of true negatives and false positives (89107 + 9345). The resulting specificity is 0.91. The sensitivity of SCA-seq 1D peaks is calculated as TP / P, where TP represents the number of true positives (33190) and P represents the sum of true positives and false negatives (33190 + 11758). The resulting sensitivity is 0.73.

      We evaluate the efficiency of spatial interaction by the restriction enzyme digested fragments recovered in the pairwise contacts that contain ligation junctions. In SCA-seq, the efficiency is calculated as the number of dpnII digested fragments recovered by pairwise contacts (5625908) divided by the total number of in silico dpnII digested fragments (7127633). The resulting efficiency is 0.79.

      We have now included this information in the revised result section (page 8 lines 15-18).
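
      The three ratios above can be recomputed directly from the counts given in the response. The sketch below uses only those counts; the helper name `rate` is introduced for the example.

```python
def rate(hits: int, misses: int) -> float:
    """Return hits / (hits + misses)."""
    return hits / (hits + misses)


# Counts taken from the response text
specificity = rate(89107, 9345)    # TN / (TN + FP)
sensitivity = rate(33190, 11758)   # TP / (TP + FN)
efficiency = 5625908 / 7127633     # recovered DpnII fragments / in silico DpnII fragments

print(f"specificity = {specificity:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"efficiency  = {efficiency:.3f}")
```

      Evaluated without rounding, the ratios come to roughly 0.905, 0.738 and 0.789, close to the reported 0.91, 0.73 and 0.79.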

      1. Fig 4g, colors with apparent differences might be used to clearly discriminate the three types of interactions (I-I, I-A and A-A).

      We appreciate the reviewer for bringing up the issue regarding the visualization in Fig 4g. The color scheme has been revised, with purple now representing I-I interactions, orange representing I-A interactions, and red representing A-A interactions. We believe that these modifications have significantly improved the clarity.

      1. Fig. 4c, when fitting an unknown curve, R-square becomes meaningless.

      We appreciate the reviewer for pointing out the issue regarding the interpretation of R-square. We have removed the R-square value from Fig. 4c.

      1. Fig 5a, "oCGIs comprised 65% CGIs that did not directly contact enhancers or promoters". Should it be "oCGIs comprised 65% of all CGIs"?

      We appreciate the reviewer for pointing out the clarification needed in Fig 5a. We have revised the phrase in the figure legend to accurately state that “oCGIs comprised 65% of all CGIs”. Thank you for bringing this to our attention.

      1. Page 15 lines 5-8, "By examining the methylation status on reads, as expected, these read segments demonstrated lower CpG methylation and higher chromatin accessibility (GpC methylation), which further supports their roles in gene activation (Fig 5b)". This statement seems to be inconsistent with the figure legend.

      We appreciate the reviewer for pointing out the inconsistency in the legend of Fig 5b. We have revised the legend of Fig 5b to accurately highlight the low CpG methylation on oCGI regions. Thank you for bringing this to our attention.

      1. Language editing and proof reading are needed.

      I apologize for any errors or mistakes in the language. We have carefully reviewed the manuscript and made the necessary language editing and proofreading revisions to ensure its quality for publication.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for collectively highlighting our study as “interesting and timely” and as making significant advances regarding the functional role of Orai in the activity of central dopaminergic neurons underlying the development of Drosophila flight behaviour. We hope that based on the revisions detailed below the data supporting our findings will be considered complete.

      Reviewer 1:

      • In this revision, the authors have addressed most points using text changes but there is still one important issue that continues to be inadequately addressed. This relates to point 1.

      If Set2 is acting downstream of SOCE, it is not clear to me how STIM1 over expression rescues Set2-dependent downstream responses in flies that do not have Set2. It seems that if STIM1 over-expression, which would presumably enhance SOCE, largely rescues Set2-dependent effector responses in the Set2RNAi flies, then the proposed pathway cannot be true (because if Set2 is downstream of SOCE, it shouldn't matter whether SOCE is boosted in flies that lack Set2). This discrepancy is not explained. Does STIM1 over-expression somehow restore Set2 expression in the Set2RNAi flies?

      Ans: Based on the requirement of Orai-mediated Ca2+ entry for Set2 expression (THD’>OraiE180A neurons, Figure 2C) we had indeed proposed that rescue of flight in Set2RNAi flies by STIMOE is because Set2 expression in Set2RNAi flies is restored by STIMOE. However, we agree that this has not been tested experimentally. Since these data are supportive but not essential to our findings here, we have removed data demonstrating flight rescue of Set2RNAi by STIMOE from Figure 2 – supplement 5 and associated text from the revised manuscript. We plan to investigate the effect of STIMOE on Set2 in the context of Drosophila dopaminergic neurons in the future.

      Reviewer 2:

      The manuscript analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors answer the previous concerns, but several important issues have not been experimentally tested. Especially, the lack of characterization of SOCE or calcium release from the intracellular calcium stores limits considerably the impact of the study. They comment on a number of technical problems but, taking into account the nature of the study, based on Orai and SOCE, the lack of these experimental data reduces the relevance of the study. Below are some specific comments:

      1. The response to question 1 is unconvincing. The authors do not demonstrate experimentally that STIM over-expression enhances SOCE or how excess SOCE might overcome the loss of SET2.

      Ans: The reason we have not performed experiments in this manuscript to investigate SOCE in STIM overexpression condition is two-fold. Firstly, extensive characterisation of SOCE by STIM overexpression in Drosophila pupal neurons forms part of an earlier publication (Chakraborty and Hasan, Front. Mol. Neurosci, 2017). A graph from Chakraborty and Hasan, 2017 shows SOCE measured in primary cultures of pupal neurons from an IP3R mutant (S224F/G1891S) of Drosophila. Reduced SOCE in IP3R mutant neurons (red trace) was restored by overexpression of STIM (black trace). The green trace is of wild-type neurons with STIM overexpression and the grey trace with STIMRNAi. Similar experiments were performed with Orai+STIM overexpression and the rescue in SOCE was compared with STIM overexpression in pupal neurons of wild type and IP3R mutant S224F/G1891S. See Chakraborty and Hasan, 2017 (Front. Mol. Neurosci. 10:111. doi: 10.3389/fnmol.2017.00111).

      Secondly, rescue by STIMOE is supportive but not essential to the findings of this manuscript, which relate primarily to the analysis of an Orai-dependent transcriptional feed-back mechanism acting via Trl and Set2 in flight promoting dopaminergic neurons (See Fig 2C where we demonstrate that OraiE180A expression in THD’ neurons brings down Set2 expression).

      We agree that we have not demonstrated how loss of Set2 can be compensated by STIM overexpression. Therefore, we have now removed the supplementary data relating to STIM rescue of Set2RNAi (THD’>Set2RNAi; STIMOE) flight phenotypes since as mentioned above it was supportive but not essential to the main theme of the manuscript. Consistent with this, we have also removed rescue of flight in TrlRNAi by STIMOE (Figure 4C).

      1. The authors do not present a characterization of SOCE in the cells investigated expressing native Orai or the dominant negative OraiE180A mutant yet. They comment on some technical problems for in situ determination or using culture cells but, apparently, in previous studies they have reported some results.

      Ans: We respectfully submit that characterisation of SOCE in cells expressing native Orai and OraiE180A from primary cultures of Drosophila pupal dopaminergic neurons forms part of an earlier publication (Pathak, T., et al., (2015). The Journal of Neuroscience, 35, 13784–13799. https://doi.org/10.1523/jneurosci.1680-15.2015). As mentioned in lines 80-84, the dopaminergic neurons studied here (THD’) are a subset of the dopaminergic neurons studied in the Pathak et al., 2015 publication (TH). As evident in Figure 2 panels B-D, expression of OraiE180A in dopaminergic neurons abrogates SOCE.

      In this study we have focused on identifying the molecular mechanism by which OraiE180A expression and concomitant loss of cellular Ca2+ signals (Figure 3B, 3C) affects dopaminergic neuron function. In lines 270-274 (page 10) we have stated the technical reason why Ca2+ measurements made in this study from ex-vivo brain preps measure a composite of ER-Ca2+ release and SOCE. Our observation that the measured Ca2+ response is significantly attenuated in cells expressing OraiE180A leads us to the conclusion that we are indeed measuring an SOCE component in the ex-vivo brain preps. This is also explained in ‘Limitations of the study’.

      1. Concerning the question about the STIM:Orai stoichiometry the authors answer that "We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue"; however, again, this is not experimentally tested.

      Ans: To address this point we have now measured relative stoichiometries of STIM and Orai mRNA by qPCR under WT conditions in Drosophila THD’ neurons at 72 hr APF. The observed stoichiometry as per these measurements is STIM:Orai = 1.6:1 (~8:5). These data are in relative agreement with the normalised read counts of STIM and Orai in THD’ neurons in the RNAseq performed and described in Fig 1F. The qPCR (A) and RNAseq (B) measures of STIM and Orai are appended below.

      Author response image 1.

      In comparison to the numerous studies investigating structural, biophysical and cellular characterisation of Orai channels in heterologous systems, there are fewer studies which have traced systemic implications of Orai function through multiple tiers of investigation including organismal behaviour. Leveraging the wealth of genetic resources available in Drosophila, we have attempted this here. While we respectfully agree that questions pertaining to the stoichiometries of STIM/Orai proteins are indeed relevant to cellular regulation of SOCE, we submit they may be better suited for investigation in heterologous systems involving cell culture, or with in-vitro systems with purified recombinant proteins, or indeed using computational and modelling approaches. None of these methods fall within the scope of our current investigation, which is to understand how Orai-mediated Ca2+ entry regulates developmental maturation of Drosophila flight promoting dopaminergic neurons.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Editor and the referees for their questions and remarks. In this document we provide a point-by-point response to revisions requested by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Jafarinia et al. have made an interesting contribution to unravelling the molecular mechanisms underlying pathological phenotypes of repeat expansion of the C9orf72 gene. The repeat expansion leads to the expression of polyPR proteins. Using coarse-grained molecular dynamics simulations, the authors identify putative binding partners involved in nucleocytoplasmic transport (NCT), and conjecture that polyPR affects essential processes by binding to NCT-related proteins. The results are well-reported, but only putative, and need experimental support to be more conclusive. Also, a comparison with results from all-atom MD simulations in explicit water could help verify the results. But even without these, the work is very useful as a first step to unravel the role of polyPR and related peptides.

      We greatly appreciate the reviewer's positive assessment of our work and the suggestions. We acknowledge the need for more experimental validation of the binding behavior of some of the transport components. Our results coincide with the experimental findings of Hutten et al. [1] ([16] in our paper) for example regarding the binding of polyPR to Kapβs and Impαs, but experimental validation of additional transport components, especially for RanGAP, would be valuable. We hope that our work will inspire colleagues from the field to actually perform such experiments.

      We also agree with the reviewer's suggestion that all-atom simulations can provide further details on the molecular conformations at the local NTR-PR binding regions. Nonetheless, such simulations for all transport components, particularly for interactions involving large conformational flexibility of longer polyPR chains such as PR50, would require significant computational expenses. In a recent publication (Jafarinia et al. [2]) we reported on the close resemblance in binding behavior between our coarse-grained MD data and the all-atom MD simulations of (Nanaura et al. [3]), both showing polyPR binding to a negatively-charged cavity of Kapβ2. We expect future MD simulations to elucidate more atomistic detail with the continuously increasing power of high-performance computing clusters.

      Reviewer #2 (Public Review):

      This study used coarse-grained molecular dynamics simulation to explain how the binding of polyPR might interfere with distinct stages of the transport cycle. This finding shows that the interaction between polyPR and transport components is driven by electrostatic interactions and is correlated with the salt concentration and the length of polyPR, providing an important basis for subsequent exploration of the impact of C9orf72 R-DPRs on NCT disruption.

      We appreciate the reviewer's positive feedback and the recognition of the significance of our work.

      Reviewer #3 (Public Review):

      Onck and co-workers present in this work the identification of binding partners and sites of polyPR on various nuclear transport components and elucidate how polyPR might potentially influence the transport process. It's interesting to note that some interaction sites on transport components also serve as their inherent/functional binding sites. The difference in the effects between short polyPR (PR7) and long polyPR (PR50) is also evident, although the authors might need to clarify the mechanisms better. Overall, the manuscript is well organized and concisely written, and it would greatly enhance our understanding of the toxicity induced by polyPR. In general, the 1-bead per atom force field model used in the study is well-tuned for studying the interactions between polyPR and proteins, as the essential cation-pi interactions (between Arg and Phe/Tyr/Trp) were included using an 8-6 LJ model.

      We thank the reviewer for recognizing the suitability of our 1-bead-per-amino-acid force field for studying R-DPRs' interactions with transport components and for acknowledging our work's contribution to understanding polyPR toxicity mechanisms. Below we comment on the mechanisms describing the difference between short and long polyPR molecules.

      Recommendations for the authors:

      1) Regarding Figure 2 (also see below for more specific comments), there is a major concern that the dipole moment is not included in Fig 2b (as the correlation is better with f=0), but the authors still conclude that this is generally important (lines 258-261). As a minimum, this needs to be discussed more carefully. Is f (i.e., the importance of dipole moment for binding) dependent on the specific binding partner, or what is going on? Maybe, there is a good explanation?

      Indeed, the significance of the dipole moment depends on the specific type of transport component involved. Our analysis reveals that for Kapβs, see figure 2b, the best-fit is obtained with f=0, indicating that the separation of charge within Kapβs has a relatively minor effect on their interaction with polyPR. Instead, the primary determinant for polyPR-Kapβ interaction appears to be the net charge per residue (NCPR), with a more negative NCPR leading to stronger interactions.

      We attribute this behavior to the structural characteristics of Kapβs, particularly the superhelical structure which features inner and outer surfaces with differing charge distributions. Importantly, this structural arrangement creates an inner surface characterized by a negative electrostatic potential. As demonstrated in our previous work, polyPR predominantly binds to this negatively charged cavity within Kapβs. Consequently, the separation of charges on the Kapβ surface becomes less influential compared to the overall charge. Other transport components, however, depicted in figure 2a, do not share this feature and the distribution of charges over the surface becomes a more critical factor in polyPR interactions. We have now added this explanation to page 6, and emphasized in the conclusion section that the effect of dipole moment is only observed for the transport components in figure 2a.

      2) Write out nucleoporin, Nup, at first appearance (line 51).

      We have changed it in line 51.

      3) Fig 1: a (representative) CG structure of polyPR (PR7,PR20 and PR70) would be very useful.

      We have added a CG representation of PR7 and PR20 to figure 1.

      4) Please use chi-square, not R-square, to evaluate the fit, as chi-square takes experimental errors into account.

      We use R-square as a standard measure to assess the quality of the fit in the simulations, as it considers the summation of residuals. This choice aligns with the methodology we have used in our previous publications, and we therefore prefer to use this measure here as well.

      5) Please use a dot (not a full stop) for multiplication in line 151 and Figure 2 legend.

      We made the adjustment in line 151, the caption of figure 2, and the y-axis label of figure S2.

      6) 330: it is very unconventional to plot half the std dev as an error bar. Please plot the std dev (standard error) of the mean.

      We made the suggested change and now the error bars in figure 2 are standard errors of the mean (SEM) calculated from block averaging with three blocks at equilibrium. We also amended the caption of figure 2 and the Methods section.
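
      A minimal sketch of a block-averaged SEM is given below. It assumes an equilibrated time series of per-frame contact counts split into equal, contiguous blocks; the function name and the handling of leftover frames are choices made for this example, not details taken from the manuscript.

```python
import math
import statistics


def block_average_sem(samples: list[float], n_blocks: int = 3) -> tuple[float, float]:
    """Return (mean of block means, standard error of the mean over blocks).

    The series is cut into `n_blocks` contiguous blocks of equal length;
    trailing samples that do not fill a block are dropped.
    """
    block_len = len(samples) // n_blocks
    if n_blocks < 2 or block_len == 0:
        raise ValueError("need at least two non-empty blocks")
    block_means = [
        statistics.fmean(samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    mean = statistics.fmean(block_means)
    # Sample standard deviation of the block means divided by sqrt(number of blocks)
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return mean, sem
```

      With three blocks the SEM rests on only three block means, so it should be read as a rough indication of the uncertainty rather than a precise estimate.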

      7) Please write an explicit equation for the linear relation that is plotted in Figure 2. Something like: C_t = a(NCPR - fM/Rg)+b ? That would make it easier to read.

      We have now added the linear equation of the fit to a new table S4, and included a reference to it in the caption of figure 2.
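
      For readers who want to reproduce such a fit, the sketch below uses the form suggested by the reviewer, C_t = a(NCPR - fM/R_g) + b, with an ordinary least-squares line and the R-square value. The function names are hypothetical, and the actual fitted equations are those reported in table S4, which is not reproduced here.

```python
def descriptor(ncpr: float, dipole: float, rg: float, f: float) -> float:
    """Combined electrostatic descriptor NCPR - f * M / R_g (reviewer's suggested form)."""
    return ncpr - f * dipole / rg


def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = a * x + b; returns (a, b, R-square)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # R-square is undefined when all y values are equal; report 1.0 for that exact fit
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r2
```

      Setting `f = 0` reduces the descriptor to NCPR alone, which corresponds to the case discussed for the Kapβs in figure 2b.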

      8) Fig 2: why is the fit to PR7 not reported/shown?

      The fits for PR7 resulted in R2 values of 0.89 (a) and 0.83 (b) for 200 mM and of 0.7 (a) and 0.59 (b) for 100 mM. Because of the low R2 values for 100 mM, the fits for PR7 are not shown. We have added this explanation to the caption of figure 2.

      9) Fig 4: isn't the blue shape KapB (and not importin)?

      We changed "importin" to "Kapβ Imp" for consistency.

      10) In the interest of reproducibility, a recommendation is to make the scripts for setting up, running, and analyzing the simulations freely available, e.g. at GitHub. This will increase reproducibility and transparency.

      At the moment we do not have the scripts available on GitHub. However, codes can be provided by the authors upon reasonable request, as also mentioned in the data availability statement in the paper.

      11) Can the authors explain the salient advances in this article versus the one published last year?

      In our previous work, we showed that polyPR binds to the Kapβ family of nuclear transport receptors (NTRs), consistent with experimental findings. While this provided valuable insights, it was essential to broaden our investigation as C9orf72 toxicity not only affects the Kapβ family of NTRs but also disrupts other key regulators of NCT. For instance, recent literature (see lines 87-91 in our paper) showed that Ran and its regulators RanGAP and RanGEF are mislocalized in cells expressing R-DPRs, and genetic screening studies have identified several nucleocytoplasmic transport genes as modifiers of R-DPR-mediated toxicity.

      In the present study, we therefore delved deeper into the underlying mechanisms of polyPR-modification of NCT. We focused on exploring whether polyPR directly interacts with Impα isomers, CAS/Cse1, RanGEF, RanGAP, Ran, and NTF2. By doing so, we unveiled a network of direct interactions between polyPR and a remarkably wide range of NCT components. This newfound insight is valuable for interpreting existing experimental findings, such as the mislocalization of RanGAP. We also demonstrate that polyPR binding is influenced not only by factors such as the net charge per residue and the polyPR chain length, as previously observed for Kapβs, but also by the spatial separation of charges, incorporated by an additional dependence on dipole moments in influencing the total number of contacts with polyPR. This sheds new light on how polyPR interacts with numerous targets within the cellular environment, providing a valuable reference for future (experimental) investigations of R-DPR-compromised nuclear transport. These points are explained in the last paragraph of the introduction and paragraphs 2,3 of the conclusion section. Paragraph 2 of the conclusion is also modified for clarification.

      12) In Figure 2(a), the vertical coordinates of the first graph do not match the others.

      We have now modified figure 2a left panel to match the others.

      13) When the polyPR length is large enough, it seems that the binding of polyPR to RanGEF and NTF2 is not significantly improved.

      The binding behavior depends on polyPR length, as well as on the net charge per residue and the dipole moment (expressed as NCPR-fM/R_g). We note that the number of contacts in figure 2 is normalized by the polyPR length, so that for both NTF2 and RanGEF the total number of contacts increases with length (PR7 to PR20) when binding occurs. Specifically, for RanGEF, especially at lower ion concentrations (100 mM), PR7 and PR20 exhibit a similar number of contacts per unit length of polyPR. This implies that the absolute number of contacts between PR20 and RanGEF is higher than that of PR7. However, as we extend the polyPR length to PR50, there is a reduction in the number of contacts per unit length of polyPR. This phenomenon indicates that the more extended PR50 has regions that make little to no contact with RanGEF, resulting in a smaller number of contacts per unit length for PR50. Lines 188-195 are now modified to put more emphasis on the difference between number of contacts and number of contacts normalized by polyPR length.

      14) The representation of the mechanism in Figure 4 is not intuitive enough and the color scheme still needs to be improved.

      We have tried to improve clarity by including the names of each transport component next to their schematic representations.

      15) Figure 3 shows that the longer polyPR exhibits a higher contact probability with individual residues compared to a shorter polyPR, is this result in conflict with Figure 2?

      We re-iterate here that the number of contacts in figure 2 is normalized by the polyPR length, while the results in Fig. 3 are not.

      Figure 3 and figure S4 demonstrate that as the length of polyPR increases, the contact probability of individual residues of transport components for interaction with polyPR also increases.

      In figure 2, we have normalized the time-averaged number of contacts by the length of polyPR. For example, in the top-right panel of figure 2a, when comparing results for PR7 with PR50 interaction with RanGAP, a higher value for PR7 indicates that PR7 makes more contacts per unit of its length with RanGAP. In terms of absolute number of contacts, however, the PR50 chain makes more contacts with RanGAP, resulting in a higher contact probability. We now added a sentence (see lines 188-189) for clarification.

      In summary, when a short polyPR strongly binds to a transport component (evidenced by a relatively large number of contacts), it makes more contacts per unit length than a long polyPR. This occurs because for shorter polyPRs most of the residues come into contact with the target protein. In contrast, for longer polyPRs, only certain parts of the chain are in contact with the transport components, while other regions make fewer or no contacts. This is explained in lines 188-195.
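
      The distinction between absolute and length-normalized contacts can be illustrated with a small numerical sketch. The contact counts below are hypothetical values chosen for illustration, not simulation results, and the use of the number of PR repeats as the length unit is an assumption.

```python
def contacts_per_unit_length(total_contacts: float, length: int) -> float:
    """Time-averaged number of contacts divided by the polyPR length."""
    return total_contacts / length


# Hypothetical time-averaged contact counts (illustration only)
pr7_total, pr7_length = 21.0, 7
pr50_total, pr50_length = 100.0, 50

pr7_norm = contacts_per_unit_length(pr7_total, pr7_length)
pr50_norm = contacts_per_unit_length(pr50_total, pr50_length)

# The longer chain makes more contacts in absolute terms ...
print(pr50_total > pr7_total)
# ... while the shorter chain makes more contacts per unit length.
print(pr7_norm > pr50_norm)
```

      In this made-up case the longer chain has the larger absolute count while the shorter chain has the larger normalized count, which is the pattern described for figure 3 versus figure 2.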

      16) In S2 and S3, does the data require an error bar?

      NCPR, defined as total charge divided by sequence length of the transport components, is a constant and therefore figure S2 does not require an error bar.

      In figure S3 we have added error bars (standard deviation) for the dipole moment calculated from 2.5 µs simulations of the isolated transport components.
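
      Because NCPR follows directly from the sequence, it can be computed without any simulation data. The sketch below assumes unit charges of +1 for Arg and Lys and -1 for Asp and Glu, with all other residues (including His) treated as neutral; the charge assignment in the authors' force field may differ.

```python
# Assumed per-residue charges; residues not listed are treated as neutral
RESIDUE_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def ncpr(sequence: str) -> float:
    """Net charge per residue: total charge divided by sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    total_charge = sum(RESIDUE_CHARGE.get(aa, 0) for aa in seq)
    return total_charge / len(seq)


# Example: a PR7 chain (seven proline-arginine repeats)
print(ncpr("PR" * 7))
```

      Since the value depends only on composition and length, it is the same in every simulation frame, which is why no error bar applies to it.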

      17) What is the physiological significance when the salt concentration is 100 mM?

      We conducted simulations at two different salt concentrations: 200 mM, which aligns with in vitro conditions as reported in Hutten et al. [1], and a lower 100 mM salt concentration. The inclusion of the 100 mM salt concentration enables us to assess the significance of salt concentration, and to confirm the dominance of electrostatic interactions in polyPR binding. We also note that this range of salt concentration is commonly used in in-vitro experiments [1, 4, 5].

      18) Please introduce abbreviation NLS in the abstract.

      We added the full name of NLS to the abstract.

      19) Given the high number of Arg residues in its sequence, polyPR should interact with many proteins. It would be beneficial to discuss the frequency of binding/non-binding interactions of polyPR with nuclear transport components in comparison to general proteins.

      We appreciate the reviewer's comment. While such a comparison is indeed interesting, our study primarily focused on elucidating the interactions between polyPR and crucial nuclear transport components, aiming to provide insights into potential defects in nucleocytoplasmic transport. The broader comparison of polyPR interactions with different protein classes in the proteome is a promising direction for future research, but is outside the scope of the current manuscript.

      20) The authors should provide a convergence check to determine whether the 2.5 µs simulations are sufficient for sampling the interaction modes, particularly with the long PR50.

      We have included a new figure (figure S5) and additional text in the Methods section to verify that extending the simulation duration does not alter the contact probabilities (which are indicators of binding modes) presented in figure 3a, confirming convergence of our computations.

      21) In reference to Figure 4, the upper panel merely summarizes the known transport mechanisms, while the lower part (A-H) provides potential novel insights from this study. Unfortunately, these novel insights are not sufficiently detailed. It is recommended to include more details to make these relevant plots clearer by expanding the corresponding discussions (currently, only the last paragraph in the Results section addresses these). If possible, the authors should also carry out some CG simulations of the most relevant processes to further elucidate the interference caused by polyPR.

      We have taken the reviewer's feedback into consideration and made the suggested revisions. Specifically, we have expanded the last paragraph of the discussion to provide more detailed explanations of the insights derived from our computational model. For each mechanism, we begin by presenting the reader with the baseline understanding of normal function of the transport component. Subsequently, we discuss how the findings presented in figures 2 and 3 offer insights into polyPR's potential interference with the function of NCT components. Furthermore, we have made improvements to the schematic representation of mechanisms in figure 4 to enhance clarity.

      At the moment, the binding of NCT components to their native binding targets and the competition with polyPR are best resolved by all-atom molecular dynamics simulations, which come with significant computational demands. This level of detail and computation-intensive analysis is beyond the scope of the current study, but we hope that our results will provide the groundwork for future, more detailed investigations.

      References

      1. Hutten, S., et al., Nuclear Import Receptors Directly Bind to Arginine-Rich Dipeptide Repeat Proteins and Suppress Their Pathological Interactions. Cell Rep., 2020. 33(12): p. 108538.

      2. Jafarinia, H., E. Van der Giessen, and P.R. Onck, Molecular basis of C9orf72 poly-PR interference with the β-karyopherin family of nuclear transport receptors. Sci. Rep., 2022. 12(1): p. 21324.

      3. Nanaura, H., et al., C9orf72-derived arginine-rich poly-dipeptides impede phase modifiers. Nat. Commun., 2021. 12(1): p. 5301.

      4. Brady, J.P., et al., Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. Proceedings of the National Academy of Sciences, 2017. 114(39): p. E8194-E8203.

      5. Fisher, R.S. and S. Elbaum-Garfinkle, Tunable multiphase dynamics of arginine and lysine liquid condensates. Nat. Commun., 2020. 11(1): p. 4628.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1. Experiments regarding the inducible expression of MukBEF: The authors should provide western blots or rt-qPCR for MukBEF expression at 40 min and 2H.

      We now provide a western blot of MukB under non-induced and induced conditions as Figure 1-figure supplement 1D.

      1. Experiments with RiTer and LiTer constructs:

      a. Authors compare the mukB deletion against wild type (Fig. 2C). It would be additionally informative if these comparisons are made for matP deletion and wild type as well. This will strengthen the conclusion that long-range interactions in ter do increase in the absence of matP.

      We agree that the matP mutant may help the reader to compare the effect of the translocation in different backgrounds and have added it to the figure. This strengthens the conclusion that long-range interactions in ter do increase in the absence of matP in a rearranged chromosome, as observed in the WT configuration (Lioy et al., 2018).

      b. Additionally, in Fig. 2C, it appears that there is some decrease in long-range interactions in the absence of mukB in ter1 (Riter). Is this a significant change?

      The change observed is not significant. The results shown in Fig. 2C have been obtained using a 3C approach, which generated slightly more variability than Hi-C. Furthermore, we measured the range of contacts for the segment corresponding to Ter1 in RiTer (matS12-matS28), in different genetic contexts and different configurations. The results show that this level of variation is not significant (see graph below reporting two independent experiments).

      Author response image 1.

      Range of interactions measured on the interval matS12-matS18 in different genetic contexts and different configurations (MG1655 WT(1 and 2), ∆mukB, RiTer, RiTer ∆mukB).

      1. Experiments with various matS organizations: These experiments are interesting and an important part of the paper. However, it is rather hard to visualize the chromosome conformations in the strains after transposition. To aid the reader (particularly with panel E), authors can provide schematics of the chromosome conformations and anticipated/ observed chromosomal interactions. Circular interaction plots would be useful here.

      We thank the reviewer for this interesting remark; we have tried in the past to represent these interactions using a circular representation (see for example the web site of Ivan Junier; https://treetimc.github.io/circhic/index.html). However, this representation is not easy for nonspecialists to grasp, especially in strains with a rearranged chromosome configuration. Nonetheless, we have added graphical circular representations of the chromosome configurations to help the reader.

      1. ChIP experiments:

      a. This section of the manuscript needs to be further strengthened. It is not clear whether the ChIP signal observed is significant (for example at T10 or T20 min, the peak value does not appear to go above 1.1 fold). Can the authors be sure that this small increase is not simply a consequence of an increase in copy number of the loci around the origin, as replication has initiated?

      The basal value of the ChIP on the non-replicated sequences (between 0-3.5 Mb for 10 minutes and 0-3 Mb for 20 minutes) is 0.8 and 0.7, respectively, whereas the mean value of the replicated sequence is 1.6 and 1.45. The enrichment observed for these two time points is therefore about 2-fold, not 1.1-fold, and it is 4-fold for t40min. These values were obtained by dividing the number of normalized reads in the ChIP (the number of reads at each position divided by the total number of reads) by the normalized reads of the input. Therefore, the increase in copy number is accounted for in the calculation. Furthermore, we added a supplementary figure (Figure Sup9) in which we performed a ChIP without tags on synchronized cells, and in this case, we did not observe any enrichment triggered by replication.
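
      As an illustration only (not the code used in the study), the normalization described above can be sketched in Python. The function names, the four-bin layout, and all read counts are made-up assumptions; the sketch simply shows why dividing the normalized ChIP by the normalized input cancels a copy-number increase that affects both samples equally.

```python
def normalize(counts):
    """Divide the read count at each position by the total number of reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an empty read profile")
    return [c / total for c in counts]


def chip_over_input(chip_counts, input_counts):
    """Ratio of normalized ChIP reads to normalized input reads, per position."""
    chip = normalize(chip_counts)
    inp = normalize(input_counts)
    return [c / i if i > 0 else float("nan") for c, i in zip(chip, inp)]


# Hypothetical toy data: bins 0-1 are replicated (two copies), bins 2-3 are not.
input_counts = [200, 200, 100, 100]

# Case 1: the ChIP only mirrors copy number, so the ratio is flat at 1.
chip_copy_number_only = [400, 400, 200, 200]
flat = chip_over_input(chip_copy_number_only, input_counts)

# Case 2: the replicated bins carry extra ChIP signal beyond copy number.
chip_enriched = [800, 800, 200, 200]
enriched = chip_over_input(chip_enriched, input_counts)

# Enrichment of replicated over non-replicated bins (2-fold in this toy case).
fold = enriched[0] / enriched[2]
```

      In this toy case the non-replicated bins fall below 1 once the replicated bins are enriched, because each profile is scaled by its own total; the fold enrichment is therefore read from the ratio between the two regions rather than from the distance to 1.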

      b. Authors make a conclusion that MukB loads behind the replication fork. However, the time resolution of the presented experiments is not sufficient to be certain of this. Authors would need to perform more time-resolved experiments for the same.

      Reviewer 1 is correct; we attempted to discriminate whether the observed enrichment is (i) associated with the replication fork since we observed a decrease in the center of the enrichment at oriC as the maximum enrichment moves away with the replication fork after 20 and 40 minutes, or (ii) associated with the newly replicated sequence. To investigate this, we attempted to induce a single round of replication by shifting the cells back to 40°C after 10 minutes at 30°C. Unfortunately, replication initiation is not immediately halted by shifting the cells to 40°C, and we were unable to induce a single round of replication. To clarify our conclusions, we modified our manuscript to read:

      “Altogether, these findings indicate that MukBEF is loaded into regions newly replicated either at the replication fork or even further behind it, except in the Ter region from which it would be excluded.”

      c. Authors conclude that in the LiTer7 strain, MukB signal is absent from Ter2. However, when compared with the ChIP profiles by eye across panels in A and B, this does not seem to be significant. In the same results sections, authors state that there is a 3-fold increase in MukB signal in other regions. The corresponding graph does not show the same.

      Rather than relying solely on the enrichment levels, which can be challenging to compare across different strains due to slight variations in replication levels, we believe there is a clear disruption in this profile that corresponds to the Ter2 sequence. Furthermore, this discontinuity in enrichment relative to the replication profile is also observable in the WT configuration. At T40min, MukB ChIPseq signals halt at the Ter boundary, even though Ter is actively undergoing replication, as evidenced by observations in the input data.

      Regarding the fold increase of MukB, Reviewer 1 is correct; we overestimated this enrichment in the text and have now corrected it.

      d. Authors should provide western blot of MukB-Flag.

      We have added Supplementary Figure 1 D, which contains a Western blot of MukB-Flag.

      1. The bioinformatic analysis of matS site distribution is interesting, but this is not followed upon. The figure (Fig 5) is better suited in the supplement and used only as a discussion point.

      We acknowledge the reviewer's point, but we used this section to attempt to extend our findings to other bacteria and emphasize the observation that even though a few matS sites are necessary to inhibit MukBEF, the Ter domains are large and centered on dif even in other bacteria.

      1. The discussion section is lacking many references and key papers have not been cited (paragraph 1 of discussion for example has no references).

      The possibility that SMC-ScpAB and MukBEF can act independently of replication has been suggested previously, but is not cited or discussed. Similarly, there is some evidence for SMC-ScpAB association with newly replicated DNA (PMID 21923769).

      We have added references to the suggested paragraph and highlighted the fact that MukBEF's activity independent of replication was already known. However, we believe that the situation is less clear for SMC-ScpAB in B. subtilis or C. crescentus. In a similar manner, we found no clear evidence that SMC-ScpAB is associated with newly replicated DNA in the referenced studies.

      To clarify and enrich the discussion section, we have added a paragraph that provides perspective on the loading mechanisms of SMC-ScpAB and MukBEF.

      1. There are minor typographical errors that should be corrected. Some are highlighted here:

      a. Abstract: L5: "preferentially 'on' instead of 'in'"

      b. Introduction: Para 1 L8: "features that determine"

      c. Introduction: Para 2 L1: please check the phrasing of this line

      d. Results section 2: L1: Ter "MD" needs to be explained

      e. Page 8: Para 2: L6: "shows that 'a'"

      g. Page 13: Para 2: "MukBEF activity...". This sentence needs to be fixed.

      i. Figure 4: "input" instead of "imput"

      We thank Reviewer 1 for pointing out all these grammatical or spelling mistakes. We have corrected them all.

      f. Page 12: Para 2: "Xer" instead of "XDS"?

      We added a reference to clarify the term.

      h. Methods: ChIP analysis: Authors state "MatP peaks", however, reported data is for MukB

      This description pertains to the matP peak detection shown in Supplementary Figure 3. We have incorporated this clarification into the text.

      j. Supplementary figure legends need to be provided (currently main figure legends appear to be pasted twice)

      Supplementary figure legends are provided at the end of the manuscript, and we have edited the manuscript to remove one copy of the figure legends.

      k. Authors should ensure sequencing data are deposited in an appropriate online repository and an accession number is provided.

      We waited for the appropriate timing in the editing process to upload our data, which we have now done. Additionally, we have added a data availability section to the manuscript, including sequence references on the NCBI.

      Reviewer #2 (Recommendations For The Authors):

      The authors largely avoid speculation on what might be the physiological relevance of the exclusion of MukBEF (and Smc-ScpAB) from the replication termination region (and the coordination with DNA replication). At this stage it would be helpful to present possible scenarios even if not yet supported by data. The authors should for example consider the following scenario: loop extrusion of a dif site in a chromosome dimer followed by dimer resolution by dif recombination leads to two chromosomes that are linked together by MukBEF (equivalent to cohesin holding sister chromatids together in eukaryotes but without a separase). This configuration (while rare) will hamper chromosome segregation. Is MatP particularly important under conditions of elevated levels of chromosome dimers? Could this even be experimentally tested? Other scenarios might also be entertained.

      Even though we prefer to avoid speculations, we agree that we may attempt to propose some hypotheses to the reader. To do so, we have added a few sentences at the end of our discussion. “We may speculate, based on in vitro observations (Kumar et al., 2022), that MukBEF could interfere with TopIV activity and delay potential chromosome decatenation. Another possibility is that chromosome dimers resolved at the dif site may become trapped in loops formed by MukBEF, thus delaying segregation. But none of these possible scenarios are supported by data yet, and a major challenge for the future is to determine whether and how MukBEF may interfere with one or both of these processes.”

      The manuscript text is well written. However, the labeling of strains in figures and text is sometimes inconsistent which can be confusing (LiTer Liter liter; e.g Riter Fig 2C). For consistency, always denote the number of matS sites in LiTer strains and also in the RiTer strain. The scheme denoting LiTer and RiTer strains should indicate the orientation of DNA segments so it is clear that the engineering does not involve inversion (correct?). Similarly: Use uniform labelling for time points: see T40mn vs 40mn vs T2H vs 2H

      We have reviewed the manuscript to standardize our labeling. Additionally, we have included a schema in Figure 2, indicating the matS numbers at the Ter border to emphasize that the transposition events do not involve inversion.

      matS sites do not have identical sequences and bind different levels of MatP (suppl fig 3). Does this possibly affect the interpretation of some of the findings (when altering few or only a single matS site). Maybe a comment on this possibility can be added.

      We agree with the referee; we do not want to conclude too strongly about the impact of matS density, so we have added this sentence at the end of the section titled 'matS Determinants to Prevent MukBEF Activity':

      “Altogether, assuming that differences in the matS sequences do not modify MatP's ability to bind to the chromosome and affect its capacity to inhibit MukBEF, these results suggested that the density of matS sites in a small chromosomal region has a greater impact than dispersion of the same number of matS sites over a larger segment”

      Figure 5: show selected examples of matS site distribution in addition to the averaged distribution (as in supplemental figure)?

      Figure 5 shows the median of the matS distribution based on the matS positions of 16 species as displayed in the supplementary figure. We believe that this figure is interesting as it represents the overall matS distribution across the Enterobacterales, Pasteurellales, and Vibrionales.

      How do authors define 'background levels' (page 9)in their ChIP-Seq experiments? Please add a definition or reword.

      We agree that the term 'background level' here could be confusing, so we have modified it to 'basal level' to refer to the non-replicating sequence. The background level can be observed in Supplementary Figure 9 in the ChIP without tags, and, on average, the background level is 1 throughout the entire chromosome in these control experiments.

      This reviewer would naively expect the normalized ChIP-Seq signals to revolve around a ratio of 1 (Fig. 4)? They do in one panel (Figure 4B) but not in the others (Figure 4A). Please provide an explanation.

      We thank the referee for this pertinent observation. An error was made during the smoothing of the data in Figure 4A, which resulted in an underestimation of the input values. This mistake alters neither the profile of the ChIP (it amounts to a division by a constant) nor our conclusions. We provide a revised version of the figure.

      Inconsistent axis labelling: e.g Figure 4

      Enterobacterals should be Enterobacterales (?)

      KB should be kb

      MB should be Mb

      Imput should be Input

      FlaG should be Flag

      We have made the suggested modifications to the text.

      'These results unveiled that fluorescent MukBEF foci previously observed associated with the Ori region were probably not bound to DNA' Isn't the alternative scenario that MukBEF bound to distant DNA segments colocalize an equally likely scenario? Please rephrase.

      Since we lack evidence regarding what triggers the formation of a unique MukB focus associated with the origin and what this focus could represent, we have removed this sentence.

      Reviewer #3 (Recommendations For The Authors):

      The text is well-written and easy to follow, but I would suggest several improvements to make things clearer:

      1. Many plots are missing labels or legends. (I) All contact plots such as Fig. 1C should have a color legend. It is not clear how large the signal is and whether the plots are on the same scale. (II) Ratiometric contact plots such as in Fig. 1D should indicate what values are shown. Is this a log ratio?

      As indicated in the materials and methods section, the ratio presented in this manuscript was calculated for each point on the map by dividing the number of contacts in one condition by the number of contacts in the other condition. The Log2 of the ratio was then plotted using a Gaussian filter.

      1. Genotypes and strain names are often inconsistent. Sometimes ΔmukB, ΔmatP, ΔmatS is used, other times it is just mukB, matP, matS; There are various permutations of LiTer, Liter, liter etc.

      These inconsistencies have been corrected.

      1. The time notation is unconventional. I recommend using 0 min, 40 min, 120 min etc. instead of T0, T40mn, T2H.

      As requested, we have standardized and used conventional annotations.

      1. A supplemental strain table listing detailed genotypes would be helpful.

      A strain table has been added, along with a second table recapitulating the positions of matS in the different strains.

      1. Fig. 1A: Move the IPTG labels to the top? It took me a while to spot them.

      We have moved the labels to the top of the figure and increased the font size to make them more visible.

      1. Fig 1C: Have these plots been contrast adjusted? If so, this should be indicated. The background looks very white and the transitions from diagonal to background look quite sharp.

      No, these matrices haven't been contrast-adjusted. They were created in MATLAB, then exported as TIFF files and directly incorporated into the figure. Nevertheless, we noticed that the color code of the matrix in Figure 3 was different and subsequently adjusted it to achieve uniformity across all matrices.

      7. Fig 1C: What is the region around 3 Mb and 4 Mb? It looks like the contacts there are somewhat MukBEF-independent.

      The referee is right. In the presence of the plasmid pPSV38 (carrying the MukBEF operon or not), we repeatedly observed an increase of long-range contacts around 3 Mb. The origin of these contacts is unknown.

      1. Fig 1D: Have the log ratios been clipped at -1 and 1 or was some smoothing filter applied? I would expect the division of small and noisy numbers in the background region to produce many extreme values. This does not appear to be the case.

      The referee is right; dividing two matrices generates a ratio with extreme values. To avoid this, the Log2 of the ratio is plotted with a Gaussian filter, as described before (Lioy et al., 2018).
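
      For readers unfamiliar with this kind of ratio map, the idea can be sketched as follows. This is a minimal, hypothetical Python illustration, not the MATLAB pipeline used for the figures: the pseudocount, the kernel width, the edge handling, and the choice to smooth after taking the log2 ratio are all assumptions made for the example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(matrix, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    smoothed = []
    for row in matrix:
        n = len(row)
        new_row = []
        for j in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(j + k - radius, 0), n - 1)
                acc += w * row[idx]
            new_row.append(acc)
        smoothed.append(new_row)
    return smoothed


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def gaussian_filter_2d(matrix, sigma=1.0, radius=2):
    """Separable 2D Gaussian filter: smooth rows, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows_done = smooth_rows(matrix, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))


def log2_ratio(contacts_a, contacts_b, pseudocount=1.0):
    """Element-wise log2 ratio; the pseudocount (an assumption) avoids 0/0."""
    return [[math.log2((a + pseudocount) / (b + pseudocount))
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(contacts_a, contacts_b)]


# Hypothetical 3x3 contact maps for two conditions.
condition_a = [[9, 3, 0], [3, 9, 3], [0, 3, 9]]
condition_b = [[9, 1, 0], [1, 9, 1], [0, 1, 9]]

raw = log2_ratio(condition_a, condition_b)
smoothed = gaussian_filter_2d(raw, sigma=1.0, radius=1)
```

      Because the kernel weights sum to one, smoothing leaves uniform regions unchanged and pulls isolated extreme values toward their neighbors, which is the behavior the reviewer asked about.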

      1. Fig 1E: I recommend including a wild-type reference trace as a point of reference.

      We have added the WT profile to the figure.

      1. Fig 2: I feel the side-by-side cartoon from Supplemental Fig. 2A could be included in the main figure to make things easier to grasp.

      We added a schematic representation of the chromosome configuration on top of the matrices to aid understanding.

      1. Fig. 2C: One could put both plots on the same y-axis scale to make them comparable.

      We have modified the axes as required.

      1. Fig. 3C: The LiTer4 ratio plot has two blue bands in the 3-4.5 Mb region. I was wondering what they might be. These long-range contacts seem to be transposition-dependent and suppressed by MatP, is that correct?

      The referee is right. This indicates that in the absence of MatP, one part of the Ter was able to interact with a distal region of the chromosome, albeit with a low frequency. The origin is not yet known.

      1. Fig. 3E: It is hard to understand what is a strain label and what is the analyzed region of interest. The plot heading and figure legend say Ter2 (but then, there are different Ter2 variants), some labels say Ter, others say Ter2, sometimes it doesn't say anything, some labels say ΔmatS or ΔmatP, others say matS or matP, and so on.

      We have unified our notation and added more description to the legend to clarify this figure:

      “Ter” corresponds to the range of contacts over the entire Ter region, in the WT strain (WT Ter) or in the ΔmatP strain (ΔmatP Ter). The column WT matSX-Y corresponds to the range of contacts between the designated matS sites in the WT configuration. This portion of the Ter can be compared with the same Ter segment in the transposed strain (Ter2). Additionally, the matS20-28 segment corresponds to Ter2 in LiTer9, just as matS22-28 corresponds to Ter2 in LiTer7, and matS25-28 to Ter2 in LiTer4. The range of contacts of this segment was also measured in a ΔmatP or ΔmatS background.”

      1. Fig. 4 and p.9: "Normalized ChIP-seq experiments were performed by normalizing the quantity of immuno-precipitated fragments to the input of MukB-Flag and then divide by the normalized ChIP signals at t0 to measure the enrichment trigger by replication."

      This statement and the ChIP plots in Fig. 4A are somewhat puzzling. If the data were divided by the ChIP signal at t0, as stated in the text, then I would expect the first plot (t0) to be a flat line at value 1. This is not the case. I assume that normalized ChIP is shown without the division by t0, as stated in the figure legend.

      The referee is right. This sentence has been corrected, and as described in the Methods section, Figure 4 shows the ChIP normalized by the input.

      If that's true and the numbers were obtained by dividing read-count adjusted immunoprecipitate by read-count adjusted input, then I would expect an average value of 1. This is also not the case. Why are the numbers so low? I think this needs some more details on how the data was prepared.

      The referee is right; we thank them for this remark. Our data are processed using the following method: the number of reads at each position is divided by the total number of reads. A sliding window of 50 kb is applied to these normalized values to smooth the data. Then, the resulting signal from the ChIP is divided by the resulting signal from the input. This is what is shown in Figure 4. Unfortunately, for some of our results, the sliding window was not correctly applied to the input data. This did not alter the ChIP profile but did affect the absolute values. We have resolved this issue and corrected the figure.
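
      The processing steps listed above can be sketched as follows. This Python example is a hypothetical illustration rather than the script used for Figure 4: the bin size, the window expressed in bins, the circular wrap-around, the read counts, and the `input_scale` parameter (used only to mimic a constant mis-scaling of the input) are all assumptions.

```python
def normalize(counts):
    """Divide the read count in each bin by the total number of reads."""
    total = sum(counts)
    return [c / total for c in counts]


def circular_sliding_mean(values, window_bins):
    """Moving average over an odd number of bins on a circular chromosome."""
    n = len(values)
    if window_bins < 1 or window_bins % 2 == 0 or window_bins > n:
        raise ValueError("window_bins must be odd and between 1 and len(values)")
    half = window_bins // 2
    means = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            acc += values[(i + k) % n]  # wrap around the circular map
        means.append(acc / window_bins)
    return means


def smoothed_chip_over_input(chip_counts, input_counts, window_bins,
                             input_scale=1.0):
    """Normalize, smooth both profiles, then divide ChIP by input.

    input_scale is a hypothetical knob that mimics a constant
    underestimation of the input; 1.0 means no error.
    """
    chip = circular_sliding_mean(normalize(chip_counts), window_bins)
    inp = circular_sliding_mean(normalize(input_counts), window_bins)
    return [c / (i * input_scale) for c, i in zip(chip, inp)]


# Made-up read counts over eight bins.
chip_counts = [30, 50, 90, 50, 30, 20, 20, 20]
input_counts = [20, 25, 30, 25, 20, 15, 15, 15]

profile = smoothed_chip_over_input(chip_counts, input_counts, window_bins=3)
mis_scaled = smoothed_chip_over_input(chip_counts, input_counts,
                                      window_bins=3, input_scale=0.5)
```

      In this sketch, underestimating the input by a constant factor multiplies every value of the profile by the same number, so the shape and the position of the peak are unchanged while the absolute values shift.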

      Another potential issue is that it's not clear what the background signal is and whether it is evenly distributed. The effect size is rather small. Negative controls (untagged MukB for each timepoint) would help to estimate the background distribution, and calibrator DNA could be used to estimate the signal-to-background ratio. There is the danger that the apparent enrichment of replicated DNA is due to increased "stickiness" rather than increased MukBEF binding. If any controls are available, I would strongly suggest to show them.

      To address this remark, a ChIP experiment with a non-tagged strain under comparable synchronization conditions has been performed. The results are presented as Supplementary Figure 9; they reveal that the enrichment shown in Figure 4 is not attributed to nonspecific antibody binding or 'stickiness'.

      1. Fig. 4A, B: The y-axes on the right are unlabeled and the figure legends mention immunoblot analysis, which is not shown.

      We labeled the y-axes as 'anti-Flag ChIP/input' and made corrections to the figure legend.

      1. Fig. 4B: This figure shows a dip in enrichment at the Ter2 region of LiTer7, which supports the authors' case. Having a side-by-side comparison with WT at 60 min would be good, as this time point is not shown in Fig. 4A.

      Cell synchronization can be somewhat challenging, and we have observed that the timing of replication restart can vary depending on the genetic background of the cells. This delay is evident in the case of LiTer7. To address this, we compared LiTer7 after 60 minutes to the wild type strain (WT) after 40 minutes of replication. Even though the duration of replication is 20 minutes longer in LiTer7, the replication profiles of these two strains under these two different conditions (40 minutes and 60 minutes) are comparable and provide a better representation of similar replication progression.

      1. Fig. 4C: Highlighting the position of the replication origin would help to interpret the data.

      We have highlighted the oriC position with a red dashed line.

      1. Fig. 4C: One could include a range-of-contact plot that compares the three conditions (similar to Fig. 1E).

      We have added this quantification to Supplemental Figure 8.

      1. Supplemental Fig. 2A: In the LiTer15 cartoon, the flanking attachment sites do not line up. Is this correct? I would also recommend indicating the direction of the Ter1 and Ter2 regions before and after recombination.

      In this configuration, attB and attR, as well as attL and attB', should be aligned but the remaining attR attL may not. We have corrected this misalignment. To clarify the question of sequence orientation, we have included in the figure legend that all transposed sequences maintain their original orientation.

      1. Supplemental Fig. 3: One could show where the deleted matS sites are.

      We added red asterisks to the ChIP representation to highlight the positions of the missing matS.

      1. Supplemental Fig. 3B: The plot legend is inconsistent with panel A (What is "WT2")?

      We have corrected it.

      1. Supplemental Fig. 3C: The E-value notation is unusual. Is this 8.9 x 10^-61?

      The value is 8.9 x 10^-61; we modified the annotation.

      23) Abstract: "While different features for the activity of the bacterial canonical SMC complex, Smc-ScpAB, have been described in different bacteria, not much is known about the way chromosomes in enterobacteria interact with their SMC complex, MukBEF."

      Could this be more specific? What features are addressed in this manuscript that have been described for Smc-ScpAB but not MukBEF? Alternatively, one could summarize what MukBEF does to capture the interest of readers unfamiliar with the topic.

      We modified these first sentences.

      1. p.5 "was cloned onto a medium-copy number plasmid under control of a lacI promoter" Is "lacI promoter" correct? My understanding is that the promoter of the lacI gene is constitutive, whereas the promoter of the downstream lac operon is regulated by LacI. I would recommend providing an annotated plasmid sequence in supplemental material to make things clearer.

      We modified it and replaced “lacI promoter” with the correct annotation, pLac.

      1. p. 5 heading "MukBEF activity does not initiate at a single locus" and p. 6 "Altogether, the results indicate that the increase in contact does not originate from a specific position on the chromosome but rather appears from numerous sites". Although this conclusion is supported by the follow-up experiments, I felt it is perhaps a bit too strong at this point in the text. Perhaps MukBEF loads slowly at a single site, but then moves away quickly? Would that not also lead to a flat increase in the contact plots? One could consider softening these statements (at least in the section header), and then be more confident later on.

      We used 'indicate' and 'suggesting' at the end of this results section, and we feel that we have not overreached in our conclusions at this point. While it's true that we can consider other hypotheses, we believe that, at this stage, our suggestion that MukBEF is loaded over the entire chromosome is the simplest and most likely explanation.

      1. p.7: "[these results] also reveal that MukBEF does not translocate from the Ori region to the terminus of the chromosome as observed with Smc-ScpAB in different bacteria."

      This isn't strictly true for single molecules, is it? Some molecules might translocate from Ori to Ter. Perhaps clarify that this is about the bulk flux of MukBEF?

      At this point, our conclusion that MukBEF does not travel from the ori to Ter is global and refers to the results described in this section. However, the referee is correct in pointing out that we cannot exclude the possibility that in a WT configuration (without a Ter in the middle of the right replicore), a specific MukBEF complex can be loaded near Ori and travel all along the chromosome until the Ter. To clarify our statement, we have revised it to 'reveal that MukBEF does not globally translocate from the Ori region to the terminus of the chromosome.' This change is intended to highlight the fact that we are drawing a general conclusion about the behavior of MukBEF and to facilitate its comparison with Smc-ScpAB in B. subtilis.

      1. p. 10: The section title "Long-range contacts correlate with MukBEF binding" and the concluding sentence "Altogether, these results indicate that MukBEF promotes long-range DNA contacts independently of the replication process even though it binds preferentially in newly replicated regions" seem to contradict each other. I would rephrase the title as "MukBEF promotes long-range contacts in the absence of replication" or similar.

      We agree with this suggestion and have used the proposed title.

      1. p. 13: I recommend reserving the name "condensin" for the eukaryotic condensin complex and using "MukBEF" throughout.

      We used MukBEF throughout.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. provide valuable data for understanding molecular features of the human spinal cord. The authors made considerable efforts to acknowledge and objectively address the limitations of Visium while attempting to overcome them by utilizing single-nucleus RNA sequencing (snRNA-seq) from the same tissue. By mapping snRNA-seq clusters to Visium data, they offer spatial information, complemented by RNA-ISH and immunofluorescence (IF) validation. They also discuss gender-related differences and the similarities between human and mouse data, aiming to establish a crucial foundation for experimental research. However, I have some comments below.

      1) The observation of gender-related differences is interesting. The authors reported that SCN10A, associated with nociceptors, exhibited stronger expression in females. While they intend to validate this finding through IF, the quantitative difference is not clearly observed in the IF data (Figure 5f). It would be essential to provide validation through DAPI-based cell counts, demonstrating the difference in CHAT/SCN10A co-expression.

      Thank you for this important question! We have added panel G in Figure 5, which provides a quantitative analysis of the percentage of CHAT neurons expressing SCN10A in male and female spinal cords.

      2) It is meritorious that in novel features of the transcriptomic study, the authors considered gender-related differences and similarities between humans and mice. Nevertheless, despite the extensive bioinformatics-based analyses performed, the results mostly confirm what has been previously reported (Nguyen et al. 2021; Yadav et al. 2023; Jung et al. 2023).

      Thank you! In addition to confirming the findings from previous studies, our results also provide new information regarding differences between human and mouse. For example, we found that PVALB and SST showed broader expression across human DRG neuronal clusters than in mice, suggesting that these genes are more selectively expressed in mouse than in human DRGs. Moreover, we identified several genes associated with pain that were differentially expressed in motor neurons between sexes.

      3) The study did not perform snRNA-seq in the DRG. The limitations of Visium in cell type separation are acknowledged, and the authors are aware that Visium alone has limitations in describing cell expression patterns. The authors need to validate their findings via analyses of public DRG snRNA-seq data (Jung et al. 2023 Ncom; Nguyen et al. 2021 eLife) before drawing broad conclusions.

      Thank you for this critical question! It is true that snRNA-seq has a higher resolution in describing cell expression patterns than spatial transcriptomics. We acknowledge the limitation that we performed only spatial transcriptomics in human DRG, without snRNA-seq. Nevertheless, our spatial transcriptomics results in human DRG were similar to previously published snRNA-seq data from human DRG, suggesting the feasibility of using spatial transcriptomics in human DRG.

      4) Figure 7's comparison between human Visium spot data and Renthal et al.'s mouse snRNA-seq may have limitations as Visium spot data could not provide a transcriptional profile at the single cell resolution. The authors need to clarify this point.

      Thank you! We have clarified this in the limitation section.

      5) Recent findings indicate that type 2 cytokines can directly stimulate sensory neurons. This includes the expression of IL-4RA, IL31RA, and IL13RA in DRG. These findings support the role of JAK kinase inhibitors in mediating chronic itch. Demonstrating the expression of these itch receptors in DRG would be valuable.

      We have provided the expression patterns of IL-4RA, IL31RA, and IL13RA in human and mouse DRG (Figure 7-figure supplement 4), and cited the relevant paper.

      6) Given that juxtacrine and paracrine signals operate from 0 to 200 um, spatial information is vital to understanding intercellular communication. The presentation of spatial information using Visium is meaningful, and more comprehensive analyses of potential interaction based on distance should be provided, beyond the top 10 interactions (Figure 8).

      Thank you for this good question! In this study, we focused on the putative projections from DRG to spinal neuronal types, which may be an important future direction for research on sensory transduction. It will be interesting to determine the intercellular communication in the spinal spot using the spatial transcriptomics data in future studies.

      7) The gender-related differences are interesting and, if possible, it would be interesting to explore whether age-related differences or degeneration-related factors exist. Using public data could allow the examination of age-related changes.

      We agree with the reviewer that it is of great importance to identify the age-related differences using spatial transcriptomics and scRNA-seq data of human spinal cord. However, it is currently difficult to obtain comprehensive results due to the limited human spinal cord datasets regarding different ages.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors generated a comprehensive dataset of human spinal cord transcriptome using single-cell RNA sequencing and the Visium spatial transcriptomics platform. They employed Visium data to determine the spatial orientation of each cell type. Using single-cell RNA sequencing data, they identified differentially expressed genes by comparing human and mouse samples, as well as male and female samples.

      Strengths:

      This study offers a thorough exploration of both cellular and spatial heterogeneity within the human spinal cord. The resulting atlas datasets and analysis findings represent valuable resources for the neuroscience community.

      Weaknesses:

      The analysis of spatial transcriptomics data was conducted as if it were single-cell RNA-seq data. However, there are established tools for effectively integrating these two types of data. The incorporation of deconvolution methods could enhance the characterization of each spot's cell type composition.

      Thank you very much for your positive comments and suggestions! Indeed, we have used deconvolution methods to incorporate the spinal snRNA-seq and spatial transcriptomics data.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al. sought to use spatial transcriptomics and single-nucleus RNA sequencing to classify human spinal cord neurons. The authors reported 17 clusters on 10x Visium slides (6 donors) and 21 clusters by single-nucleus sequencing (9 donors). The authors tried to compare the results to those reported in mice and claimed similar patterns with some differing genes.

      Strengths:

      The manuscript provides a valuable database for the molecular and cellular organization of adult human spinal cords in addition to published datasets (Andersen, et al. 2023; Yadav, et al. 2023).

      Weaknesses:

      The results are largely observational and lack quantitative analysis. Moreover, the assertions regarding the sex differences in motor neurons and the potential interactions between DRG and spinal cord neuronal subclusters appear preliminary and necessitate more rigorous validation.

      Thank you very much! We have provided the quantitative analysis of the differential expression of SCN10A in male and female spinal cord motor neurons. Our sequencing data revealed putative projections from DRG to spinal neuronal types, which may be an important future direction for research on sensory transduction. We did not use animal models to verify these interactions between DRG and spinal cord neuronal subclusters, which is a major limitation in our study. Nevertheless, our analysis results will provide an important resource for future research to investigate the molecular mechanism underlying spinal cord physiology and diseases.

    1. Author Response

      The following is the authors’ response to the current reviews.

      I greatly appreciate your time and attention on our manuscript. I have carefully considered the reviewers’ comments and made modifications. Below are my responses to each comment and the revisions I have made.

      Reviewer #2 (Recommendations for The Authors):

      1) The authors addressed most of my concerns well. I am fine with most of the responses except question 8. Actin is also reported to be located in the nucleus (PMID: 31481797). It would be better to utilize other markers, like GAPDH. Moreover, the authors did not address the issue of LXRa. I strongly suggest that the authors repeat this experiment to get a more solid result.

      Thank you for the comment! Actin is frequently used as a negative control for nuclear protein in many publications, such as DOI:10.1038/s41419-018-0428-x. Beta-actin is so abundant in cytoplasmic protein that a strong band appears within a few seconds of exposure when western blotting cytoplasmic extracts. In contrast, in many studies, including this one, no actin band appears even after minutes of exposure when western blotting nuclear extracts. Although, as mentioned, actin is also located in the nucleus, such a small nuclear amount may not be detected by western blot with exposures of seconds. However, if the nuclear protein is contaminated with total cell lysate, actin is easily detected. As a result, the use of actin as a negative control for nuclear protein is well accepted.

      Author response image 1.

      2) In addition, the authors mentioned IL-1b but present IL-6 in the figure of Figure. 2F. Please correct.

      We appreciate your attention to detail. “IL-1b” is corrected to “IL-6”.


      The following is the authors’ response to the original reviews.

      I greatly appreciate the time you and the reviewers have taken to review my paper and provide detailed feedback and suggestions. I have carefully considered the reviewers’ comments and made thorough modifications to the paper. Below are my responses to each comment and the revisions I have made.

      Reviewer #1 (Recommendations for The Authors):

      Although the paper has strengths in understanding better the pathway of activation leading to polarization, the mechanisms contributing to cytokine storm are weak. In the context of cellular in vitro changes, it would be very interesting to map these molecular changes to strengthen the pathways affected in this model. In vivo, stronger evidence is required to bridge the gap between the in vitro model and mechanisms regulating in vivo disease development. Reporting of experiments needs to be considerably strengthened. Individual data points are shown; however, it is unclear whether these represent biological or technical replicates, or how many experiments have been undertaken. The addition of this information is essential for understanding the robustness and repeatability of findings. Currently, these cannot be assessed from the information provided. Furthermore, it is unclear whether the error bars represent s.e.m. or s.d., which greatly impacts data interpretation.

      Answer: thank you for the valuable comments! We have added some in vivo experiments to strengthen the bridge between the in vitro and in vivo models. 1) Macrophages were depleted by i.v. injection of clodronate-liposomes (CLL) in endotoxemic mice given leucine. The alleviation of LPS-induced cytokine production by leucine was muted by macrophage depletion (Figure 2E, F), suggesting that the anti-inflammatory effect of leucine was exerted via the regulation of macrophages. 2) The LXRα inhibitor GSK2033 was administered to mice via i.v. injection prior to LPS challenge. In GSK2033-treated mice, the effects of leucine on the serum levels of inflammatory cytokines were neutralized (Supplementary Figure 4), partially indicating the importance of LXRα in the regulation of cytokine release. We acknowledge the limitation of LXRα inhibition by GSK2033 in this study. In our future study, we plan to use monocyte-specific LXRα knockout mice generated with LysM-cre to elucidate the importance of LXRα in the progression of CSS, and specifically to focus on the molecular mechanism by which mTORC1 interacts with LXRα to modulate M2 macrophage polarization. Additionally, we modified the manuscript to clarify that the error bars represent the standard error of the mean (SEM) (line 416).

      Reviewer #2 (Recommendations for The Authors):

      1. The whole manuscript is based on the 2% leucine from feed and 5% leucine from water. Is there any rationale for using these two types of different concentrations in this study? Often, a dose-dependent treatment is utilized in vivo in pharmacological study. Therefore, the authors should at least test two different concentrations in each type to confirm the conclusion.

      Answer: thank you for your comment and suggestion. The 2% leucine in feed and 5% leucine in water used in this study were based on the literature. In those studies, leucine was reported to activate mTORC1 and regulate metabolism at these concentrations, as shown below, although reports on leucine in the regulation of macrophage activation are lacking. In this study, we found that leucine supplementation by either route significantly increased the average body weight gain of mice, suggesting that leucine promoted growth and was not toxic to mice.

      (1) Jiang X, Zhang Y, Hu W, Liang Y, Zheng L, Zheng J, Wang B, Guo X. 2021. Different Effects of Leucine Supplementation and/or Exercise on Systemic Insulin Sensitivity in Mice. Front Endocrinol (Lausanne) 12:651303. doi:10.3389/fendo.2021.651303

      (2) Holler M, Grottke A, Mueck K, Manes J, Jücker M, Rodemann HP, Toulany M. 2016. Dual Targeting of Akt and mTORC1 Impairs Repair of DNA Double-Strand Breaks and Increases Radiation Sensitivity of Human Tumor Cells. PLoS One 11:e0154745. doi:10.1371/journal.pone.0154745

      1. The authors focus on macrophage polarization as the major cellular event affected by leucine treatment; however, they also report that the proportion of multiple immune cell types has been suppressed by leucine treatment. As some of these immune cells can also produce inflammatory cytokines, the authors should confirm the anti-inflammatory effects of leucine were mainly mediated by modulating macrophage polarization as they suggested in the manuscript. For example, the authors could utilize Anti-CSF1 or clodronate to deplete macrophage and observed whether leucine-reduced inflammatory cytokines production was largely diminished.

      Answer: thank you for your valuable suggestion! We used clodronate-liposome (CLL) i.v. injection to deplete macrophages to further validate the specific contribution of macrophage polarization to the anti-inflammatory effects of leucine. The results revealed that clodronate treatment decreased blood monocyte counts and eliminated the effect of leucine in lowering the serum inflammatory factors IL-6, IFN-γ and TNF-α (Figure 2E-F), suggesting the importance of leucine-mediated macrophage activation for its anti-inflammatory effect.

      1. It would be important to examine whether 10 mM leucine would exhibit cytotoxicity to bone marrow derived monocytes/macrophages. This would clarify whether leucine treatment directly suppresses inflammatory cytokine production or reduces cell viability and thereby indirectly modulates inflammatory responses.

      Answer: thank you for your valuable suggestion! We performed cell viability assays after treating BMDM with 2 mM and 10 mM leucine for 6h or 24h (consistent with the timing of leucine treatment in article). The results showed that at 6h, 2 mM leucine significantly increased cell viability, while 10 mM leucine had no significant effect on cell viability. At 24h, both 2 mM and 10 mM leucine significantly increased cell viability. In conclusion, 2 mM and 10 mM leucine were not cytotoxic to BMDM, and the anti-inflammatory effect of leucine was not derived from the reduction in cell viability (Supplementary Figure 2).

      1. The authors found that leucine promotes mTORC1-LXRα for arginase-1 transcription and M2 polarization. The pathway the authors elucidated is not surprising and has already been reported in other studies. What about the other M2 markers? The authors could examine whether arginase-1 deficiency would abolish the leucine-induced increase in the expression of other M2 marker genes. Moreover, what about the molecular mechanism for leucine-reduced M1 polarization?

      Answer: Thank you for the valuable comments! To clarify, Arginase-1 activity and the mRNA expression of Fizz1, Mgl1, Mgl2, and Ym1 are well-established markers of M2 macrophages. Specifically, Arginase-1 activity is important for defining M2 functionality. These markers were used to define the level of M2 macrophage polarization. Only a few studies have indicated the involvement of mTORC1 in M2 polarization, as shown below; however, no molecular mechanism has been described for how mTORC1 modulates this process. In this study, we provide evidence that LXRα mediated the mTORC1-associated M2 polarization, and that leucine regulated mTORC1-LXRα to promote M2 polarization, which was independent of IL-4-induced STAT6 signaling. In our future study, we will focus on the molecular mechanism by which mTORC1 interacts with LXRα to modulate M2 macrophage polarization.

      (1) Byles V, Covarrubias AJ, Ben-Sahra I, Lamming DW, Sabatini DM, Manning BD, Horng T. 2013. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4:2834. doi:10.1038/ncomms3834

      (2) Kimura T, Nada S, Takegahara N, Okuno T, Nojima S, Kang S, Ito D, Morimoto K, Hosokawa T, Hayama Y, Mitsui Y, Sakurai N, Sarashina-Kida H, Nishide M, Maeda Y, Takamatsu H, Okuzaki D, Yamada M, Okada M, Kumanogoh A. 2016. Polarization of M2 macrophages requires Lamtor1 that integrates cytokine and amino-acid signals. Nat Commun 7:13130. doi:10.1038/ncomms13130

      1. In Fig. 1A, what's the P-value among these two groups? Moreover, what about the result with combination treatment as the authors performed in other panels?

      Answer: thank you for the valuable comments from the reviewer! In Figure 1A, the P-value between the LPS and LPS+2% Leucine groups is 0.0031, and the P-value between the LPS and LPS+5% Leucine groups is 0.0009. I have marked the significance in Figure 1A accordingly. Due to the limited number of mice, we treated mice with each of the two methods separately. Initially, we performed the survival experiment and observed that the addition of leucine prolonged the survival of mice given a lethal dose. Based on these findings, we further investigated whether a combination of the two methods would yield better results on the regulation of inflammation, but the combination had a similar effect on cytokine production, so we did not consider it necessary to repeat the survival experiment with the combination.

      1. It seems not much difference could be observed between 2% leucine from feed and 5% leucine from water in the expression of inflammatory genes and anti-inflammation-related markers. However, it seems that 5% leucine from water would exhibit a better survival rate than 2% leucine from feed. The authors should explain potential reasons and at least examine it in vitro.

      Answer: we appreciate the valuable comments from the reviewer! There are two possible reasons: 1) when a lethal dose of LPS was applied, mice were too weak to eat but still drank a small amount of water; 2) leucine was absorbed much more easily from the water than from the feed, so leucine from the water was much more effective over the short period of the survival experiment. On the other hand, cytokine levels and expression were measured in non-lethal experiments, in which mice were in much better condition for leucine absorption.

      1. In Fig. 4A, the authors examined the expression of p-mTOR. The authors should further examine the expression of p-AKT (S473, T308) and p-S6 to clarify whether mTORC1 or mTORC2 has been modulated. As reported, leucine should act on GATOR2 for mTORC1 activation. However, the authors reported that Torin, a mTORC1/mTORC2 inhibitor, inhibited M2 polarization more significantly compared to rapamycin, a mTORC1 inhibitor. These observations seem to indicate that leucine has other targets except mTORC1, such as mTORC2, which might raise novel mechanisms that have never been reported before.

      Answer: thank you for the valuable comments! Akt-mTORC1 signaling integrates metabolic inputs to control macrophage activation. Wortmannin inhibition of AKT was followed by inhibition of M2 polarization, suggesting that AKT signaling is involved in M2 polarization. Studies have reported that mTORC1 activation inhibits p-Akt (T308), and that inhibition of mTORC1 in turn activates Akt (1), promoting M2 polarization as feedback that compensates for the suppression of M2 polarization induced by mTORC1 inhibition. mTORC2 directly phosphorylates Akt at S473, and inhibition of mTORC2 inhibits p-Akt (S473) (2), further inhibiting M2 polarization. Torin1 inhibits both complexes, while rapamycin is specific for mTORC1 (3). The explanation was included in lines 252-262.

      (1) Leontieva OV, Demidenko ZN, Blagosklonny MV. 2014. Rapamycin reverses insulin resistance (IR) in high-glucose medium without causing IR in normoglycemic medium. Cell Death Dis 5:e1214. doi:10.1038/cddis.2014.178

      (2) Holler M, Grottke A, Mueck K, Manes J, Jücker M, Rodemann HP, Toulany M. 2016. Dual Targeting of Akt and mTORC1 Impairs Repair of DNA Double-Strand Breaks and Increases Radiation Sensitivity of Human Tumor Cells. PLoS One 11:e0154745. doi:10.1371/journal.pone.0154745

      (3) Byles V, Covarrubias AJ, Ben-Sahra I, Lamming DW, Sabatini DM, Manning BD, Horng T. 2013. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4:2834. doi:10.1038/ncomms3834

      1. In Fig.5B, frankly speaking, I do not observe much difference in LXRα expression. Also, the actin band is too poor to get any conclusion.

      Answer: thank you for the valuable comments from the reviewer! In Fig. 5B, the extracted protein is specifically mentioned as nuclear protein in the text. It is stated that actin is expressed in the cytoplasm, while histone is expressed in the nucleus. The figure shows that actin expression is almost absent, which is mentioned to demonstrate the purity of the extracted nuclear protein.

      1. In Fig. 5C and 5D, it is amazing that GSK2033 would reduce urea production even largely greater than the basal condition (lane 1). As GSK2033 normalized IL-4 or IL-4 combination with Leucine raised urea production in cells, how GSK2033 could reduce urea in medium. The authors should explain this discrepancy.

      Answer: thank you for the valuable comments from the reviewer! In Fig. 5C, urea production was measured directly in the culture medium using a commercial assay kit, and GSK2033 indeed led to a significant decrease in urea production. In Fig. 5D, on the other hand, we assessed the activity of arginase-1 by lysing the cells, activating arginase-1, providing the substrate arginine, and then measuring urea production. In response to your question, the explanation is that in the assay measuring arginase-1 activity, we supplied a sufficient amount of substrate arginine, which may better reflect the enzyme’s activity and the results were consistent with our expectations. Additionally, when GSK2033 was used in combination with IL-4 or IL-4 plus leucine, it might interact with the IL-4 signaling pathway or leucine metabolism pathway, leading to an increase in urea production. This is just our preliminary explanation for the contradictory results, and we acknowledge that further research is needed to explore the mechanism of action of GSK2033 and its interactions with IL-4 or leucine.

      1. Line 98, "INF-gamma" should be IFN-gamma.

      Answer: We appreciate your attention to detail. We apologize for the error in line 98, where “INF-gamma” should indeed be corrected to “IFN-gamma (IFN-γ).” We will make the necessary correction in the revised version of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents important findings for the field of Alzheimer's disease, especially for the electrophysiology subfield, by investigating the temporal evolution of different disease stages typically reported using M/EEG markers of resting-state brain activity. The evidence supporting the conclusions is solid and the methodology as well as the descriptions of the processes are of high quality, although a separation of individuals who are biomarker positive versus negative would have strengthened the interpretability of the results and the conclusions of the study.

      Response: Thank you for the positive assessment of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to infer the trajectories of long-range and local neuronal synchrony across the Alzheimer's disease continuum, relative to neurodegeneration and cognitive decline. The trajectories are inferred using event-based models, which infer a set of data-driven disease stages from a given dataset. The authors develop an adapted event-based modelling approach, in which they characterise each stage as a particular biomarker increasing by a particular z-score deviation from controls. Fitting infers the optimal set of z-scores to use for each biomarker and the order in which each biomarker reaches each z-score. The authors apply this approach to data from 148 individuals (70 cognitively unimpaired older adults and 78 individuals with mild cognitive impairment or Alzheimer's disease), identifying trajectories in which long-range (amplitude-envelope correlation) and local (regional spectral power) neuronal synchrony in the alpha and beta bands becomes abnormal prior to neurodegeneration (measured as the volume of the parahippocampal gyrus) and cognitive decline (measured using the mini-mental state examination).

      Strengths:

      • The main strength is that the authors assess two models. In the first they derive a staging system based only on the volume of the parahippocampal gyrus and mini-mental state examination score. They then investigate how neuronal synchrony metrics change compared to this staging system. In the second they derive a staging system that also includes an average (combined long-range and local) neuronal synchrony metric and investigate how long-range and local synchrony metrics change relative to this staging system. This is a strength as the first model provides confidence that there is not overfitting to the neuronal synchrony data, and the second provides more detailed insights into the dynamics of the early neuronal synchrony changes.

      • Another strength is that the authors automatically infer the optimal z-scores to choose, rather than having to pre-select them manually, as in previous approaches.

      Response: Thank you for the positive comments and a succinct summary of the paper and its strengths.

      Weaknesses:

      • The dataset is small and no external validation is performed.

      Response: We agree that future validation studies of the predictions are necessary. We now include the related sentences in the last paragraph of the limitations section in the revised manuscript.

      • A high proportion of the data is from controls (nearly 50%) with no biomarker evidence of Alzheimer's disease, and so the changes may be driven by aging or other non-Alzheimer's effects.

      Response: We would like to clarify that the z-scores of the metrics used in the EBMs were computed using age-adjusted values. All our controls were recruited from an ongoing longitudinal study of healthy aging. Amongst the 70 controls, 39 have confirmed A-beta negative PET scans, 8 have confirmed A-beta positive PET scans, and for the remaining 23 we do not have any biomarker data available. However, for all controls we conducted a comprehensive neuropsychological assessment (see Appendix 1—table 1 in the revised supplementary file). Based on these data, we can be quite confident about their lack of clinical deficits, and we have a very high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). Consistent with this assessment, in our EBM analyses, most of the control participants were indeed categorized to the preclinical stages.
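
      The response does not describe how the age adjustment was carried out. As a minimal sketch of one common approach, assuming a linear age trend fitted in controls only, the residuals can be standardized against the control distribution. The function name `age_adjusted_z` and the linear model are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def age_adjusted_z(values, ages, is_control):
    """Hypothetical helper: z-score a metric after removing a linear
    age trend that is fitted in the control group only."""
    values = np.asarray(values, dtype=float)
    ages = np.asarray(ages, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)

    # Fit metric ~ slope * age + intercept using controls only.
    slope, intercept = np.polyfit(ages[is_control], values[is_control], deg=1)

    # Residuals are the age-adjusted values for every participant.
    residuals = values - (slope * ages + intercept)

    # Standardize against the control residual distribution.
    mu = residuals[is_control].mean()
    sd = residuals[is_control].std(ddof=1)
    return (residuals - mu) / sd
```

      By construction, the controls have mean 0 and unit standard deviation after this adjustment, so patient z-scores express deviation from what would be expected at the same age.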

      • Inferring the optimal z-scores is a strength, however as different sets of z-scores are allowed per biomarker, there is a concern that the changes reflected are mainly driven by the choice of z-score, rather than the markers themselves (e.g. if lower z-scores are selected for one marker than another, then changes in that marker will appear to be detected earlier, even if both markers change at the same time).

      Response: Indeed, the biomarker sequence depends on the choice of the z-scores per biomarker. However, please note that our choice of z-scores is based on maximizing the sequence likelihood. Therefore, other values of the z-scores will have by construction a smaller likelihood of sequence occurrence compared to the results shown.

      • In equation 2 it is unclear why the gaussian is measured based on a sum over I. The more obvious choice would be to use a multivariate gaussian with no covariance, which would mean taking the product rather than the sum over I.

      Response: We thank the reviewer for pointing this out and we now clarify this point. In this revision, we do not use the term ‘multivariate’. Indeed, the model likelihood assumes independence of each metric’s priors, and hence is the product of each metric’s univariate Gaussian probability distribution. This can be seen in equations 1 and 2 of the revised manuscript (section titled “Event-based sequencing modeling”). The assumption of independent priors is similar to the one used in the original event-based model (see equation (2) in A. L. Young et al., Nature Comm. 9.1 (2018): 4273).
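
      To illustrate the reviewer's point about products versus sums, the sketch below evaluates an independent-Gaussian likelihood in log space, where the product over metrics becomes a sum of univariate log-densities. It is not a transcription of the manuscript's equations 1 and 2; the function name and its arguments are hypothetical.

```python
import numpy as np

def independent_gaussian_loglik(z, mean, sd):
    """Log-likelihood of observed z-scores under independent univariate
    Gaussians (equivalently, a multivariate Gaussian with no covariance)."""
    z, mean, sd = (np.asarray(a, dtype=float) for a in (z, mean, sd))

    # Univariate Gaussian log-density for each metric.
    log_pdf = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((z - mean) / sd) ** 2

    # A product of densities is a sum of log-densities.
    return log_pdf.sum()
```

      Working in log space avoids numerical underflow when many metrics are multiplied together.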

      • In the original event-based model, k is a hidden variable. Presumably that is also the case here, however the notation k=stage(j) makes it seem like each subject is assigned a stage during the sequence optimisation.

      Response: We would like to clarify that the posterior probability of each stage for every subject is estimated during the sequence optimization. To clarify the notation, we have now deleted the term “stage” and use “tj” to denote stages for each subject j. The sequence optimization was performed with the assumption of a uniform prior distribution p(tj=k) = 1/(N+1) for each stage k. Then, the posterior probability p(tj=k|Zj,S), i.e., the probability that subject j belongs to stage k, given the metrics and the sequence, was computed during the sequence optimization procedure.
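
      The stage posterior described above follows from Bayes' rule with the uniform prior p(tj=k) = 1/(N+1). A minimal sketch, assuming the per-stage log-likelihoods for one subject have already been computed for a fixed sequence S (the function name and input layout are assumptions):

```python
import numpy as np

def stage_posterior(log_lik_per_stage):
    """Posterior p(t_j = k | Z_j, S) for one subject, given the
    log-likelihood of that subject's data at each of the N + 1 stages."""
    log_lik = np.asarray(log_lik_per_stage, dtype=float)
    n_stages = log_lik.size          # N + 1 stages, including stage 0
    log_prior = -np.log(n_stages)    # uniform prior 1 / (N + 1)

    # Unnormalized log posterior; subtract the max for numerical stability.
    log_joint = log_lik + log_prior
    log_joint -= log_joint.max()

    post = np.exp(log_joint)
    return post / post.sum()
```

      Because the prior is uniform, it cancels in the normalization, so the posterior is simply the normalized likelihood across stages.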

      • Typically for event-based modeling, positional variance diagrams are created from the markov chain monte carlo samples of the event sequence, enabling visualisation of the uncertainty in the sequence, but these are not included in the study.

      Response: In the revised supplementary file, we have now included positional uncertainty diagrams for the optimal set of z-score events that were created from 50,000 MCMC samples. Please see Appendix 1—figure 2 for the AC-EBM and Appendix 1—figure 9 for the SAC-EBMs.
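
      A positional variance diagram of the kind requested by the reviewer can be tabulated directly from sampled sequences. The sketch below assumes each MCMC sample is a permutation listing the event index that occupies each position; this layout and the function name are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def positional_variance(samples, n_events):
    """Return a matrix P where P[e, pos] is the fraction of sampled
    sequences in which event e occupies position pos."""
    samples = np.asarray(samples, dtype=int)
    counts = np.zeros((n_events, n_events))
    positions = np.arange(n_events)

    for seq in samples:
        # seq[pos] is the event at that position; each (event, position)
        # pair is unique within one permutation, so += counts it once.
        counts[seq, positions] += 1

    return counts / len(samples)
```

      Plotting this matrix as a heat map, with events ordered by the maximum-likelihood sequence, shows a sharp diagonal where the ordering is certain and a blurred band where it is not.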

      • Many of the figures in the manuscript (e.g. Figure 1E/G, Figure 2A/B, Figure 3A/B/E/F/I/J, Figure 4 A/B/E/F/I/J) are based on averages in both the x and the y axis. In the x dimension, individuals have a weighted contribution to the value on the y axis, depending on their stage probability. In the y dimension, the values are averages across those individuals, and the error bars represent the standard error rather than the standard deviation. Whilst the trajectories themselves are interesting, they may not be discriminative at the individual level and may be more heterogeneous than it appears.

      Response: In the current study, the predictions of trajectories are intended at the cohort level. Individual level investigations will be the topic of future investigations.
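
      The cohort-level averaging that the reviewer describes, in which each individual contributes to every stage in proportion to their stage probability, can be sketched as a probability-weighted mean. The function name and array shapes are assumptions, and only the mean is shown because the source does not specify how the error bars were weighted.

```python
import numpy as np

def stage_weighted_means(stage_probs, metric):
    """Probability-weighted mean of a metric at each stage.

    stage_probs: (n_subjects, n_stages) posterior stage probabilities.
    metric:      (n_subjects,) metric value per subject.
    """
    stage_probs = np.asarray(stage_probs, dtype=float)
    metric = np.asarray(metric, dtype=float)

    # Total probability mass assigned to each stage across subjects.
    weights = stage_probs.sum(axis=0)

    # Weighted sum of the metric per stage, divided by that mass.
    return stage_probs.T @ metric / weights
```

      A stage that receives no probability mass from any subject has zero total weight, so in practice such stages would need to be masked before dividing.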

      • The bootstrapped statistical analyses comparing metrics between the stages do not consider the variability in the sequence.

      Response: Please see the response above. The positional uncertainty diagrams are included in the revised supplementary file.

      Reviewer #2 (Public Review):

      Summary:

      This work presented by Kudo and colleagues is of great importance to strengthen our understanding of electrophysiological changes in the course of AD. Although the main conclusions regarding functional connectivity and spectral power change through the course of the disease are not new and have been largely studied and theorised on, this article offers an innovative approach that certainly consolidates previous knowledge on the topic. Not only that, this article also broadens our knowledge presenting useful and important details on the specificity of frequency and cortical distribution of these early alterations. The main take-home message of this work is the early disruption of electrophysiological signatures that precedes detectable alterations in other more commonly used pathology markers (i.e. gray matter atrophy and cognitive impairment). More specifically, these signatures include long-range connectivity in the alpha and beta bands, and local synchrony (spectral power) in the same frequency bands.

      Response: Thank you for the positive comments and for providing a nice succinct summary.

      Strengths:

      The present work has some major strengths that make it paramount for the advance of our understanding of AD electrophysiology. It is a very well written manuscript that, despite the complexity of the analyses employed, runs the reader through the different steps of the analysis in a pedagogic and clever way, making the points raised by the results easy to grasp. The methodology itself is carefully chosen and appropriate to the nature of the question posed by the researchers, as event-based models are well-suited for cross-sectional data.

      The quality of the figures is outstanding; not only are they aesthetic but, more importantly, the figures convey information exceptionally well and facilitate comprehension of the main results.

      The conclusions of the paper are, in general, well described and discussed, and consider the state-of-the-art works of AD electrophysiology. Furthermore, even though the conclusions themselves are not groundbreaking at all (synaptic damage preceding structural and cognitive impairment is one of the epitomes of the pathological cascading model proposed by Jack in 2010), this article is innovative and groundbreaking in the way it addresses them, with clever analyses in a sample that is relatively large by neuroimaging standards.

      Response: Thank you for the positive comments on the strengths of the paper.

      Weaknesses:

      The main limitation of the work revolves around sample definition and inclusion criteria that are somewhat confusing, obscuring some of the points of the analyses. Firstly, it is not clear why the purely clinical approach is employed to diagnose the "probable Alzheimer's Disease" for the 78 participants in the "AD group". In the same paragraph, it is stated that 67 out of the 78 participants show biomarker positivity, thus allowing a more biologically guided diagnosis that is preferred according to current NIA-AA criteria. This would avoid the highly possible mixing of different subtypes of dementia etiologies. One might wonder, why would those 11 participants be included if we have strong indications that their symptoms are not due to AD? Furthermore, the real pathological status of the control group is somewhat questionable. The authors do not specify whether common AD biomarkers are available for this subgroup. If they are, it would have greatly increased the clarity and interpretability of the results if this group had been subdivided into a preclinical group and a completely healthy control group. This would be particularly interesting since a significant proportion of the control group is labeled as belonging to stages 2, 3, 4 (MCI) and even 5 (mild dementia). This raises the question of whether these participants are true healthy controls mislabeled by the EBM model, or actual cognitive controls with actual underlying AD pathology well identified by the model proposed.

      Response: Please see responses above to a similar comment from R1. To clarify, all our controls were recruited from an ongoing longitudinal study of healthy aging. Amongst the 70 controls, 39 had confirmed A-beta negative PET scans and 8 had confirmed A-beta positive PET scans; for the remaining 23 we do not have any biomarker data available. The biomarker positivity rates in our control cohort are completely consistent with the prevalence of A-beta positivity in cognitively healthy individuals and are within a normal biological continuum for amyloid beta (Jansen WJ et al. 2015). In all the controls, we have conducted comprehensive neuropsychological assessment (see Appendix 1—table 1 in the revised supplementary file) and based on these data we can be quite confident about their lack of clinical deficits, and we have a high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). We include these details in the revision (see the revised ‘Participants’ section in the Materials and methods).

      Jansen WJ et al., 2015. JAMA 313(19):1924-1938.

      On this note, Figure 2 (C and D) and Figure 3 (C, G and K) show a cortical surface depicting the mean difference of each stage vs the control group, which again, is formed by subjects that can be included (and in fact, are included) in all those stages, obscuring the meaning and interpretability of these cortical distributions.

      Response: We would like to clarify that these figures depict the regional maps of each metric for each stage of AD progression, not the contrast against a control group.

      Reviewer #1 (Recommendations For The Authors):

      • If possible, perform independent validation of the results.

      Response: This is something we indeed intend to examine in our future investigations.

      • Repeat the analysis in the subset of individuals that are amyloid positive.

      Response: Amongst the 78 AD patients, 20 had autopsy confirmed AD neuropathology, an additional 41 patients had molecular pathology identified by Abeta-PET, and another additional 9 had fluid biomarker (CSF) confirmation of amyloid and tau levels consistent with AD diagnosis. Eight remaining patients had a diagnosis of AD with high certainty, based on clinical presentation, neurological assessment, and cortical atrophy on MRI. Given that there are only eight patients who had clinical diagnosis of AD (with no biomarkers), and the comprehensive clinical characterization of all the AD patients in our cohort (Appendix 1—table 1), we do not believe that any subgroup analysis is warranted.

      • When inferring the optimal z-scores, select the same set of z-scores per biomarker, or include diagrams of stage vs z-score that include all of the markers so that it is easy to see how one marker changes relative to the others (overlay Figure 1G on Figure 2A and 2B).

      Response: How the neural synchrony metrics, PHG volume and MMSE scores change relative to each other is exactly what we show in Figures 3 B/F/J and 4 B/F/J. Since each EBM model optimizes the z-score thresholds, sequence likelihood and posterior probability of each stage for each subject, the EBM framework provides the most likely estimate for each metric at every stage. Therefore, the SAC-EBM model gives the most accurate description of the relative differences in these metrics over the AD progression stages. The reviewer’s suggestion to overlay Figure 1G (now figure 1F, based on optimized z-scores for PHG volume and MMSE scores) on Figures 2A and 2B will be inaccurate, as the neural synchrony measures plotted in figures 2A and 2B are not for optimized z-scores.

      • Change equation 2 to use a multivariate gaussian.

      Response: We now clarify that we use a factorized multivariate form that reflects independent priors for each metric which are Gaussian.
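
      One way to read the "factorized multivariate form" is sketched below. It assumes M metrics with per-metric means and variances; the symbols are chosen here for illustration and are not taken from the manuscript's equation 2. A product of independent univariate Gaussians is the same distribution as a multivariate Gaussian with a diagonal covariance matrix:

```latex
p(\mathbf{x})
  = \prod_{i=1}^{M} \mathcal{N}\!\left(x_i \mid \mu_i, \sigma_i^{2}\right)
  = \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}\right),
\qquad
\boldsymbol{\Sigma} = \operatorname{diag}\!\left(\sigma_1^{2}, \dots, \sigma_M^{2}\right)
```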

      • Clarify whether k is a hidden variable and possibly change the notation.

      Response: We now clarify that in our notation, k is a label for the stage [k = 1, ..., 7 (when I = 2) or k = 1, ..., 10 (when I = 3)] and is indeed a hidden variable and not observed (but inferred from the EBM). Specifically, the posterior probability for each subject j belonging to stage k was estimated as part of the sequence optimization procedure.
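
      As a rough illustration of what inferring a hidden stage label can look like, the sketch below applies Bayes' rule to per-stage log-likelihoods for a single subject. The function name, the uniform prior, and the example numbers are all hypothetical; this is not the authors' SAC-EBM implementation.

```python
import math

def stage_posterior(log_likelihoods, prior=None):
    """Posterior probability of each hidden stage k for one subject.

    log_likelihoods[k-1] is log p(data_j | stage k); a uniform prior is
    assumed when none is given (an assumption made for this sketch).
    """
    n_stages = len(log_likelihoods)
    if prior is None:
        prior = [1.0 / n_stages] * n_stages
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, prior)]
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-likelihoods for a model with 7 stages (the I = 2 case).
example_loglik = [-12.0, -9.5, -7.0, -6.2, -8.1, -11.3, -15.0]
posterior = stage_posterior(example_loglik)
most_likely_stage = posterior.index(max(posterior)) + 1  # stages are labelled from 1
```

      The stage is never observed directly; only its posterior distribution is available, and a single label is obtained by taking the most probable stage.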

      • Generate positional variance diagrams of the MCMC samples.

      Response: We are doing the MCMC to obtain the most likely sequence. We have now included positional variance diagrams of the optimal set of z-score events in Appendix 1—figure 2 and Appendix 1—figure 9 in the revised supplementary file.
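
      For readers unfamiliar with positional variance diagrams, a minimal sketch of the underlying table follows. It assumes the MCMC output is a list of sampled event orderings; the event names and samples are made up for illustration and do not come from the study.

```python
def positional_variance(samples, events):
    """Fraction of sampled sequences in which each event sits at each position."""
    n_positions = len(events)
    counts = {event: [0] * n_positions for event in events}
    for sequence in samples:
        for position, event in enumerate(sequence):
            counts[event][position] += 1
    n_samples = len(samples)
    return {event: [c / n_samples for c in row] for event, row in counts.items()}

# Hypothetical events and four hypothetical MCMC samples of their ordering.
events = ["A", "B", "C"]
samples = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["A", "B", "C"],
    ["B", "A", "C"],
]
pvd = positional_variance(samples, events)
for event in events:
    print(event, pvd[event])
```

      Each row is one event and each column one position in the sequence; a row concentrated in a single column indicates little uncertainty about where that event falls.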

      • It would be interesting to study whether the stages are predictive of conversion or look at longitudinal data, if available.

      Response: This is something we indeed intend to examine in our future investigations.

      • Also look at statistics across MCMC samples of the sequence.

      Response: Thank you for this suggestion. In Appendix 1—figure 10, we now include an example of the MCMC samples for an SAC-EBM including the alpha-band AEC. We then derived the positional variances for each metric, which are now shown in Appendix 1—figure 2 and Appendix 1—figure 9.

      Reviewer #2 (Recommendations For The Authors):

      Some really minor changes are suggested on two specific points that somewhat confused me as a reader and got me stuck in the reading process to try to get the meaning of what I was seeing/reading:

      1. It is not specified (or at least I was unable to find it) what are you comparing exactly for the group comparison in the long-range synchrony metric (AEC) before creating your scalar metric. Are you comparing individual links (in which case you would have 93 link values for each ROI to compare)? Or are you comparing the strength for each ROI (thus, one value -the individual links sum- for each ROI)? I guess it should be the latter for what I see in the figures but it could be useful to specify it.

      Response: The reviewer is correct. We compare the strength of each ROI, i.e., averaging over edges of the symmetric AEC matrix of functional connectivity. We now clarify this in the Amplitude-envelope correlation section and the caption of the revised Appendix 1—figure 6.
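
      The per-ROI strength described here can be sketched as a row average of the connectivity matrix. The example below assumes a small symmetric matrix and excludes the diagonal (self-connections); whether the authors exclude the diagonal is not stated, so that choice and all values are illustrative only.

```python
def roi_strength(aec):
    """Mean connectivity of each ROI to all other ROIs (diagonal excluded)."""
    n = len(aec)
    return [
        sum(aec[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

# Hypothetical 3-ROI symmetric AEC matrix.
aec = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
strength = roi_strength(aec)  # one scalar per ROI
```

      This reduces the matrix to one value per ROI, which is what is then compared between groups instead of the individual links.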

      2. In Figure 1 (which, by the way, is exceptionally aesthetic, congratulations for that!) I got stuck for a relatively long time on a really small detail and I am not completely sure if I came to the right conclusion. It is regarding the X axis of the histograms in panels B and D. They are expressed as "PHG volume loss" and "MMSE decline". So I supposed those histograms were showing some kind of subtraction (maybe from stage X to stage Y, or from group X to group Y). I was trying to understand the histogram and rereading the methods to see if I had overlooked any description of that graphic, and then realized they might be just the z-score itself for each group (control and AD) with respect to the whole population. If that is the case, I would suggest changing the X-label to "PHG z-score" and "MMSE z-score", avoiding the reference to "loss" and "decline", as they are just reflecting the direct transformation to z-score.

      Response: Thank you. We would like to clarify that the z-scores for PHG volume and MMSE score were sign-inverted so that higher values denote “PHG Volume loss” and “MMSE decline”, respectively. We now clarify this point in the revised text and legend for the revised figure 1.
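
      A sign-inverted z-score can be sketched as follows. The choice of reference sample used for the mean and standard deviation is an assumption made for this example (the response does not specify it), and the scores are invented.

```python
import statistics

def inverted_z_scores(values, reference):
    """z-scores relative to a reference sample, with the sign flipped so that
    lower raw values (smaller volume, lower MMSE) give higher scores."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return [-(v - mean) / sd for v in values]

# Hypothetical MMSE scores: a reference sample and two patients.
reference_mmse = [30, 29, 28, 29, 30]
patient_mmse = [22, 26]
decline = inverted_z_scores(patient_mmse, reference_mmse)
```

      With this convention, a patient scoring below the reference mean receives a positive "decline" value, and a larger deficit gives a larger value.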

      Lastly, regarding the point I raised in the limitations section of the public review, I understand it might fall out of the scope of eLife reviewing process as it would require a more extensive change of the current manuscript, which is great as it is. But as a reader and researcher in the field, I would have recommended using biomarkers to divide the control group (if available) thus including in the models only those belonging to the AD continuum according to their biomarker status, and leaving those control without any biomarker positivity as the reference group for the figures I mention in that section (those showing differences for each stage in the cortical surface with respect to the control group).

      Response: Please see a similar comment from R1. Amongst the 70 controls, 39 have confirmed A-beta negative PET scans and only 8 were confirmed A-beta positive PET scans, and in the rest of the 23 we do not have any biomarker data available. In all the controls, we have conducted comprehensive neuropsychological assessment (see Appendix 1—table 1 in the revised supplementary file) and based on this data we can be quite confident about their lack of clinical deficits, and we have a high degree of confidence that none of the controls have any neurodegeneration (AD-related or otherwise). Since only 8 participants were confirmed as amyloid positive in the control group and this sample size is small, we do not conduct this recommended re-analysis in this manuscript.

    1. Author Response

      We appreciate your comments and thank the reviewers for providing valuable feedback and recommendations. For most of the recommendations, we will respond in the revised version, which will provide more information for readers to understand and apply the study. For some of the recommendations, we can give quick responses as follows:

      Reviewer #2 (Public Review):

      The differences between passive and active immunolabeling, as well as photobleaching data, should be addressed for a comprehensive understanding.

      In passive immunolabeling, antibodies penetrate the tissue and reach their targets by diffusion alone, without any additional force. In contrast, active immunolabeling uses an external force, such as pressure or electrophoresis, to facilitate antibody penetration and therefore significantly speeds up the staining process (i.e., one day vs. 2 months for a whole mouse brain). In our study, the samples we were dealing with were centimeter-sized; therefore, we employed only active electrophoretic immunolabeling (details provided in Materials and Methods). However, laboratories that lack the necessary devices, or that handle small specimens, can employ passive immunolabeling instead. As for the photobleaching data, we will provide them in the revised version.

      The compatibility of MOCAT with genetically encoded fluorescent proteins remains unclear and warrants further investigation.

      We agree with the possibility that the encoded fluorescent proteins will be affected. Since there is evidence that fluorescence can be quenched by xylene and alcohol, which are two organic solvents used in paraffin processing, we think boost immunolabeling is necessary for observing genetically encoded fluorescent proteins. We also pointed out this limitation in the Discussion:

      “Fourth, endogenous fluorescence—such as GFP, YFP, and tdTomato—may be quenched during paraffin processing and thus need to be visualized by means of additional immunolabeling.”

      However, the extent to which endogenous fluorescence is quenched during paraffin processing and the MOCAT procedure, and how much of it boost labeling can rescue, is worth investigating in order to broaden the application of MOCAT. We will provide these data in the revised version.

      The composition of NFC1 and NFC2 solutions for refractive index matching should be provided.

      Since NFC1 and NFC2 are commercial products from Nebulem (Taiwan), their composition cannot be disclosed. However, the refractive indices of NFC1 and NFC2 are 1.47 and 1.52, respectively.

    1. Author Response:

      Update, January 11, 2024:

      During the course of our careful revising of the paper, we discovered an inconsistency in the way we presented data for figures 5 and 6. Specifically, we used optogenetics to induce ataxia in mice. However, "ataxia", as a phenotype, can be initiated by a spectrum of cell dysfunctions as revealed by previous studies. We systematically explored this with optogenetics in this current work. Our error is that we presented one stimulation paradigm to show ataxic cell firing (2 ms on / 11 ms off square wave) and then presented a slightly different paradigm to show ataxic animal behavior (10 ms on / 10 ms off square wave). We note that our ataxia paradigms do not affect the outcomes of the dystonia and tremor stimulations. Importantly, the choice of ataxia paradigm does not change the conclusions of the paper. Regardless, for clarity we are actively working to make the stimulation parameters that we present consistent between figures 5 and 6.
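
      To make the difference between the two ataxia paradigms concrete, the sketch below converts each on/off square wave into a repetition frequency and duty cycle. It assumes a simple periodic pulse train with the stated on and off durations and nothing else; the function name is hypothetical.

```python
def square_wave_summary(on_ms, off_ms):
    """Repetition frequency (Hz) and duty cycle of a periodic on/off pulse train."""
    period_ms = on_ms + off_ms
    return {
        "frequency_hz": 1000.0 / period_ms,
        "duty_cycle": on_ms / period_ms,
    }

firing_paradigm = square_wave_summary(2, 11)     # 2 ms on / 11 ms off
behavior_paradigm = square_wave_summary(10, 10)  # 10 ms on / 10 ms off

for name, summary in [("firing", firing_paradigm), ("behavior", behavior_paradigm)]:
    print(name, round(summary["frequency_hz"], 1), round(summary["duty_cycle"], 3))
```

      Under this reading, the two paradigms differ in both repetition rate (13 ms versus 20 ms period) and in the fraction of time the light is on.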

      October 10, 2023:

      We would like to thank all three reviewers for providing excellent suggestions that will enable us to strengthen our manuscript and enhance the impact of our findings. We plan on addressing the comments by altering the text, providing additional data, revising the figures as requested, and most importantly by providing an improved classifier model. Where relevant, we will also provide the reviewers with a response to specific questions that they raised. We will respond to the reviewers’ comments in a point-by-point manner when we submit a revised manuscript. Below, we include an outline of the main points that we intend to address.

      Although we will respond in full to all comments and suggestions in the revised documents, here we outline only the major areas in order to provide context for our revisions. 1) The major point of concern raised by the reviewers is the strength of the classifier model. We agree with the reviewers that we should put forward the strongest model possible as this forms a core component of our paper. We are planning on retraining our model using the suggestions put forward by the reviewers in the public and author-directed comments. Importantly, given the healthy discussion about our model, our revised manuscript will now also include additional clarification about the choice of the model architecture and limitations of our data structure. Based on the reviewers’ comments, we will include a brief discussion about possible future ways of improving the model. 2) We will provide additional figures and updated figure panels to reflect the new data analyses. Ultimately, we agree that the major strength of our manuscript lies within the many mouse models tested and validation of the classification in different genetic, pharmacological, and optogenetic mouse models, a point raised by all three reviewers. We are confident that the revised images will reflect these strengths. 3) In addition to improving our classifier model, we are planning on making textual changes to clarify several parts of the text and propose a new title that better reflects the data put forth in our manuscript. 4) There are several minor but important comments that were raised by all three reviewers. We will also incorporate these changes as suggested.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Recommendations For The Authors):

      1. Fig. 2B: In their previous comment #6, I assume that Reviewer #2 was asking about peaks that were called as statistically significant above background, not just "higher" as assessed by eye. The authors have now marked peaks that are "higher" but still do not indicate that they were called as statistically significant by any software. I agree that they need to indicate in the figure which peaks were discovered by formal analysis.

      Response: Thank you for the professional suggestions. We used the Piranha software (version 1.2.1) to call peaks from the CLIP-seq data, with the P-value threshold for peaks (i.e., the -p parameter) set to 0.05. Any region with signal above the IgG peak could therefore be a binding region, and the higher the peak, the more SRSF1 binds to the pre-mRNA in that region.

      1. Similar to the above comment, in Fig. 7G "visual analysis" of IGV tracks is not an assay. It is fine to show the tracks as an example of the differential expression called using DESeq2, but this should be described for what it is.

      Response: We thank the reviewer for the professional comments. Following this advice, we have corrected the text in this revised version (Page 11, Line 233).

      1. Fig 5C: TUNEL results are supported by a single image of only a few cells. It is important to include quantitation as has been done for other microscopy data.

      Response: Thank you for the professional suggestions. Following this advice, we have added the quantitative data in Figure 5C. Also, we have added specific quantification methods to the text (Page 23, Line 484-485).

      1. Legend to Fig 6C-E: I assume n=4 refers to the number of animals. It would be best to also know how many cells/tubules were counted for each animal.

      Response: Thank you for the helpful comments. Following this advice, we have revised the legend for Figure 6D, E (Page 12, Line 246-249).

      1. There appears to be a mistake in line 285-287, which reads: "the overall analysis of aberrant AS events showed that SRSF1 effectively promotes the occurrence of SE and MXE events and inhibits the occurrence of RI events." The data in Fig 8C appears to show the opposite, with more SE and MXE, and fewer RI events, in the SRSF1 KO. This would imply that SRSF1 normally inhibits SE/MXE and promotes RI.

      Response: Thank you very much for the professional comments. Following this advice, we have corrected the text in this revised version (Page 14, Line 286-288).

      1. In Fig. 8E, an upper band is depleted in SRSF1 KO, but in Figure 8J, a much lower band is depleted. How is this explained?

      Response: Thank you for the professional suggestions. Since exon 7 of Tial1 is in the non-coding region, the lower band in Figure 8E does not correspond to the lower band in Figure 8J. For better understanding, we show the detailed information of Tial1 in the attached Figure S3.

      1. Line 81: As a very minor point, "AS" is defined as alternative splicing in the abstract, but should be re-defined again in the main text when first mentioned.

      Response: Thank you for the helpful comments. Following this advice, we have corrected the text in this revised version (Page 3, Line 81).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for their valuable and constructive feedback. In the revised manuscript, we have incorporated and addressed the suggestions provided by the reviewers.

      Reviewer #1 (Recommendations For The Authors):

      The primary recommendation is to provide additional language explaining how KinCytE will be updated.

      Response: We appreciate the reviewer’s insightful feedback regarding the KinCytE update. In response, we have included additional details in the “Development and use of KinCyte” section as follows: “We welcome researchers to actively participate in advancing the development of KinCytE by sharing external screening data, especially data on new secreted factors and cell types that extend beyond macrophages. This collaborative effort promises to enhance our understanding of kinase-focused networks, opening new avenues for cutting-edge therapeutic approaches.” In addition, we explicitly state in the “Data, Software, and Availability” section, “To contribute data, kindly email the corresponding author and refer to Table S2 for guidance on the preferred file format.”

      Reviewer #2 (Recommendations For The Authors):

      Would have been nice to see a validation of the regression models from outside of the training data. I would also consider removing statements like "We anticipate that KinCytE will be highly sought after by biologists... " , it reads like a grant application (and this is not)! Could tone the language down a bit. In the future, you might consider displaying your graphs as "biofabrics", they're much cleaner than "hairballs" (PMID: 23102059). Or potentially, show a hierarchical view where the selected cytokine (or other) is at the root, and you can immediately see what's connected. Anyway, the network display can be expanded. Consider maybe adding the nearest neighbors to the table on the right after selecting the node. Generally, though, I like how it works.

      There needs to be a button to download the graph as a .csv file. Maybe the subgraph after selecting a node (or set of nodes). Also, once you're at a graph view, it's hard to guess how to get back to the starting page. Maybe just one button with a "home" on it would fix that. On the Kinases Discovery, why are the gene symbols all lower case? Very cool!

      Response: We greatly value the reviewer's constructive suggestions. To incorporate these, we have made the following changes:

      (1) "We anticipate that KinCytE will be highly sought after by biologists... " This sentence is removed.

      (2) A ‘SAVE CSV’ button is added to the bottom right of the Cytokine Explorer page, which allows the users to download the graph as a csv file.

      (3) A redesigned KinCyte logo now functions as the 'HOME' button, located at the top left of the webpage, ensuring that users can easily return to the homepage at any time.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript describes the synergy among PI3Kbeta activators, providing compelling results concerning the mechanism of their activation. The particular strengths of the work arise to a great extent from the reconstitution system better mimicking the natural environment of the plasma membrane than previous setups have. The study will be a landmark contribution to the signaling field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript aims to provide mechanistic insight into the activation of PI3Kbeta by its known regulators tyrosine phosphorylated peptides, GTP-loaded Rac1 and G-protein beta-gamma subunits. To achieve this the authors have used supported lipid bilayers, engineered recombinant peptides and proteins (often tagged with fluorophores) and TIRF microscopy to enable bulk (averages of many molecules) and single molecule quantitation. The great strength of this approach is the precision and clarity of mechanistic insight. Although the study does not use "in transfecto" or in vivo models the experiments are performed using "physiologically-based" conditions and provide a powerful insight into core regulatory principles that will be relevant in vivo.

      The results are beautiful, high quality, well controlled and internally consistent (and consistent with other published work that overlaps on some points) and as a result are compelling. The primary conclusion is that the primary regulators of PI3Kbeta are tyrosine phosphorylated peptides (and by inference tyrosine phosphorylated receptors/adaptors) and that the other activators can synergise with that input but have relatively weak impacts on their own.

      Although the methodology is not easily imported, for reasons of both cost and the experience needed to execute it well, the results have broad importance for the field and reverse an impression that had built up in large parts of the broader signalling and PI3K communities that all of the inputs to PI3Kbeta were relatively equivalent. However, those conclusions were based on "in cell" or in vivo studies that were very difficult to interpret clearly.

      Reviewer #2 (Public Review):

      The manuscript of Duewell et al has made critical observations that help to understand the mechanisms of activation of the class IA PI3Ks. By using single-molecule kinetic measurements, the authors have made outstanding progress toward understanding how PI3Kbeta is uniquely activated by phosphorylated tyrosine kinase receptors, Gbeta/gamma heterodimers and the small G protein Rac1. While previous studies have defined these as activators of PI3Kbeta, the current manuscript makes clear the quantitative limitations of these previous observations. Most previous quantitative in vitro studies of PI3Kbeta activation have used soluble peptides derived from bis-phosphorylated receptors to stimulate the enzyme. These soluble peptides stimulate the enzyme, and even stimulate membrane interaction. Although these previous studies showed that the release of p85-mediated autoinhibition unmasks an intrinsic affinity of the enzyme for lipid membranes, they ignored what would be the consequence of these peptide sequences being present in the context of intrinsic membrane proteins. The current manuscript shows that the effect of membrane-conjugated peptides on the enzyme activity is profound, in terms of recruiting the enzyme to membranes. In this context, the authors show that G proteins associated with the membranes have an important contribution to membrane recruitment, but they also have a profound allosteric effect on the activity on the membrane. These are observations that would not have been possible with bulk measurements, and they do not simply recapitulate observations that were made for other class IA PI3Ks.

      An important observation that the authors have made is that Gbeta/gamma heterodimers and Rac1 alone have almost no ability to recruit PI3Kbeta to the membranes that they are using, and this is central to one of the most profoundly novel activation mechanisms offered by the manuscript. The authors propose that the nSH2- and Gbeta/gamma binding sites partially overlap, so that Gbeta/gamma can only bind once the nSH2 domain releases the p110beta subunit. This mechanism would mean that once the nSH2 is engaged by membrane-conjugated pY, the Gbg heterodimer can bind and increase the association of the enzyme with membranes. Indeed, this increased membrane association is observed by the authors. However, the authors also show that this increased recruitment to membranes accounts for relatively little increase in activity, and that the far greater component of activation is due to an allosteric effect of the membrane association on the activity of the enzyme. The proposal for competition between Gbg binding and the nSH2 is consistent with the behavior of an nSH2 mutant that cannot bind to pY and which, consequently, does not vacate the Gbg-binding site. In addition to the outstanding contribution to understanding the kinetics of activation of PI3Kbeta, the authors have offered the first structural interpretation for the kinetics of Gbg activation in synergy with pY activation. The proposal for an overlapping nSH2/Gbg binding site is supported by predictions made by John Burke, using AlphaFold-Multimer. Although there is no experimental structure to support this structural model, it is consistent with HDX-MS analyses that were published previously.

      Reviewer #1 (Recommendations For The Authors):

      1. The approximate relative concentrations (surface densities) of Rac1-GTP, Gbetagammas and pY-peptides used in the experiments in Fig 1 are not easy to understand, and would be useful to give an intuitive feel for the relative sensitivity of the PI3Kbeta reporter to those inputs.

      In our revised manuscript, we provide densities of the individual signaling inputs used to reconstitute Dy647-PI3Kβ membrane recruitment (see Figure legend 1). We provide a more detailed explanation about our quantification method in subsequent figures where the membrane surface density of signaling inputs is varied to modulate the strength of PI3Kβ membrane localization and activity.

      Building off the quantification of Rac1-GTP and pY membrane density measurements presented in our initial manuscript submission, we now include an estimate of the GβGγ membrane density. For these new measurements, we recombinantly expressed and purified additional SNAP-GβGγ protein, which we fluorescently labeled with AlexaFluor 555. The membrane surface density of GβGγ was quantified at equilibrium using a combination of AF488-SNAP-GβGγ (bulk signal) and dilute AF555-SNAP-GβGγ (0.0025%), which allowed us to resolve and count the single molecule density (Figure 3A). We calculate the total surface density of GβGγ based on the AF555-SNAP-GβGγ dilution factor. In the methods section titled, “surface density calibration,” we describe our protocol.
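
      The dilution-factor step can be sketched as a simple scaling: the counted single-molecule density divided by the labeled fraction gives the total density. In the example below only the 0.0025% dilution comes from the response; the molecule count and field-of-view area are hypothetical numbers chosen for illustration.

```python
def total_surface_density(n_molecules, area_um2, labeled_percent):
    """Total membrane density (molecules/um^2) from a sparse single-molecule count.

    n_molecules: single molecules counted in the field of view
    area_um2: field-of-view area in square micrometres
    labeled_percent: percentage of the protein carrying the countable dye
    """
    counted_density = n_molecules / area_um2
    labeled_fraction = labeled_percent / 100.0
    return counted_density / labeled_fraction

# Hypothetical count: 250 molecules in a 2500 um^2 field, 0.0025% labeled.
density = total_surface_density(250, 2500.0, 0.0025)
print(f"{density:.0f} molecules per square micrometre")
```

      The calculation relies on the sparsely labeled population behaving identically to the bulk population, which is the usual premise of this kind of calibration.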

      2. The estimates of the PIP3 concentrations/densities measured using the BTK reporter seem good, but it is unclear (to me) how they were derived.

      The density of PI(3,4,5)P3 lipids in our supported lipid bilayers was calculated based on the incorporation of a defined molar ratio of PI(3,4,5)P3 in our small unilamellar vesicles. Based on the average footprint of 0.72 nm² for a single lipid, we calculated the density of lipids per µm². In the methods section titled, “kinetic measurements of PI(3,4,5)P3 lipid production,” we include the following description:

      “Assuming an average footprint of 0.72 nm² for phosphatidylcholine (Carnie et al., 1979; Hansen et al., 2019), we calculated a density of 2.8 × 10⁴ PI(3,4,5)P3 lipids/μm² for supported membranes that contain an initial concentrations of 2% PI(4,5)P2. We assume that the plateau fluorescence intensity of the AF488-SNAP-Btk sensor following reaction completion in the presence of PI3Kβ represents the production of 2% PI(3,4,5)P3. The bulk membrane intensity of AF488-SNAP-Btk was normalized from 0 to 1, and then multiplied times the total density of PI(3,4,5)P3 lipids to generate kinetic traces that report the kinetics of PI(3,4,5)P3 production.”
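
      The arithmetic behind the quoted density can be checked with a short calculation. The sketch assumes a single leaflet tiled by lipids of 0.72 nm² each, with 2% of them being the lipid of interest; the variable and function names are illustrative.

```python
NM2_PER_UM2 = 1_000_000  # 1 um = 1000 nm, so 1 um^2 = 10^6 nm^2

def lipid_density_per_um2(footprint_nm2, mole_fraction):
    """Density (lipids/um^2) of one lipid species in a single leaflet."""
    total_lipids_per_um2 = NM2_PER_UM2 / footprint_nm2
    return total_lipids_per_um2 * mole_fraction

# 0.72 nm^2 footprint and 2% mole fraction, as stated in the quoted methods text.
pip3_density = lipid_density_per_um2(0.72, 0.02)

def intensity_to_density(normalized_intensity, plateau_density=pip3_density):
    """Scale a 0-to-1 normalized sensor intensity to lipids/um^2."""
    return normalized_intensity * plateau_density

print(f"{pip3_density:.3g} lipids per square micrometre")
```

      The result is roughly 2.8 × 10⁴ lipids/µm², in line with the quoted value, and the second function mirrors the described step of multiplying the normalized sensor trace by that plateau density.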

      Minor points

      l164; Rac1(GTP) AND GBeta gammas. In this context it should be OR. Or have I misunderstood?

      l1093; kineticS measurementS.

      Thank you for pointing out these typos. We made the appropriate edits.

      The paper of Suire et al. (Suire, S., Lécureuil, C., Anderson, K. E., Damoulakis, G., Niewczas, I., Davidson, K., Guillou, H., Pan, D., Jonathan Clark, Phillip T Hawkins, & Stephens, L. (2012). GPCR activation of Ras and PI3Kc in neutrophils depends on PLCb2/b3 and the RasGEF RasGRP4. The EMBO journal, 31(14), 3118-3129. https://doi.org/10.1038/emboj.2012.167) makes the point that in vivo it appears that although Ras-activation is required for full activation of PI3Kgamma (and can activate PI3Kgamma in vitro directly) if you use tools to activate Ras in the absence of receptor and Gbetagamma signalling, it has no effect on PIP3. This directly supports the authors' conclusions.

      Thank you for sharing this citation. We incorporated the reviewer’s insight into our discussion section to broaden the significance of our work.

      Reviewer #2 (Recommendations For The Authors):

      There are only a few relatively minor points that could be addressed to improve the paper:

      1. Why is the density still going up after 10 minutes in Figure 1 Figure supplement 2? Doesn't this seem like a very long time? Are we seeing fast on/off combined with fast on/slow off? Are the particles eventually becoming stuck in odd places or are they slowly denaturing?

      Our movies do not indicate a slow accumulation of immobilized or stuck Dy647-PI3Kβ particles on the membrane surface. On the long timescale, we believe that a small fraction of Dy647-PI3Kβ molecules do exhibit longer dwell times on membranes containing a high density of pY (>6,000 molecules/µm²). This is likely due to membrane hopping of Dy647-PI3Kβ. In other words, rather than Dy647-PI3Kβ dissociating from the membrane surface directly into the solution, the Dy647-PI3Kβ molecule immediately rebinds to another membrane-conjugated pY peptide. This type of behavior of a peripheral membrane binding protein is generally correlated with there being a higher surface density of the binding partner (Yasui et al., 2014). Characterization of potential Dy647-PI3Kβ membrane hopping will require additional experimentation (e.g. PI3Kβ mutants) and quantitative analysis that goes beyond the scope of this study.

      1. Lines 188-189. "By quantifying the average number of Alexa488-pY particles per unit area of supported membrane we calculated the absolute density of pY per μm² (Figure 2D)." I think this should be Figure 2C, right hand y-axis.

      Thank you for identifying our typo. We’ve corrected the text for clarity.

      1. Lines 102-193. "When Dy647-PI3Kβ was flowed over a membrane containing a low density of ≤500 pY/μm², we observed rapid equilibration kinetics consistent with a 1:1 binding stoichiometry (Figure 2E)." There is no density shown in Fig. 2E. There is only "membrane intensity." Perhaps it was their intent to include a right-hand axis with density (number of particles/area), as they did in Figure 2C. However, they did not, so Figure 2E does not support the text. The value of Intensity/#py/um**2 does not appear to be the same for Figure 2C as for Figure 2E, assuming that the statement in the text is correct. The authors should include the density as a right-hand axis in 2E.

      We have reworded this portion of the results section for clarity. In reading the reviewer’s comment, we recognize that a more convincing way to support our claim of a 1:1 binding stoichiometry would be to show that there are ~500 Dy647-PI3Kβ/μm² membrane-bound complexes when the pY surface density equals ~500 pY/μm². For us to make this connection, we would need to perform experiments using a Dy647-PI3Kβ concentration that fully saturates all the pY binding sites. However, at this elevated Dy647-PI3Kβ solution concentration, individual Dy647-PI3Kβ complexes can start to bind to a single phosphotyrosine of the dually phosphorylated peptide due to competition for pY binding sites. As an alternative to performing the experiment described above, we can infer binding stoichiometry from the shape of the membrane absorption kinetic traces. For example, a simple bimolecular interaction exhibits rapid equilibration kinetics with a hyperbolic-shaped kinetic trace. Systems that have more complex binding equilibria, however, generally take longer to equilibrate (due to the change in KOFF) and can often be broken down into 2 or 3 distinct dissociation constants (KD). This type of kinetic analysis has previously been used to describe multivalent membrane binding interactions for the Btk-PI(3,4,5)P3 (Chung et al., 2019) and PI3Kγ-GβGγ (Rathinaswamy et al., 2021) complexes. Considering that there are multiple interpretations of the Dy647-PI3Kβ membrane absorption traces shown in Figure 2E, we refrain from saying that our results explicitly reveal a 1:1 binding stoichiometry. Instead, we provide several possible explanations for the results. Ultimately, additional experiments and kinetic modeling of wild type and mutant PI3Kβ are necessary to define the binding stoichiometry under different conditions.

      1. Table 1. The authors have analysed the data to extract two dwell times and two diffusion coefficients. The legend should make this clear, referring to D1 as the slow diffusion component and D2 as fast diffusion, similarly, there are short and long dwell times. This should be stated in the legend. There are two columns labelled "alpha". This presumably should be alpha1 and alpha2, the fractions of particles with short and long dwell times. The table legend should clarify this.

      In our revision, additional text has been added to the figure legends and Table 1.

      Text from Table 1: “Alpha (α) equals the fraction of molecules with the characteristic dwell time, τ1 (DT = dwell time). The fraction of molecules with the characteristic dwell time, τ2, equals 1-α. Alpha (αD) equals the fraction of molecules with the characteristic diffusion coefficient, D1. The fraction of molecules with diffusion coefficient, D2, equals 1-αD.”
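
      One way to read these definitions is as two-component mixtures, sketched below. The functional forms (a bi-exponential dwell-time survival curve and a two-species step-size distribution for 2D Brownian motion) are assumptions chosen for illustration and are not confirmed by the legend text; all function and parameter names are hypothetical.

```python
import math


def dwell_time_survival(t, alpha, tau1, tau2):
    """Fraction of molecules still membrane-bound after time t (same units as tau)."""
    return alpha * math.exp(-t / tau1) + (1.0 - alpha) * math.exp(-t / tau2)


def mean_dwell_time(alpha, tau1, tau2):
    """Population-weighted mean dwell time of the two-component mixture."""
    return alpha * tau1 + (1.0 - alpha) * tau2


def step_size_pdf(r, alpha_d, d1, d2, dt):
    """Probability density of a 2D displacement r during one frame interval dt."""

    def one_species(d):
        # Rayleigh-type density for free 2D diffusion with coefficient d.
        return r / (2.0 * d * dt) * math.exp(-r * r / (4.0 * d * dt))

    return alpha_d * one_species(d1) + (1.0 - alpha_d) * one_species(d2)


# Hypothetical parameter values, chosen only to exercise the functions.
print(dwell_time_survival(1.0, alpha=0.3, tau1=1.0, tau2=5.0))
print(mean_dwell_time(alpha=0.3, tau1=1.0, tau2=5.0))
```

      In this reading, α and αD weight the first component (τ1 or D1), and the second component receives the remainder, matching the 1-α and 1-αD convention in the legend.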

      1. In the legend for Figure 5 figure supplement 1, for part D, the "Cumulative membrane of binding events..." The "of" should be deleted.

      Thank you for identifying this typo.

      1. Lines 423-426: "We found that PI3Kβ kinase activity is also relatively insensitive to either Rac1(GTP) or GβGγ alone. This is in contrast to previous reports that showed Rho-GTPases (Fritsch et al. 2013) and GβGγ (Katada et al. 1999; Hashem A. Dbouk et al. 2012; Maier, Babich, and Nürnberg 1999) can activate PI3Kβ, albeit modest, compared to synergistic activation with pY peptides plus Rac1(GTP) or GβGγ." It is not clear what this statement means. On the surface, it might be interpreted as saying that these previous studies had some flaw that led the authors to conclude that there is some activation caused by Rac1 or Gbeta/gamma on their own. The current manuscript is an important contribution to understanding the mechanism of synergistic activation, but it is also true that the Hansen and his colleagues have not used the same membranes as were used previously. The authors state that they have used a wide range of membrane compositions, but the only ones that have appeared in the manuscript are nearly pure PC (with 2% PIP2) or PC with 20% PS. Extensive studies with varying membrane compositions are beyond the scope of the current study, since the current manuscript concisely makes important observations regarding mechanism. However, it would be helpful for readers if the authors at least mention the differences in membrane compositions among the studies.

      The reviewer raises an important point concerning our interpretation of PI3Kβ activation data in relationship to existing literature. In our original submission, we made conclusions concerning how individual signaling inputs modulate PI3Kβ activity, without showing all our data or providing sufficient explanation. In our revised manuscript, we include PI3Kβ kinase activity measurements performed in the presence of either pY, Rac1(GTP), or GβGγ alone (Figure 5B-5C). These experiments were reconstituted on supported membranes in the absence or presence of 20% PS lipids. We found that increasing the density of anionic lipids increased the overall activity of PI3Kβ in the presence of pY or GβGγ alone. This is consistent with a subtle increase in PI3Kβ membrane affinity due to the negatively charged PS lipids. Mutations that disrupt the direct interaction between PI3Kβ and GβGγ eliminated the observed lipid kinase activity. We were unable to detect PI3Kβ activity in the presence of Rac1(GTP) alone. In conclusion, we’re able to detect some PI3Kβ activity in the presence of GβGγ alone, which is consistent with previous reports (Dbouk et al., 2010; Katada et al., 1999; Maier et al., 2000). In the future, a more comprehensive analysis will be required to map the relationship between PI3Kβ activity, membrane localization, and lipid composition. For example, previous reconstitutions have revealed differential activation of PI3Kα that depends on the most abundant lipid being phosphatidylethanolamine (PE) rather than phosphatidylcholine (PC) (Hon et al., 2012; Ziemba et al., 2016). PE lipids comprise 25-30% of the cellular plasma membrane (Yang et al., 2018) and have been used in previous studies to measure PI3K lipid kinase activity on small unilamellar vesicles (Dbouk et al., 2010; Hon et al., 2012).

      In this study, we elected to use a simplified membrane composition that minimized non-specific membrane localization of fluorescently labeled PI3Kβ. This allowed us to more clearly define the strength of individual and combinations of protein-protein interactions that regulate PI3Kβ localization and kinase activity. When reconstituting amphiphilic molecules (i.e. lipids) in aqueous solution, a variety of structures, including micelles, inverted micelles, and planar bilayers, can form based on the lipid composition (Kulkarni, 2019). The organization of these membrane structures is related to the molecular packing parameter of the individual phospholipids (Israelachvili et al., 1976). The packing parameter, P = v/(a·l_c), depends on the volume of the hydrocarbon (v), area of the lipid head group (a), and the lipid tail length (l_c). When generating supported lipid bilayers on a flat two-dimensional glass surface, we aim to create a fluid lamellar membrane. We find that phosphatidylcholine (PC) lipids are ideal for making supported lipid bilayers because they have a packing parameter of ~1 (Costigan et al., 2000). In other words, PC lipids are cylindrical like a paper towel roll. In contrast, cholesterol and phosphatidylethanolamine (PE) lipids have packing parameters of 1.22 and 1.11, respectively (Angelov et al., 1999; Carnie et al., 1979). This gives cholesterol and PE lipids an inverted truncated cone shape, which prefers to adopt a non-lamellar phase structure. Due to the intrinsic negative curvature of PE lipids, they can spontaneously form inverted micelles (i.e. hexagonal II phase) in aqueous solution when they are the predominant lipid species (Israelachvili et al., 1980; Kobierski et al., 2022; Wnętrzak et al., 2013).

      In the methods section of our manuscript, we note that, in our experience, incorporation of PE lipids dramatically reduced the protein-maleimide coupling efficiency, produced more membrane defects, and resulted in a larger fraction of surface immobilized Dy647-PI3Kβ. This could be related to the intrinsic negative curvature of PE membranes. However, further investigation is needed to decipher these issues.
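
      The packing-parameter argument can be illustrated with a small sketch. The cut-offs follow the commonly quoted ranges from the self-assembly theory of Israelachvili and colleagues; the tolerance around 1 and the function names are assumptions introduced here, and the three values passed in are the approximate ones quoted in the paragraph above.

```python
def packing_parameter(v, a, l_c):
    """P = v / (a * l_c), with v in nm^3, a in nm^2 and l_c in nm."""
    if a <= 0 or l_c <= 0:
        raise ValueError("head group area and tail length must be positive")
    return v / (a * l_c)


def preferred_structure(p, tol=0.05):
    """Map a packing parameter to the aggregate shape it tends to favor."""
    if p <= 1.0 / 3.0:
        return "spherical micelles"
    if p <= 1.0 / 2.0:
        return "cylindrical micelles"
    if p < 1.0 - tol:
        return "flexible bilayers or vesicles"
    if p <= 1.0 + tol:  # tol is an assumed margin for "approximately 1"
        return "planar bilayers"
    return "inverted (non-lamellar) phases"


# Packing parameters as quoted in the text (PC is given only as approximately 1).
for name, p in [("PC", 1.0), ("PE", 1.11), ("cholesterol", 1.22)]:
    print(name, p, preferred_structure(p))
```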

      Angelov B, Ollivon M, Angelova A. 1999. X-ray Diffraction Study of the Effect of the Detergent Octyl Glucoside on the Structure of Lamellar and Nonlamellar Lipid/Water Phases of Use for Membrane Protein Reconstitution. Langmuir 15:8225–8234. doi:10.1021/la9902338

      Carnie S, Israelachvili JN, Pailthorpe BA. 1979. Lipid packing and transbilayer asymmetries of mixed lipid vesicles. Biochim Biophys Acta 554:340–357. doi:10.1016/0005-2736(79)90375-4

      Chung JK, Nocka LM, Decker A, Wang Q, Kadlecek TA, Weiss A, Kuriyan J, Groves JT. 2019. Switch-like activation of Bruton’s tyrosine kinase by membrane-mediated dimerization. Proc Natl Acad Sci 116:10798–10803. doi:10.1073/pnas.1819309116

      Costigan SC, Booth PJ, Templer RH. 2000. Estimations of lipid bilayer geometry in fluid lamellar phases. Biochim Biophys Acta 1468:41–54. doi:10.1016/s0005-2736(00)00220-0

      Dbouk HA, Pang H, Fiser A, Backer JM. 2010. A biochemical mechanism for the oncogenic potential of the p110 catalytic subunit of phosphoinositide 3-kinase. Proc Natl Acad Sci 107:19897–19902. doi:10.1073/pnas.1008739107

      Hansen SD, Huang WYC, Lee YK, Bieling P, Christensen SM, Groves JT. 2019. Stochastic geometry sensing and polarization in a lipid kinase–phosphatase competitive reaction. Proc Natl Acad Sci 116:15013–15022. doi:10.1073/pnas.1901744116

      Hon W-C, Berndt A, Williams RL. 2012. Regulation of lipid binding underlies the activation mechanism of class IA PI3-kinases. Oncogene 31:3655–3666. doi:10.1038/onc.2011.532

      Israelachvili JN, Marcelja S, Horn RG. 1980. Physical principles of membrane organization. Q Rev Biophys 13:121–200. doi:10.1017/s0033583500001645

      Israelachvili JN, Mitchell DJ, Ninham BW. 1976. Theory of self-assembly of hydrocarbon amphiphiles into micelles and bilayers. J Chem Soc Faraday Trans 2 Mol Chem Phys 72:1525–1568. doi:10.1039/F29767201525

      Katada T, Kurosu H, Okada T, Suzuki T, Tsujimoto N, Takasuga S, Kontani K, Hazeki O, Ui M. 1999. Synergistic activation of a family of phosphoinositide 3-kinase via G-protein coupled and tyrosine kinase-related receptors. Chem Phys Lipids 98:79–86. doi:10.1016/S0009-3084(99)00020-1

      Kobierski J, Wnętrzak A, Chachaj-Brekiesz A, Dynarowicz-Latka P. 2022. Predicting the packing parameter for lipids in monolayers with the use of molecular dynamics. Colloids Surf B Biointerfaces 211:112298. doi:10.1016/j.colsurfb.2021.112298

      Kulkarni CV. 2019. Calculating the “chain splay” of amphiphilic molecules: Towards quantifying the molecular shapes. Chem Phys Lipids 218:16–21. doi:10.1016/j.chemphyslip.2018.11.004

      Maier U, Babich A, Macrez N, Leopoldt D, Gierschik P, Illenberger D, Nürnberg B. 2000. Gβ 5 γ 2 Is a Highly Selective Activator of Phospholipid-dependent Enzymes. J Biol Chem 275:13746–13754. doi:10.1074/jbc.275.18.13746

      Rathinaswamy MK, Dalwadi U, Fleming KD, Adams C, Stariha JTB, Pardon E, Baek M, Vadas O, DiMaio F, Steyaert J, Hansen SD, Yip CK, Burke JE. 2021. Structure of the phosphoinositide 3-kinase (PI3K) p110γ-p101 complex reveals molecular mechanism of GPCR activation. Sci Adv 7:eabj4282. doi:10.1126/sciadv.abj4282

      Wnętrzak A, Lątka K, Dynarowicz-Łątka P. 2013. Interactions of alkylphosphocholines with model membranes-the Langmuir monolayer study. J Membr Biol 246:453–466. doi:10.1007/s00232-013-9557-4

      Yang Y, Lee M, Fairn GD. 2018. Phospholipid subcellular localization and dynamics. J Biol Chem 293:6230–6240. doi:10.1074/jbc.R117.000582

      Yasui M, Matsuoka S, Ueda M. 2014. PTEN Hopping on the Cell Membrane Is Regulated via a Positively-Charged C2 Domain. PLoS Comput Biol 10:e1003817. doi:10.1371/journal.pcbi.1003817

      Ziemba BP, Burke JE, Masson G, Williams RL, Falke JJ. 2016. Regulation of PI3K by PKC and MARCKS: Single-Molecule Analysis of a Reconstituted Signaling Pathway. Biophys J 110:1811–1825. doi:10.1016/j.bpj.2016.03.001

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the referee for the positive review.

      Reviewer #2 (Public review):

      We thank the referee for his/her constructive comments.

      1. The weakness of this work is the lack of clarification on the function of eIF2A in general. The novelty of this study was limited.

      We believe our study is valuable in providing strong evidence that eIF2A does not functionally substitute for eIF2 in tRNAi recruitment even when eIF2 function is impaired, and in showing that it does not contribute to translational control by uORFs or IRESs, thus ruling out the most likely possibilities for its function in yeast based on studies of the mammalian factor. We agree that the function of yeast eIF2A remains to be identified; however, we think this should be regarded as a limitation rather than a weakness in experimental design or data obtained in the current study.

      1. Related to this, it would be worth investigating common features in mRNAs selectively regulated (surveyed in Figure 3A).

      We did not embark on this because only 17 of the 32 transcripts showing TE reductions in Fig. 3A showed a pattern of TE changes consistent with a conditional requirement for eIF2A under conditions of reduced eIF2 function, exhibiting greater TE decreases when both eIF2 function was impaired by phosphorylation and eIF2A was eliminated from cells. Moreover, we could validate this conditional eIF2A dependence by LUC reporter for only a single mRNA, HKR1.

      Also, it would be worth analyzing the effect of eIF2A deletion on elongation (ribosome occupancy on each codon and/or global ribosome footprint distribution along CDS) and termination/recycling (footprint reads on stop codon and on 3′ UTR).

      We have analyzed the effects of deleting eIF2A on ribosome pausing at individual codons by calculating tri-peptide pause scores from our ribosome profiling data. The results shown in new Fig. 7 reveal that eIF2A plays no discernible role in stimulating the rate of decoding of any three-codon combinations.

      1. Regarding Figure 3D, the reporters were designed to include promoter and 5′ UTR of the target genes. Thus, it should be worth noting that reporter design was based on the assumption that eIF2A-dependency in translation regulation was not dependent on 3′ UTR or CDS region. The reason why the effects on ribosome profiling-supported mRNAs could not be recapitulated in reporter assay may originate from this design. This should be also discussed.

      We agree and included this stipulation in the DISCUSSION, while at the same time noting that the native mRNAs were examined in the orthogonal assay of polysome distributions.

      1. Related to the point above, the authors claimed that eIF2A affects "possibly only one" (HKR1) mRNA. However, this was due to the reporter assay which is technically variable and could not allow some of the constructs to pass the authors' threshold. Alternative wording for this point should be considered.

      We agree and revised text in the DISCUSSION to read: “A possible limitation of our LUC reporter analysis in Fig. 3D was the lack of 3’UTR sequences of the cognate transcripts, which might be required to observe eIF2A dependence. Given that native mRNAs were examined in the orthogonal assay of polysome profiling in Fig. 3E, the positive results obtained there for SAG1 and SVL3 in addition to HKR1 should be given greater weight. Nevertheless, our findings indicate a very limited role of yeast eIF2A in providing a back-up mechanism for Met-tRNAi recruitment when eIF2 function is diminished by phosphorylation of its α-subunit.”

      1. For Figure 3D, it would be worth considering testing the #-marked genes (in Figure 3C) in this set up.

      Actually, we did test 10 of the 17 mRNAs marked with “#” in Fig. 3C in the reporter assays of Fig. 3D, as had been noted in the Fig. 3C legend.

      1. In box plots, the authors should provide the statistical tests, at least where the authors explained in the main text.

      At the first occurrence of a notched box plot (Fig. 2D), we explained in the main text that in all such plots, when the notches of different boxes do not overlap, their median values differ significantly with a 95% confidence level. In cases where overlap between notches is difficult to assess by eye, we added the results of Mann-Whitney U tests with the p values indicated by asterisks, as explained in the legends. We added results of additional Mann-Whitney U tests to such box plots in Figs. 3B, 6A-C, and 6-supp. 1E & G and mentioned this in the corresponding legends.
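
      The two criteria described here can be sketched in a few lines. This is a sketch only: it assumes the conventional notch definition (median ± 1.57 × IQR/√n) and a normal approximation for the Mann-Whitney U test without tie or continuity correction, neither of which is confirmed as the procedure used for the figures; the function names and sample values are hypothetical.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% confidence interval of the median, as drawn by box-plot notches."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True when the two notch intervals share at least one point."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation."""
    pooled = sorted(list(a) + list(b))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # tied values share the mean rank
        i = j
    n1, n2 = len(a), len(b)
    u1 = sum(rank_of[v] for v in a) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical, well-separated samples (not data from the study).
group_a = list(range(1, 11))
group_b = list(range(11, 21))
print(notches_overlap(group_a, group_b))
print(mann_whitney_u(group_a, group_b))
```

      For small samples or heavily tied data, an exact or tie-corrected test from a statistics library would be the safer choice than this approximation.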

      Reviewer #2 (Recommendations For The Authors):

      The first section of "Yeast eIF2A does not play a prominent role as a functional substitute for eIF2 in the presence or absence of amino acid starvation" can be subdivided into a couple of sections for better readability.

      Done.

      Although the authors have used SM to induce ISR in yeasts previously, the validation of eIF2alpha phosphorylation in Western blot would be helpful for readers. Also, it should be worth testing whether eIF2alpha phosphorylation was properly induced in eIF2A KO cells.

      The translational induction of GCN4 mRNA, which we have documented in WT and eIF2A∆ cells, provides a quantitative read-out of eIF2 functional attenuation superior to determining the proportion of eIF2α that is phosphorylated.

      For Figure 2B, the Venn diagram that shows the overlap between TE-changes genes in WT_SM/WT and those in eIF2A∆_SM/eIF2A∆ would be helpful (although a list was provided by the source data).

      The Venn diagram has been provided in a new figure, Figure 2-figure supplement 1B.

      For Figures 1C and 5A-B, the depiction of the positions of uORFs within the orange gene region would be helpful for readers.

      Done.

      For Figure 4A-C, the depiction of the IRES regions (if known) within the orange gene region would be helpful for readers.

      Done for the URE2 IRES, whose location is known.

      For Figures 1C, 4A-C, and 5A-B, the y-axis should have a label/scale.

      Added.

      For Figure 3C, the definition of #-marked genes should be concretely described (e.g., value range) in the legend.

      Added.

      For Figure 3D-E, the statistical test has been only shown in a couple of data. A full depiction of the statistical results for all the data sets may be helpful for readers.

      We explained that when notches in box plots do not overlap, their medians differ with 95% confidence. In cases where overlaps were difficult to discern, we added p values from Mann-Whitney U tests to the relevant box plots.

      For Figure 3E, it would be helpful if the authors could show the UV spectrum of the sucrose density gradient to show the regions isolated for the experiments.

      Added for a representative replicate gradient in the new figure, Figure 3-figure supplement 1.

      Reviewer #3 (Public Review):

      We thank the referee for his/her positive assessment of our study.

      Weaknesses:

      While no role of eIF2A in translation initiation is apparent, the authors do not determine what function eIF2A does play in yeast. Whether it plays a role in regulating translation in a different stress response is not determined.

      We agree that there are many additional possibilities to consider for functions of eIF2A in translation initiation, including different stress situations or mutant backgrounds; however, we regard this as a limitation rather than a weakness in the experimental design and data obtained in the current study in which we examined the most likely possibilities for eIF2A function in yeast based on studies of the mammalian factor.

      Reviewer #3 (Recommendations For The Authors):

      Curiously, the authors indicate that they could not replicate published results for eIF2A's repressor function for URE2, PAB1, or GIC1 translation. This is a little concerning and one wonders if the yeast strain used in the previous study is different in some way from the authors' strain. Did the authors obtain that strain to test it in their assays?

      The same WT and eIF2A∆ strains have been analyzed here and in the two cited studies on yeast IRESs.

      The authors do discuss the fact that eIF2A may function to regulate translation in response to different stresses. It would have been a strength to test an alternative stress in the current study. However, I also appreciate that this could be the subject of a future study.

      Agreed.

      One minor question I have is whether the yeast strains used possess L-A dsRNA virus? While it may not be that this virus would necessarily mask a role of eIF2A-dependent translation, do the authors have any specific thoughts on this? Would different results be obtained if cured strains were used?

      According to Ravoityte et al. (doi: 10.3390/jof8040381), the S. cerevisiae strain we employed, BY4741, harbors L-A-1 dsRNA; however, we have not explored whether curing the virus would alter the consequences of eliminating eIF2A.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the two reviewers for their constructive criticism, which helped to significantly improve our manuscript.

      During the revision process, we realized that the localization pattern reported for H. neptunium LmdCN-mCherry was an artifact caused by bleed-through of the BacA-YFP signal in the mCherry channel. More detailed studies showed that the fusion protein was detectable by Western blot analysis but, for unknown reasons, did not produce any fluorescence signal. Therefore, we have now removed the localization data shown in previous Figure 8B,C and Figure 8—figure supplement 1.

      To provide more evidence for a functional interaction between BacA and LmdC in H. neptunium, we have now established an inducible CRISPR interference system for this species and used it successfully to deplete LmdC (new Figure 9A-F). The loss of LmdC causes morphological defects very similar to those observed for the ΔbacA(D) mutant. In line with the physical interaction of BacA with the cytoplasmic region of LmdC observed in vitro, these findings support the hypothesis that the two proteins act in the same pathway. Consistent with the results obtained in H. neptunium, the absence of BacA leads to the delocalization of LmdC in R. rubrum. Moreover, we now provide in vivo evidence for a critical role of the cytoplasmic region of LmdC in the interaction of this protein with BacA in R. rubrum cells (new Figure 11). Together, these new findings strongly support the model that BacA and LmdC form a conserved morphogenetic module involved in the establishment of complex cell shapes in bacteria.

      Please see below for a more detailed explanation of our new results and for our response to the issues raised in the first round of review.

      Reviewer #1 (Public Review)

      In their study, Osorio-Valeriano and colleagues seek to understand how bacterial-specific polymerizing proteins called bactofilins contribute to morphogenesis. They do this primarily in the stalked budding bacterium Hyphomonas neptunium, with supporting work in a spiral-shaped bacterium, Rhodospirillum rubrum. Overall the study incorporates bacterial genetics and physiology, imaging, and biochemistry to explore the function of bactofilins and cell wall hydrolases that are frequently encoded together within an operon. They demonstrate an important, but not essential, function for BacA in morphogenesis of H. neptunium. Using biochemistry and imaging, they show that BacA can polymerize and that its localization in cells is dynamic and cell-cycle regulated. The authors then focus on lmdC, which encodes a putative M23 endopeptidase upstream of bacA in H. neptunium, and find that it is essential for viability. The purified LmdC C-terminal domain could cleave E. coli peptidoglycan in vitro suggesting that it is a DD-endopeptidase. LmdC interacts directly with BacA in vitro and co-localizes with BacA in cells. To expand their observations, the authors then explore a related endopeptidase/bactofilin pair in R. rubrum; those observations support a function for LmdC and BacA in R. rubrum morphogenesis as well.

      An overall strength of this study is the breadth and completeness of approaches used to assess bactofilin and endopeptidase function in cells and in vitro. The authors establish a clear function for BacA in morphogenesis in two bacterial systems, and demonstrate a physical relationship between BacA and the cell wall hydrolase LmdC that may be broadly conserved. The eventual model the authors favor for BacA regulation of morphogenesis in H. neptunium is that it serves as a diffusion barrier and limits movement of morphogenetic machinery like the elongasome into the elongating stalk and/or bud. However, there is no data presented here to address that model and the role of LmdC in H. neptunium morphogenesis remains unclear.

      We hypothesize that BacA establishes a barrier that prevents the movement of elongasome complexes into the stalk, either directly by steric hindrance and/or indirectly by promoting the formation of an annular region of high positive inner cell curvature that cannot be passed by the elongasome. To test this model, we have now analyzed the localization dynamics of RodZ, a core structural component of the elongasome complex, in wild-type and ΔbacAD cells. We found that wild-type cells show dynamic YFP-RodZ foci whose movement is limited to the mother cell and the nascent bud, with no signal observed in the stalk. In ΔbacAD cells, by contrast, the fusion protein is consistently detected in all regions of the cell, including nascent stalks (new Figure 5). These results support the idea that BacA is required to confine the elongasome to the mother cell and bud regions and, thus, set the limits of the different growth zones in H. neptunium. We also attempted to follow the localization dynamics of other elongasome components, such as PBP2, MreC and MreD, but none of the corresponding fluorescent protein fusions was functional.

      In the past, we tried intensively to generate conditional mutants of lmdC, but all attempts to place the expression of this gene under the control of the copper- or zinc-inducible promoters available for H. neptunium were unsuccessful. To clarify the role of LmdC in H. neptunium morphogenesis, we have now established an inducible CRISPR interference system for this species and managed to block the expression of lmdC using an sgRNA directed against the 5' region of its non-coding strand. We observed that cells lacking LmdC show a phenotype very similar to that of the ΔbacA mutant. Together with the finding that the N-terminal cytoplasmic region of LmdC physically interacts with BacA, this result strongly supports the hypothesis that BacA and LmdC act in the same pathway, forming a complex that ensures proper morphogenesis in H. neptunium (new Figure 9).

      The data presented illuminate aspects of bacterial morphogenesis and the physical and functional relationship between polymerizing proteins and cell wall enzymes in bacteria, a recurring theme in bacterial cell biology with a variety of underlying mechanisms. Bactofilins in particular are relatively recently discovered and any new insights into their functions and mechanisms of action are valuable. The findings presented here are likely to interest those studying bacterial morphogenesis, peptidoglycan, and cytoskeletal function.

      Reviewer #2 (Public Review):

      This is an excellent study. It starts with the identification of two bactofilins in H. neptunium, a demonstration of their important role for the determination of cell shape and discovery of an associated endopeptidase to provide a convincing model for how these two classes of proteins interact to control cell shape. This model is backed up by a quantitative characterisation of their properties using high-resolution imaging and image analysis methods.

      Overall, all evidence is very convincing and I do not have many recommendations on how to improve the manuscript.

      In my opinion, there are only two issues that I have with the paper:

      1. The single particle dynamics of BacA is presented as analysed and I would like to give some suggestions how to maybe extract even more information from the already acquired data:

      1.1. Presentation: Figure 5A is only showing projections of single particle time-lapse movies. To convince the reader that it was indeed possible to detect single molecules it would be helpful if the authors present individual snapshots and intensity traces. In case of single molecules these will show step wise bleaching.

      We have now added a supplementary video that shows both time series and intensity traces of individual BacA-YFP molecules (Figure 6—Video 1). It verifies the step-wise bleaching of the particles observed and thus shows that we observe the mobility of single molecules. Moreover, we have now included a supplementary figure that shows all trajectories identified within representative cells. This visualization provides a more comprehensive view of our data and further supports the notion that our analysis is based on the detection of single molecules.

      1.2. Analysis: Figure 5B and Supplement Figure 1 are showing the single particle tracking results, revealing that there are two populations of BacA-YFP in the cell. However, this data does not show if individual BacA particles transition between these two populations or not. A more detailed analysis of the existing data, where one can try to identify confinement events in single particle trajectories could be very revealing and help to understand the behaviour of BacA in more detail.

      We agree that an analysis of the single-molecule traces for transitions between the mobile and static states would help to achieve a more detailed understanding of the polymerization behavior of BacA. We believe that the dynamic formation, reorganization and disappearance of BacA-YFP foci observed by time-lapse analysis (Figure 4) indicates that BacA undergoes reversible polymerization in vivo. A deeper investigation of this aspect is beyond the scope of the present study and will be performed at a later point.

      2. The title of Fig. 3 says that BacA and BacD copolymerise; however, the data presented to confirm this conclusion are actually rather weak. First, the Alphafold prediction does not show the co-polymer, and second, the in vitro polymerisation experiments were only done with BacA in the absence of BacD. Accordingly, the only evidence that supports this is their colocalization in fluorescence microscopy. I suggest either weakening the statement, changing the title, or adding more evidence.

      To support the idea that BacA and BacD interact with each other, we have now added images of cells producing BacA-YFP or BacD-CFP individually (new Figure 3—figure supplement 1B,C). The results obtained show that BacA-YFP alone still forms filamentous structures, whereas BacD-CFP condenses into tight foci in the absence of its paralog. When produced together with BacA-YFP, however, the two proteins colocalize into filamentous structures, supporting the notion that they interact with each other. Nevertheless, we agree that it is unclear whether BacA and BacD copolymerize into mixed protofilaments or whether they form distinct protofilaments that then interact laterally to form larger bundles. We have therefore replaced the term “co-polymerize” with “assemble” in the heading of this section.

      Finally, did the authors think about biochemical experiments to study the interaction between the cytoplasmic part of LmdC and the bactofilins? These could further support their model.

      We show the interaction between the cytoplasmic region of H. neptunium LmdC and BacA in Figure 9G,H (previously Figure 8D,E). For technical reasons, it was not possible to synthesize a peptide comprising the corresponding region of R. rubrum LmdC, so that our in vitro analysis is limited to the H. neptunium proteins.

      To further support the notion that BacA interacts with the cytoplasmic region of LmdC, we have now analyzed the localization behavior of two LmdC variants with amino acid exchanges in the conserved cytoplasmic β-hairpin motif (new Figure 11). Both variants no longer colocalize with BacA and are no longer enriched at the inner cell curve. Interestingly, these exchanges also affect the enrichment of BacA at the inner cell curvature, suggesting that BacA needs to interact with LmdC for proper localization. It is tempting to speculate that BacA polymers have a preferred intrinsic curvature and that the activity of the BacA-LmdC complexes adjusts cell curvature in a manner that facilitates their association with the inner curve.

      Reviewer #1 (Recommendations for The Authors):

      We have the following specific recommendations for the improvement of the manuscript:

      1. Several places would benefit from additional quantitation of data:

      a. Figure 1 and supplements: can cell shape be quantified in a more specific way? (e.g. principal component analysis of shape as in https://onlinelibrary.wiley.com/doi/10.1111/mmi.13218). It looks as if BacD production may partially rescue the bacA shape phenotype?

      We have made considerable efforts to establish methods to quantify morphological changes and protein localization patterns in Hyphomonas neptunium. Since standard software packages, such as Oufti or MicrobeJ, are not able to reliably detect stalks and, thus, typically identify buds as separate cells, we have developed our own analysis software (BacStalk; Hartmann et al, 2020, Mol Microbiol), which is optimized for the detection of thin cellular extensions. However, while this software works very well with wild-type cells, it also fails to recognize amorphous cells with multiple, ill-defined extensions. Given these problems in cell segmentation, it is currently not possible to use principal component analysis to obtain a robust measure of the morphological defects of bactofilin mutants in H. neptunium.

      b. Figures 2-S2b, 7D and 9-S1b - can the area under the peaks be quantified and compared across strains? Visual examination of the spectra makes it difficult to discern differences.

      A direct comparison of the peak areas between strains is not possible, because the absolute values depend on the amount of peptidoglycan used in the muropeptide analyses. It is very difficult to precisely quantify peptidoglycan, which makes it challenging to use equal amounts of material from different strains in the reactions. However, the relative proportion of different muropeptide species, as provided in Figure 2—Dataset 1, faithfully reflects the composition of peptidoglycan and can easily be compared between strains.
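
      The argument for relative proportions can be illustrated with a minimal sketch. The species names and peak areas below are hypothetical and are not taken from the authors' dataset; the point is only that scaling every integrated peak area by a common factor, as happens when a different amount of peptidoglycan is loaded, changes the absolute values but leaves the relative proportions unchanged.

```python
def relative_abundance(peak_areas):
    """Convert integrated peak areas into percentages of the total area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {species: 100.0 * area / total for species, area in peak_areas.items()}


# Hypothetical integrated peak areas (arbitrary units) for one sample.
sample = {"monomers": 50.0, "dimers": 30.0, "trimers": 20.0}

# The same sample analysed with twice as much input material:
# every absolute peak area doubles.
sample_double_load = {species: 2.0 * area for species, area in sample.items()}

proportions = relative_abundance(sample)
proportions_double = relative_abundance(sample_double_load)

for species in sample:
    print(species, proportions[species], proportions_double[species])
```

      Under these assumptions, the two runs give identical percentages for every species, which is why the relative values can be compared across strains even when the input amounts differ.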

      c. Figure 9E,F, 9-S4d - BacA and LmdC localization in R. rubrum is very difficult to assess. It does not look linear/filamentous in most cells and is difficult to tell if it is associated with the inner curvature. Can you quantify the position of the signal along the short axis of the cell to better demonstrate that?

      We agree that a better quantification of the distribution of protein along the cell envelope of R. rubrum is required to support the conclusions drawn. To address this issue, we have now used line scans to measure the fluorescence intensities along the inner and outer curve of cells (n=200 per strain) and visualized the data in the form of demographs. The results clearly show an enrichment of BacA and LmdC at the inner curve in wild-type cells and a disruption of this pattern in various mutant backgrounds (new Figures 10F,G,J and 11D,E).

      1. Figure 2-S2A. Does ∆bacD grow better than wild-type? It would also be useful to add growth curves of the bacA complemented strains.

      In the case of H. neptunium, growth curves are often misleading, because cells start to aggregate at the late exponential phase due to abundant EPS formation. The degree of cell aggregation also depends on the morphology of cells, because EPS production is limited to the mother cell body, which makes it challenging to compare morphologically distinct mutant strains. We have now performed growth assays for all H. neptunium deletion and complementation strains used in the study and limited the analysis of doubling times to the early and mid-exponential phase, in which cells do not yet form visible aggregates. The results obtained are now included in the new Figure 1F and Figure 1—figure supplement 2D. They show that the doubling times of the different bactofilin mutants are close to that of the wild-type strain.
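
      As an illustration of restricting a doubling-time estimate to the exponential phase, the sketch below fits a straight line to log2(OD) against time and takes the reciprocal of the slope. This is a generic approach, not necessarily the procedure used by the authors; the function name, the `od_max` cutoff, and the synthetic readings are all assumptions made for the example.

```python
import math


def doubling_time(times_h, od_values, od_max=None):
    """Estimate the doubling time (hours) from optical density readings.

    A line is fitted to log2(OD) versus time by ordinary least squares;
    the doubling time is the reciprocal of the slope. Readings above
    `od_max` (a hypothetical cutoff marking the end of the usable
    exponential phase) are ignored.
    """
    points = [
        (t, math.log2(od))
        for t, od in zip(times_h, od_values)
        if od > 0 and (od_max is None or od <= od_max)
    ]
    if len(points) < 2:
        raise ValueError("need at least two readings in the exponential phase")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sxy / sxx  # doublings per hour
    if slope <= 0:
        raise ValueError("no growth detected in the selected window")
    return 1.0 / slope


# Synthetic readings: OD doubles every 2 h, then the curve flattens.
times = [0, 2, 4, 6, 8, 10]
ods = [0.05, 0.10, 0.20, 0.40, 0.45, 0.46]

estimate_exponential = doubling_time(times, ods, od_max=0.40)
estimate_all_points = doubling_time(times, ods)
print(estimate_exponential, estimate_all_points)
```

      In this synthetic example, including the late readings inflates the estimate, whereas restricting the fit to the exponential window recovers the underlying doubling time.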

      1. Figure 4BC: From the demographs provided, BacA and BacD appear to have different localization dynamics. BacD seems to stay at the base of the stalk, nearest the mother cell, whereas BacA migrates towards the bud? Also, "length" is misspelt in the panels.

      During the transition to bud formation, we indeed observe that the localization patterns of BacA and BacD are in many cases not fully superimposable, with BacD lagging behind BacA and forming transient additional clusters in the vicinity of the stalk base. Examples are now shown in Figure 4—figure supplement 4. This effect explains the distinct patterns in the demographs. We have now modified the text accordingly. We have also corrected the spelling of “length” in the figure.

      1. Can BacD polymerize on its own? It colocalizes with BacA in E. coli but that does not necessarily mean it co-polymerizes.

      Please see our response to a similar issue (point 2) raised by Reviewer #2.

      1. Lines 263-266. You use E. coli PG as a substrate for LmdC in vitro because "peptidoglycan from H. neptunium shows only a low degree of cross-linkage and hardly any pentapeptides." Does this not have relevance to the physiological significance of the observed activity? Or do you presume that LmdC activity (and/or that of other endopeptidases) is very high in H. neptunium so it is difficult to detect additional activity using HnPG as a substrate? It would be useful to clarify this logic in the text.

      DD-crosslinks are formed by all major peptidoglycan biosynthetic complexes, including the elongasome and the divisome, so that their general relevance to cell growth in H. neptunium is beyond doubt. The low degree of crosslinkage observed suggests that H. neptunium contains high endopeptidase activity, which cleaves crosslinks after their formation by DD-transpeptidases. We have now added the explanation “likely due to a high level of autolytic activity” to make this point clearer. Whether LmdC makes a major contribution to the low level of crosslinkage remains to be determined. However, our data suggest that it mostly acts in complex with BacA, so that it may only cleave peptidoglycan locally and not have a global effect on cell wall composition. It would not be possible to detect the DD-endopeptidase activity of LmdC using H. neptunium peptidoglycan as a substrate, because it has a low content of DD-linked peptide chains. To facilitate the in vitro activity assay, we therefore used highly crosslinked peptidoglycan from a mutant E. coli strain.

      1. Lines 268-269: Is there some explanation for why monomers do not increase on LmdC treatment? Here quantitation of peaks before and after treatment would allow the reader to more precisely interpret these data.

      The absolute peak sizes are not comparable, because there is some variation in the amount of peptidoglycan included in the assays (see also our comments on point 1b raised by Reviewer #1) and the integrated peak areas (which correspond to the amounts of muropeptide species produced) depend on both the height and the width of the peaks, which vary to some degree in different HPLC runs. The relevant measure to compare the muropeptide profiles is therefore the relative content of different muropeptide species in the different conditions. For clarification, we have now added the following sentence to the legend of Figure 8D: “A quantification of the relative abundance of different muropeptide species in each condition, based on a comparison of the relative integrated peak areas, is provided in Figure 8—Dataset 1.” The control reaction lacking LmdC only contains peptidoglycan diluted in buffer and thus provides insight into muropeptide composition of untreated peptidoglycan.

      1. Lines 280-283: It would be interesting to know if the transmembrane domain of LmdC is required for its localization since it is dispensable for binding BacA and since LmdC still localizes to foci without BacA.

      Given that it is currently not possible to localize LmdC in H. neptunium, we were not able to perform this analysis.

      1. Line 296: it is also possible that LmdC localizes with another protein and does not independently assemble into larger complexes.

      Since the localization pattern reported for LmdC in the ΔbacAD background is no longer valid, we have not discussed this aspect in the revised version of our manuscript. However, in general, we do not exclude the possibility that LmdC could interact with other peptidoglycan biosynthetic proteins.

      1. Line 304-306 and Fig 9: Is the domain organization of RrLmdC the same as for HnLmdC? It would be useful to include its domain organization as well. Also, please add amino acid numbering to Figure 9B.

      We have now added a schematic showing the domain organization of LmdC from R. rubrum (new Figure 10B). The protein is highly similar to its homolog from H. neptunium.

      1. Line 340-341: "In both cases, they functionally interact with LmdC-type DD-endopeptidases to promote local changes in the pattern of peptidoglycan biosynthesis." This conclusion is not experimentally supported. Since LmdC is essential and you could not make a depletion strain in H. neptunium, it was not shown that the interaction with LmdC is how BacA promotes changes in PG patterning. HADA/FDAA labeling was not performed in R. rubrum, and no global changes in PG chemistry were observed in bacA or lmdC mutants, so you cannot claim BacA or LmdC influences PG patterning there, either. Either soften this statement to a hypothesis or otherwise rephrase.

      To further corroborate a functional interaction between BacA and LmdC, we have now established an inducible CRISPRi system to deplete LmdC from H. neptunium cells (see also our comments on the public review of Reviewer #1). We observe that the loss of LmdC leads to a phenotype very similar to that observed for the ΔbacA(D) mutant, supporting the idea that BacA and LmdC act in the same pathway. We have now also performed localization studies of the elongasome component RodZ in H. neptunium, which demonstrate that the spatial distribution of elongasome complexes is affected in the absence of the bactofilin cytoskeleton in H. neptunium. Combined with the observation that LmdC is a catalytically active DD-endopeptidase and its absence leads to morphological defects, these results indicate that BacA, together with LmdC, induces local changes in the pattern of peptidoglycan biosynthesis, both by affecting elongasome movement and, likely, by reducing peptidoglycan crosslinking in the cell envelope regions it occupies.

      1. Figure 9-S4: there is no panel C (change D to C).

      Corrected.

      1. Lines 344-355: No data is presented here to support the barrier model of bactofilin function. In addition, it is unclear why cells would take on amorphous shapes instead of extended rod shapes/filaments if elongasome function was not constrained on the longitudinal axis. It would be helpful to have more discussion of the potential mechanisms of LmdC function in H. neptunium in this section of the discussion since that is the emphasis of the results section.

      To support the barrier model, we have now compared the localization dynamics of the elongasome component RodZ in wild-type and ΔbacAD cells. The results show that RodZ is excluded from the stalk in the wild-type background, whereas it readily enters the stalk in the mutant cells, leading to the expansion of stalks into large, amorphous extensions. Consistent with these findings, HADA labeling is not observed within the stalks in wild-type cells, whereas it is readily observed in the enlarged stalk structures (pseudohyphae) formed in the mutant cells.

      The current model of MreB movement suggests that MreB filaments have an intrinsic curvature and thus preferentially align along regions of similar curvature, which is along the circumference of the cell in rod-shaped geometries. However, previous work has shown that MreB starts to move along randomly oriented trajectories as soon as cells lose their rod-shaped morphology and adopt more spherical shapes (Hussain et al, 2018, eLife). In line with these findings, our current and our previous work (Cserti et al, 2017, Mol Microbiol) indicate that the expansion of the ovoid H. neptunium mother cell prior to the onset of stalk biosynthesis as well as bud formation are mediated by the elongasome complex. Thus, the elongasome can clearly also give rise to shapes other than rods. Interestingly, however, the H. neptunium elongasome also appears to drive the formation of the rod-shaped stalk, possibly by moving around the circumference of the stalk base. Thus, species- or growth phase-dependent regulatory mechanisms or, potentially, differences in the spatial arrangement of the glycan strands within the peptidoglycan layer may result in different modes of elongasome movement and, thus, modulate the morphogenetic activity of elongasome complexes.

      1. Lines 395-397: It is also possible that LmdC positioning is dependent on cell morphology, rather than directly on BacA, since morphology is so distorted in bacA mutant cells.

      We provide several lines of evidence showing that LmdC and BacA functionally and physically interact (see above), making it highly unlikely that the two proteins are not associated with each other. However, our previous (Figure 10I,J) and new (Figure 11) results suggest that the physical interaction with LmdC and/or the cell shape-modulating activity of the complex are required for the proper localization of BacA at the inner curve of the cell. This finding may indicate the existence of a self-reinforcing cycle, in which the morphological changes induced by BacA-LmdC assemblies stimulate the recruitment of additional assemblies to their site of action.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents useful findings regarding the impact of forest cover and fragmentation on the prevalence of malaria in non-human primates. The evidence supporting the claims of the authors is, however, incomplete, as the sampling design cannot adequately address the geospatial issues that this study focuses on.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study as a concept is well designed, although there is still one issue I see in the methodology.

      I still have concerns with their attempts to combine the different scales of data. While the use of point data is great, it limits the sample size, and they have included the district- to country-level data to try to increase the sample size. The problem is that they try to get an overall estimate at the district/state/country level by taking 10 random sample points. This would be a suitable method if the primates were evenly distributed across the district/state/country. The reality is that the primates are not evenly distributed across the district/state/country; therefore, the random point sampling is not a reasonable method to get an estimate of the environmental variables in relation to the macaques. For example, if you had a mountainous country and you took 10 random points to estimate altitude, you would end up with a large number, but if all the animals of interest lived on the coast, your average altitude is meaningless in relation to the animals of interest, as they are all living at low altitude. The fact that the model relies less on highly variable components and places more reliance on less variable components is really not relevant, as the district/state/country measurements have no real meaning in relation to the distribution of macaques.

      A simple possible way forward could be to run the model without the district/state/country samples and see what the outcome is. If the outcome is similar then the random point method may be viable (but if it gives the same outcome as ignoring those samples then you don't need the district/state/country samples). If you get a totally different outcome then it should raise concerns about using the district/state/country samples.

      This paper is a really nice piece of work and is a valuable contribution but the district/state/country sample issue really needs to be addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A simple possible way forward could be to run the model without the district/state/country samples and see what the outcome is. If the outcome is similar then the random point method may be viable (but if it gives the same outcome as ignoring those samples then you don't need the district/state/country samples). If you get a totally different outcome then it should raise concerns about using the district/state/country samples.

      Thank you for your comments, and for the suggestions to address the issues identified in your main commentary by running an analysis on exclusively GPS geolocated data points. This was the original plan for analysis, but the available data identified in the literature review includes only 14 data points (macaque P. knowlesi prevalence surveys) with associated GPS coordinates. This was found to be too limited to obtain meaningful results from a regression analysis, and hence we then explored methods for utilising all available data to identify trends whilst accounting for spatial uncertainty in the analysis. As the point location only represents the location of capture and not the extent of the home range of the NHPs, we additionally feel there is value in exploring methods to encompass the wider surrounding habitat.

      We do appreciate the concerns you raise with the random point method being used to represent macaque survey sites when species of interest are not necessarily evenly distributed across an area. To investigate this, we ran sensitivity analysis on a subset of the dataset according to whether the points fall in areas of >50%, >75% or >90% predicted probability of macaque occurrence, with maps derived from published models of macaque suitability in Southeast Asia. For each of these thresholds, points that fall outside these areas were removed – such that, if a random point is located on a mountain range where there is 0 likelihood of macaque occurrence, it is excluded from the analysis. We found that restricting analysis to areas with highly probable macaque habitat still shows a robust effect of forest cover on NHP prevalence, and additionally that for the most conservative (>90%) habitat threshold there remains an effect of forest fragmentation on prevalence (SI Table S17c, Figure S15c). Given that using the full data set increases the uncertainty, as there is more variation in covariates between the replicates, this can be considered a more conservative approach to detecting an effect of environment as reported in the main findings.
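
      The filtering step of such a sensitivity analysis can be sketched as follows. This is only an illustration of the thresholding logic described above: the field names, the example points, and the summary statistic (mean forest cover) are hypothetical, and the actual analysis refits the regression model on each subset rather than computing a simple mean.

```python
def filter_by_occurrence(points, threshold):
    """Keep points whose predicted occurrence probability exceeds the threshold."""
    return [p for p in points if p["p_occurrence"] > threshold]


def mean_forest_cover(points):
    """Mean forest cover of the retained points, or None if none remain."""
    if not points:
        return None
    return sum(p["forest_cover"] for p in points) / len(points)


# Hypothetical random points sampled within one administrative unit.
random_points = [
    {"id": 1, "forest_cover": 0.82, "p_occurrence": 0.95},
    {"id": 2, "forest_cover": 0.40, "p_occurrence": 0.78},
    {"id": 3, "forest_cover": 0.10, "p_occurrence": 0.00},  # e.g. a mountain range
    {"id": 4, "forest_cover": 0.65, "p_occurrence": 0.55},
    {"id": 5, "forest_cover": 0.71, "p_occurrence": 0.91},
]

# Repeat the summary for each habitat-suitability threshold.
summary = {}
for threshold in (0.50, 0.75, 0.90):
    kept = filter_by_occurrence(random_points, threshold)
    summary[threshold] = (len(kept), mean_forest_cover(kept))

for threshold, (n_kept, cover) in summary.items():
    print(threshold, n_kept, cover)
```

      In this toy example, the point with zero predicted occurrence is dropped at every threshold, and progressively stricter thresholds retain fewer points, which mirrors the trade-off between sample size and habitat relevance discussed in the response.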

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1. A more thorough analysis of transition boundaries between different types of patterns would further strengthen the conclusions.

      We agree that the transition between different patterning regimes should be discussed more quantitatively in the manuscript. Specifically, we identified a highly sensitive parameter range where the disorder in the patterns rapidly increases as a function of the VEGF stimulus. We have improved our discussion of the transition between ‘ordered-like’ patterns and ‘disordered-like’ patterns in the main text as follows: “At relatively low VEGF levels, the patterns were mostly ordered, with small deviations from the expected ‘salt and pepper’ geometry with a 25%-75% ratio of Tip:Stalk (Fig. 2D). However, as the VEGF input increased, the fraction of Tips grew and the patterns became sharply more disordered over a relatively narrow range of magnitude of the VEGF input, which could be identified as a highly sensitive area separating more ‘ordered-like’ and ‘disordered-like’ patterns. Finally, increasing VEGF stimuli beyond the highly sensitive area further increased the disorder of the patterns, but with a lower VEGF sensitivity, over several more orders of magnitude of VEGF inputs”.

      Reviewer #2 (Recommendations For The Authors):

      Please refer to the Public Comments above for a broad review. Below, I provide specific concerns that could be addressed.

      Main comments

      1. Is the salt-and-pepper model observed for the case when there is no VEGF in the experiments? It would be good to confirm the same. If not, the analysis presented in Fig. 3 could be performed for this case and used as a baseline while referring to the data in Fig. 3.

      We thank the referee for the interesting suggestion. The pattern predicted by the model is not strictly salt-and-pepper in the absence of VEGF, but the disorder quantified in terms of “incorrect” contacts between Tip cells is considerably lower (see for example the disorder quantification in supplementary figure 1C). We have included the Tip-Tip contact statistics for a case of VEGF=1 ng/ml (100-fold lower than the level used in Fig. 3 to compare model and experiment). In this case, there is clearly more spacing between Tip cells, thus demonstrating how high VEGF stimuli increase the probability of contacts between Tip cells. In the main text, we commented: “As a baseline comparison, the mathematical model with a 100-fold reduction of VEGF stimulus (1 ng/ml) exhibited Tip-Tip distance statistics more closely comparable with the ‘salt-and-pepper’ model”.

      1. The authors mention in the Discussion (end of pg. 7) that ...a low level of exogeneous VEGF is essential to induce Delta-NOTCH signalling.. However, in the standard NOTCH signalling (Boareto et al.), we can get the salt-and-pepper pattern without any VEGF. Am I missing something? The authors may want to take a re-look.

      We appreciate the referee’s understanding of the mathematical model. The model used here still exhibits a bistable behavior between the low-Delta and high-Delta cell states even in the absence of VEGF input, as seen for example in the cell state distribution of Fig. 2B, and in agreement with the original model by Boareto et al. This behavior is reflective of the more general applicability of the model, as it describes Delta-NOTCH interactions in various systems. For endothelial cells, VEGF is indeed required to trigger this interaction, but this was not the primary focus of the paper, hence the original model was used. In the text referred to by the reviewer, we are discussing the role of VEGF based on its known biological effects as well as modeling results. We anticipate that the future further adaptation of the model to endothelial cells will refine its description of cell interactions in the absence of VEGF.

      1. The size of cells (or spacing between cell nuclei) is highly variable (Fig. 3). Since it is known that the size of cell-cell junctions influences signalling, it would be good to at least comment on the same, considering that the model in the paper consists of regular static hexagons. Similarly, it seems desirable to comment on expressing the distance between Tip cells (Fig. 3) in cell length units, when the cell lengths are so variable.

      We concur with the suggestion that our consideration of the cell-cell contact size in NOTCH signaling should be clarified in the manuscript.

      Sprinzak et al. reported in their 2017 article published in Developmental Cell that the cell-cell contact area does influence NOTCH Signaling. In this article, they found that NOTCH trans-endocytosis (TEC) for pairs with a larger contact width (25µm) is up to five times higher than for pairs with a smaller contact (2.5µm), as observed through the two-cell TEC assay. While TEC correlates with contact width across a range from 1 to 40µm, the values fluctuate significantly in the middle range, particularly when excluding extremely low cell-cell contact areas.

      In our experiments, we observed that the cell-cell contact area ranges from essentially infinitesimal corner-to-corner contact to roughly 50µm. We excluded the corner contacts, which might correspond to extremely low cell-cell contact areas, from the Tip-Tip distance measurements as depicted in Fig. 3B. We also made the assumption that variations in cell-cell contact size within tens of microns correlate weakly with the strength of NOTCH signaling. This assumption did not impede our effort to compare the overall trends with results from modeling using hexagonal cells, as shown in Figs 6 D&E. We have included this comment and the corresponding reference to elucidate our assumption in the results as follows: “In our experiments, the observed cell-cell contact area varied, spanning from very low (cell corner-to-corner contact) up to approximately 50µm. Previous studies (14, 15) have clearly demonstrated the influence of the cell-cell contact area on NOTCH signaling, but the values get noisy in the middle range, particularly when excluding extremely low cell-cell contact areas. Reflecting these findings, we excluded the corner contacts, which might correspond to extremely low cell-cell contact areas, from the Tip-Tip distance measurements as depicted in Fig. 3B. We also made an assumption that variations in cell-cell contact size within tens of microns correlate weakly with the strength of NOTCH signaling. This assumption did not impede our effort to compare the overall trends with results from modeling using hexagonal cells, as shown in Figs 3 D&E.”

      1. The results presented in Fig. 6J are quite striking. However, the number of samples N = 10 and N = 11 seem somewhat low. How does one justify that the findings are not influenced by low number fluctuations?

      We acknowledge the reviewer's concerns regarding potential biases stemming from a limited number of samples. The analysis presented in Fig. 6J was specifically designed to complement and support the findings in Fig. 6H. In this context, the counts of sprout and mini-sprout dots correspond to the number of instances "including a sprout" and "including a mini-sprout."

      While the counts of sprouts and mini-sprouts in Fig. 6H might seem limited as highlighted by the reviewer, the statistical difference between the two groups was found to be significant. Nevertheless, we expanded our regions of interest to encompass neighboring cells, based on the rationale that the local environment might have closely interacting and similar features. The sample sizes in Figure 6J, represented as N=10 and N=11, equate to an examination of 70 cells and 77 cells, respectively. For instance, in the category "including a sprout," five out of ten groups indicated that all seven neighboring cells in a group exhibited fibronectin levels exceeding a given threshold, translating to 35 cells with fibronectin levels above this threshold. Given that the observed trends in distribution were consistently reasonable across the examinations of both 70 and 77 cells, we would like to state that we are confident in our results.

      1. It is written towards the end on pg. 5 that ... although all sprouts indeed formed from mini-sprouts, not all .... However, as can be seen from Fig. 4O, Sprouts can also be generated from Stalk cells. This should be corrected.

      Thank you for highlighting the discrepancy between our statement on page 5 and the observations in Fig. 4O. While all sprouts undergo a mini-sprout phase, the transition from Stalk to mini-sprout is not always observed due to the limitations of our observational timeframe. We acknowledge this oversight and have adjusted our statement to clarify that sprouts appearing to form directly from Stalks likely passed through an unobserved intermediate mini-sprout stage, as follows: We found that all sprouts formed either directly from Stalks or from mini-sprouts, suggesting a non-observed transition from Stalk to mini-sprout due to observational timeframe limitations. Strikingly, however, not all mini-sprouts persisted and initiated sprout formation.

      1. No solid blue bars are shown in Fig. S2A as mentioned in the caption. Kindly correct.

      We apologize for the mistake. We have corrected the figure to show the blue bars depicting the experimental measurements for sprout distance probability.

      1. How are the high-Delta cells or high-NOTCH cells decided in experiments or simulations? Does it happen that Delta and NOTCH levels are comparable? In that case, what is done? This point could be clarified in the main manuscript or Materials and Methods.

      We agree with the reviewer that the Tip cell definition should be clarified. In the model, we define a threshold level for cellular Delta to distinguish Tip and Stalk cells, which is now explained in the Methods section “Definition of Tip cells in the model”. As elaborated in the new section, Delta and NOTCH levels are never comparable due to the circuit’s bistable behavior. In experiments, Tip cells are defined based on their key phenotypic characteristic, invasive migration into the surrounding collagen matrix, rather than on Delta or NOTCH levels. The details can be found in the “Precise quantification of Tip cell spatial arrangement suggests disordered patterning in the engineered angiogenesis model” section and Figure 3A.

      Minor comments

      There are a good number of typos in the paper. The manuscript should be carefully checked and corrected for the same. Below, I provide a few instances.

      1. In the abstract towards the end, it should be "understanding" instead of "understating"

      2. On pg. 5, just before the beginning of the last paragraph, there is a typo "parodied" which should most likely be "provided"

      3. First paragraph on pg. 6 typo "spouts" instead of "Sprouts"

      4. Second paragraph on pg. 6, correctly write "testS"

      5. Near the beginning of pg. 8, should be "C. elegans" instead of "C. elegance"

      6. Figure 1 caption, towards the end, should be "Stalk" instead of "Salk"

      We sincerely appreciate your keen attention to detail. We have thoroughly reviewed the manuscript and made the necessary corrections, including those that you have highlighted.

      Reviewer #3 (Recommendations For The Authors):

      Major concern:

      The authors should discuss in more detail how their work can be used for a better understanding of the angiogenesis process in physiological conditions and in pathological conditions such as post-ischemic revascularization or tumor vascularization.

      We have included comments and the corresponding references to clarify the aspect the reviewer suggested: The results in this study can further inform our understanding of angiogenesis in physiological and pathophysiological conditions. In particular, in many circumstances, the level of VEGF is determined by the degree of hypoxia, which can be highly elevated following oxygen supply interruption, e.g., in wound healing or ischemia, or due to progression of neoplastic growth. Our results suggest that in these cases, formation of sprouts can be dysregulated due to higher incidences of co-localizations of prospective Tip cells. In addition, since these conditions are frequently accompanied by altered synthesis of ECM, the sprout density can increase, which may lead to formation of denser and less developed vascular beds frequently observed as a result of tumor angiogenesis(42, 43). Our results thus suggest that the disorder and higher plasticity of the endothelial cell fate speciation at higher VEGF inputs can be a key contributor to some pathological states associated with persistently hypoxic conditions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This article by Zhai et al. investigates sterol transport in bacteria. Synthesis of sterols is rare in bacteria but occurs in some, such as M. capsulatus, where the sterols are found primarily in the outer membrane. In a previous paper the authors discovered an operon consisting of five genes, with two of these genes encoding demethylases involved in sterol demethylation. In this manuscript, the authors set out to investigate the functions of the other three genes in the operon. Interestingly, through a bioinformatic analysis, they show that they are an inner membrane transporter of the RND family, a periplasmic binding protein, and an outer membrane-associated protein, all potentially involved with lipid transport, so providing a means of transporting the lipids to the outer membrane. These proteins are then extensively investigated through lipid pulldowns, binding analysis on all three, and X-ray crystallography and docking of the latter two.

      Strengths

      The lipid pulldowns and associated MST binding analysis are convincing, clearly showing that sterols are able to bind to these proteins. The structures of BstB and BstC are high resolution with excellent maps that allow docking studies to be carried out. These structures are distinct from sterol-binding proteins in eukaryotes.

      We thank the reviewer for their favorable impression of this work.

      Weaknesses

      While the docking and molecular dynamics studies are consistent with the binding of sterols to BstB and BstC, this is not backed up particularly well. The MST results of mutants in the binding pocket of BstB have relatively little effect, and while I agree with the authors this may be because of the extensive hydrophobic interactions that the ligand makes with the protein, it is difficult to make any firm conclusions about binding.

      We agree with the reviewer that at this point, there is no experimental evidence to define the sterol binding site in BstB. While in the manuscript we allude to the extensive hydrophobic interactions as being especially stabilizing and difficult to eliminate with one or two mutations, we are now also aware that hydrogen-bonding interactions with the polar head of the sterols are quite important (see data on BstC, where disruption of that interaction significantly reduces the equilibrium affinity for sterols). Our MD simulations show that at least 3 protein amino acids can participate in H-bonding with the sterols. Moreover, recent work from our lab shows that even ligand site waters can extend an H-bonding network around the polar head of the lipid (Zhai et al., ChemBioChem 2023, 24, e202300156), thereby enabling H-bonding with amino acids that are further away from the ligand site. It is therefore difficult to predict which mutations will sufficiently destabilize the binding. While this question is one we will tackle in future studies focused on obtaining high-resolution substrate-bound structures of BstB or homologs, the findings reported here are still relevant and timely, and we posit will spur the discovery of functional homologs, including some in organisms that are more tractable.

      The authors also discuss the possibility of a secondary binding site in BstB based on a slight cavity in domain B next to a flexible loop. This is not backed up in any way and seems unlikely.

      The reviewer is correct in that the evidence for this second binding site is weak. While the crystallographic structure shows a highly hydrophobic region and the binding studies suggest cooperativity exists in the binding of the 4-methylsterol substrate, the docking studies do not strongly support binding at that site. As such, we have clarified in the manuscript that a second hydrophobic cavity is observed, but that its role in ligand interaction remains unexplored.

      Reviewer #2 (Public Review):

      Summary:

      In eukaryotes, sterols are crucial for signaling and regulating membrane fluidity, however, the mechanism governing cholesterol production and transport across the cell membrane in bacteria remains enigmatic. The manuscript by Zhai et al. sheds light on this topic by uncovering three potential cholesterol transport proteins. Through comprehensive bioinformatics analysis, the authors identified three genes bstA, bstB, and bstC encoding proteins which share homology with transporters, periplasmic binding proteins, and periplasmic components superfamily, respectively. Furthermore, the authors confirmed the specific interaction between these three proteins and C-4 methylated sterols and determined the structures of BstB and BstC. Combining these structural insights with molecular dynamics simulation, they postulated several plausible substrate binding sites within each protein.

      Strengths:

      The authors have identified 3 proteins that seem likely to be involved in sterol transport between the inner and outer membrane. The structures are of high quality, and the sterol binding experiments support a role for these proteins in sterol transport.

      We thank the reviewer for this positive view of our work.

      Weaknesses:

      While the authors' model is very plausible, direct evidence for a role of BstABC in transport, or that the 3 proteins function together in a single pathway, is limited.

      The reviewer is correct that we were unable to demonstrate that the three proteins work together to transport 4-methylsterols. This is not for lack of trying. We first attempted gene deletion studies, and as mentioned in the manuscript (with more details now provided in the experimental section), this appeared to be lethal. We then attempted in vitro exchange experiments, in which the proteins would be used to transfer sterols from sterol-loaded “heavy” liposomes to sterol-free “light” liposomes – such exchange assays are frequently performed with eukaryotic sterol transporters (see Chung et al., Science 2015, https://doi.org/10.1126/science.aab1370). These assays were not successful because 1) sterols incorporated poorly into liposomes made with E. coli polar lipids and yielded leaky liposomes; 2) liposomes prepared with the TLE of M. capsulatus proved more stable, but no appreciable exchange was observed; we reasoned that this might be due to the absence of an energy source for BstA, the RND component for which we have expressed and purified only the soluble periplasmic domain. Given the technical difficulty of these in vitro transport experiments, we will continue to pursue in vivo demonstration of function as new homologs are identified.

      Reviewer #3 (Public Review):

      Summary:

      The work in this manuscript builds on prior efforts by this team to understand how sterols are biosynthesized and utilized in bacteria. The study reports a new function for three genes encoded near sterol biosynthesis enzymes, suggesting the resulting proteins function as a sterol transport system. Biochemical and structural characterization of the two soluble components of the pathway establishes that both proteins can bind sterols, with a preference for 4-methylated derivatives. High-resolution x-ray structures of the apoproteins reveal hydrophobic cavities of the appropriate size to accommodate these substrates. Docking and molecular dynamics simulations confirm this observation and provide specific insights into residues involved in substrate binding.

      Strengths:

      The manuscript is comprehensive and well-written. The annotation of a new function in a set of proteins related to bacterial sterol usage is exciting and likely to enable further study of this phenomenon - which is currently not well understood. The work also has implications for improving our understanding of lipid usage in general among bacterial organisms.

      We thank the reviewer for this synopsis of our work.

      Weaknesses:

      The authors might consider moving some of the bioinformatics figures to the main text, given how much space is devoted to this topic in the results section.

      We have taken this advice and moved Figure S1 to the main manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. In the analysis of the MST data, the authors quote Hill coefficients. How reliable are these numbers? For BstB, for instance, it seems unlikely that more than one molecule would bind. Can the analysis be done without needing to include Hill coefficients?

      We used fits that did and did not invoke cooperativity – see below. We are certain that both BstA and BstB are better fit with cooperativity invoked.

      Author response image 1.
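For readers unfamiliar with the two kinds of fit compared here, the following is a minimal sketch of the standard Hill equation, in which a Hill coefficient of n = 1 reduces to simple non-cooperative binding. It is not the authors' fitting procedure, which is not described in the response; the function name, the half-saturation parameter `k_half`, and the numerical values are all illustrative assumptions.

```python
def hill_fraction_bound(ligand: float, k_half: float, n: float = 1.0) -> float:
    """Standard Hill equation: fraction bound = L^n / (K_half^n + L^n).

    n = 1 gives the non-cooperative (hyperbolic) binding curve;
    n > 1 gives a steeper, positively cooperative curve.
    """
    if ligand < 0 or k_half <= 0 or n <= 0:
        raise ValueError("ligand must be >= 0; k_half and n must be > 0")
    ligand_term = ligand ** n
    return ligand_term / (k_half ** n + ligand_term)


# Hypothetical values, for illustration only (not data from the manuscript).
K_HALF = 10.0  # half-saturation concentration, arbitrary units

for concentration in (1.0, 10.0, 100.0):
    simple = hill_fraction_bound(concentration, K_HALF, n=1.0)
    cooperative = hill_fraction_bound(concentration, K_HALF, n=2.0)
    print(f"L={concentration:6.1f}  n=1: {simple:.3f}  n=2: {cooperative:.3f}")
```

Both curves pass through 0.5 at the half-saturation concentration. They differ in steepness on either side of it, which is the feature a fit with cooperativity invoked can capture and a simple hyperbolic fit cannot.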

      1. In looking at the maps associated with the structures, which were included in the review package, I see that two citric acid molecules fit beautifully into the density where currently PEG has been modelled. This needs to be fixed and some comments may be appropriate in the manuscript.

      We thank the reviewer for calling our attention to this. Citric acid has now been added to the model, and we reason that these are present in the structure because citric acid was used in the crystallization condition. The revised model is now present in the PDB.

      1. It is not necessary to show the two molecules in the asymmetric unit in Figure 4 given that it is not a dimer. This doesn't add anything to the manuscript.

      We now show a single molecule of BstC in Figure 4 (now Figure 5).

      1. I wouldn't consider the loops shown in Figure S4 as disordered. They have slightly higher B-values but are not completely mobile.

      We did not refer to these loops as disordered. In the text, we say they “exhibit poor electron densities, suggesting conformational sampling of more than one state (Fig. S4A).”

      Reviewer #2 (Recommendations For The Authors):

      pg 7, "hinting at an astounding distinction": I might suggest a word other than astounding that conveys how statistically unlikely, unusual, etc. this result is.

      Thank you – we have removed “astounding”.

      pg 7, paragraph 2: Here the authors show that in the SSN analysis, BstB proteins cluster separately and suggest this implies a distinction in function. However, they also show that PhnD homologs do not cluster separately (distributed across multiple clusters), yet presumably have similar functions. I am not familiar with SSN, but it seems to me that the second statement about PhnD implies that the first statement about BstB might not be valid, i.e., if PhnD doesn't cluster based on function, on what basis can we conclude that BstB does? On what basis does clustering occur in the SSN analysis? Might it be driven by things other than function? This comment also concerns the final paragraph of this section.

      The reviewer is correct in that PhnD homologs occupy separate clusters of the SSN. Many of these homologs were crystallized with phosphate-like compounds, but it is possible that they have non-overlapping substrate scopes and are therefore functionally distinct. As for the basis of clustering, the SSN is fully sequence-based. What has been observed is that proteins with highly similar sequences can have similar functions – but this is not always true.

      pg 8, paragraph 1: The authors suggest that BstABC may be essential. This is probably not a critical claim and it might be simplest to just remove it, but if it is mentioned, the authors should probably explain what was attempted that failed, so a reader can assess the strength of the evidence supporting essentiality. For example, I don't see anything in the methods about genetic manipulations of M. capsulatus, so currently, this falls within the realm of "Data not shown".

      We have provided additional information about the experimental techniques used to do this. This statement was included so that it is understood that the reason for the experimental failure is unlikely to be technical in nature, as we have successfully deleted some sterol related genes while others remain intractable.

      Fig. 2A: It is unclear to me what is being plotted here, perhaps more experimental detail is required in the form of labels and/or legend. Is this a quantification of each sterol in each fraction separated by GC? There are essentially no methods provided for the GC-MS experiments. A reference is provided, but I think providing detailed methods for these specific experiments will provide a higher degree of scientific rigor. I am not sure what is standard for GCMS, but perhaps showing spectra in the supplement that establish the identity of the bound molecules as species I and II would be appropriate?

      Additional experimental details have been provided and the figure legend changed to be more clear. Moreover, we now clearly state that the chromatograms shown were used to identify lipids due to retention times for spectra that were previously published in Wei et al., 2016.

      pg 10-11, comparison with PhnD structure: Perhaps it is worth mentioning a 3rd possible explanation for the relative opening/closing of the cleft is simply crystal packing? I don't think it necessarily has to imply anything about a difference in function. Also, the focus seems to be on this pairwise comparison, but perhaps more insights could be gleaned from an analysis that included a wider range of homologs, especially if any are thought to bind hydrophobic substrates.

      This could be true, and we have included a statement to that effect. We are unaware of homologs shown to bind to large, hydrophobic molecules.

      I think that BstB is shown upside-down in sup movies relative to other figures. If it isn't changed, perhaps adding some labels would help orient the reader.

      We have rotated the movies to be more consistent with the figures.

      Fig. S7: No units are indicated for Kds (uM?).

      Thank you – this has been fixed.

      pg 11, paragraph 2. "adjacent to three residues: Glu118, Tyr120 and Asn192": The residue number used in the text doesn't seem to match the numbering in the PDB file. I think these residues correspond to Glu98, Tyr100, and Asn172 in the PDB file.

      We regret this error. The correct numbering for both structures is now present in the deposited PDB files (7T1M for BstB and 7T1S for BstC).
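The three residue pairs quoted by the reviewer differ by a constant offset of 20, which the sketch below makes explicit. The offset is inferred only from those three pairs and is assumed to describe the earlier coordinate file that the reviewer examined; the names used are illustrative.

```python
# Offset inferred from the reviewer's three examples (118 -> 98, 120 -> 100, 192 -> 172).
# Assumption: it applies to the earlier PDB file the reviewer examined.
NUMBERING_OFFSET = 20

# Residue numbers as quoted from the manuscript text.
TEXT_NUMBERING = {"Glu": 118, "Tyr": 120, "Asn": 192}


def to_earlier_pdb_numbering(text_resnum: int, offset: int = NUMBERING_OFFSET) -> int:
    """Map a residue number used in the text to the numbering in the earlier PDB file."""
    return text_resnum - offset


for name, text_resnum in TEXT_NUMBERING.items():
    pdb_resnum = to_earlier_pdb_numbering(text_resnum)
    print(f"{name}{text_resnum} (text) corresponds to {name}{pdb_resnum} (earlier PDB file)")
```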

      pg 12, final paragraph: The authors present binding data for BstB variants with mutations in the putative sterol binding pocket identified in the structural and MD analyses. However, these mutants had no effect on binding. The authors rationalize this in terms of the size of the interface and hydrophobic nature (which indeed, may be correct and is very plausible), and it is worth noting that many of their mutations are to Ala and would largely preserve the hydrophobic nature of the cleft. However, these mutants raise questions about where sterols actually bind. No experimental evidence is presented that substrates bind in the cleft, it is only hypothesized based on structural homology, MD simulations, etc. These mutations formally provide evidence against the hypothesis being tested; I think that has to be discussed a bit more directly, alongside the caveats the authors already discuss about hydrophobicity, etc.

      This is a valid point by the reviewer, and it is one we have attempted to address with our statement in the manuscript and in our response to reviewer 1. We have modified the relevant text to more clearly state that there is as of yet no experimental evidence for the binding of sterols to the cavity identified via molecular docking.

      pg 13: Presumably this is not the full-length lipoprotein, but has been truncated/mutated in some way? Some statement of roughly what was purified/crystallized should be stated.

      The SI methods on protein purification state that the genes of BstB and BstC without their respective signal peptides were obtained.

      pg 13, last paragraph "TN1 exhibits hybrid hydrophobicity, with the sides horizontal to cavities being hydrophobic while the vertical sides are more hydrophilic". I don't really follow the horizontal vs vertical sides. Perhaps this could be described in a different way.

      Noted and changed to “TN1 is closer to the N-terminal face of the structure, while CA1 and CA2 are proximal to the C-terminal face and form two open hydrophobic pockets; TN1 exhibits a mixture of hydrophobic and hydrophilic amino acids (Fig. 4B and Fig. S9B, Table S4).”

      pg 15-16, "Comparison to eukaryotic sterol transporters": Perhaps this would be better suited for the discussion section? Could also be streamlined; it is mostly discussing and comparing eukaryotic sterol binding domains to each other, not to BstABC.

      Given that BstB and BstC are the first identified proteins (and putative transporters) for bacterial sterol engagement, we thought a careful description of the existing sterol transporters (which are all eukaryotic) was warranted.

      Reviewer #3 (Recommendations For The Authors):

      I have just two minor suggestions for the authors if they wish to comment on or address them.

      1. Do the three proteins (BstA/B/C) form any sort of complex? Perhaps this property was not assessed - but it seemed possible that the B and C components might constitute a shuttle for the membrane-bound transporter?

      This is an important observation – the unliganded versions of these proteins show no appreciable affinity for each other. However, BstB (which would be expected to engage both with BstA and BstC) belongs to a family of proteins known to undergo significant conformational change upon substrate binding. It is possible that with substrate present, complexes are formed – we have yet to investigate this.

      1. In Figure S1, panel C - it appears that the label for the BstC cluster may have migrated away from the intended location. In this figure, it might also be useful to indicate in the caption the meaning of the red coloring of the nodes?

      The label is now fixed – thank you for drawing our attention to this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”. We also thank them for a careful reading and useful comments to improve the manuscript. We have built on these comments to provide an improved version of the manuscript, and address them point by point below.

      Reviewer #1 (Public Review):

      Summary:

      This paper makes important contributions to the structural analysis of the DNA replication-linked nucleosome assembly machine termed Chromatin Assembly Factor-1 (CAF-1). The authors focus on the interplay of domains that bind DNA, histones, and replication clamp protein PCNA.

      Strengths:

      The authors analyze soluble complexes containing full-length versions of all three fission yeast CAF-1 subunits, an important accomplishment given that many previous structural and biophysical studies have focused on truncated complexes. New data here supports previous experiments indicating that the KER domain is a long alpha helix that binds DNA. Via NMR, the authors discover structural changes at the histone binding site, defined here with high resolution. Most strikingly, the experiments here show that for the S. pombe CAF-1 complex, the WHD domain at the C-terminus of the large subunit lacks DNA binding activity observed in the human and budding yeast homologs, indicating a surprising divergence in the evolution of this complex. Together, these are important contributions to the understanding of how the CAF-1 complex works.

      Weaknesses:

      1. There are some aspects of the experimentation that are incompletely described: In the SEC data (Fig. S1C) it appears that Pcf1 in the absence of other proteins forms three major peaks. Two are labeled as "1a" (eluting at ~8 mL) and "1b" (~10-11 mL). It appears that Pcf1 alone or in complex with either or both of the other two subunits forms two different high molecular weight complexes (e.g. 4a/4b, 5a/5b, 6a/6b). There is also a third peak in the analysis of Pcf1 alone, which isn't named here, eluting at ~14 mL, overlapping the peaks labeled 2a, 4c, and 5c. The text describing these different macromolecular complexes seems incomplete (p. 3, lines 32-33): "When isolated, both Pcf2 and Pcf3 are monomeric while Pcf1 forms large soluble oligomers". Which of the three Pcf1-alone peaks are oligomers, and how do we know? What is the third peak? The gel analysis across these chromatograms should be shown.

      We thank the reviewer for his/her careful reading of the manuscript. Indeed, we plotted two curves in Figure S1C in a color that does not match the legend, leading to confusion. Curve 1, Pcf1 alone, depicted in red, should appear in pink as indicated in the legend and in the SDS-PAGE analysis below. Curve 1 exhibits two peaks, labeled as 1a and 1b. With an elution volume of 8.5mL close to the dead volume of the column, peak 1a corresponds to soluble oligomers, while peak 1b (10.4mL) likely corresponds to monomeric Pcf1. Curve 5 (Pcf1 + Pcf2 mixture) was in pink instead of purple as indicated in the legend. This curve consists of three distinct peaks (5a, 5b, and 5c). The SDS-PAGE analysis revealed the presence of oligomers of Pcf1-Pcf2 (5a, 8.3mL), the Pcf1-Pcf2 complex (5b, 9.8mL), and Pcf2 alone (5c, 13.6 mL).

      The color has now been corrected in the revised manuscript.

      More importantly, was a particular SEC peak of the three-subunit CAF-1 complex (i.e. 4a or 4b) characterized in the further experimentation, or were the data obtained from the input material prior to the separation of the different peaks? If the latter, how might this have affected the results? Do the forms inter-convert spontaneously?

      We conducted all structural analyses and DNA/PCNA interaction experiments (Figures 1-4, S1-S4) with freshly SEC-purified samples corresponding to the 4b peak (9.7mL). Aliquots were flash-frozen with 50% glycerol for in vitro histone assembly assays (Figure 5).

      1. Given the strong structural prediction about the roles of residues L359 and F380 (Fig. 2f), these should be mutated to determine effects on histone binding.

      We are pleased that our structural predictions are considered as strong. We agree that investigating the role of the L359 and F380 residues will be critical to further refine the binding interface between histone H3-H4 and CAF-1. An in vitro and in vivo analysis of such mutated forms, alongside the current Pcf1-ED mutant characterized in this article and additional potential mutated forms, has the potential to provide a better understanding of the dynamic of histone deposition by CAF-1. However, these additional approaches would require to reach another step in breaking this enigmatic dynamic.

      1. Could it be that the apparent lack of histone deposition by the delta-WHD mutant complex occurs because this mutant complex is unstable when added to the Xenopus extract?

      We cannot formally exclude this possibility, and this could potentially apply to all mutated forms tested. However, in the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. Nevertheless, we feel reassured by the fact that the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, which reflects a defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe and was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002). This further supports the evolutionary conservation of the genetic assay as a readout for defective histone deposition by CAF-1.

      Reviewer #1 (Recommendations For The Authors):

      • p. 4: "An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS), consistent with a 1:1:1 stoichiometry (Figure S1e). These data are in agreement with a globular complex with a significant flexibility (Figure S1f)." There needs to be more description of the precision of the molecular weight measurement, and what aspects of these data indicate the flexibility.

      The molecular weight was estimated using the correlation volume (Vc) defined by (Rambo & Tainer, Nature 2013, 496, 477-481). The estimated error with this method is around 10%. We added this information together with supporting arguments for the existence of flexibility: “An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS). Assuming an accuracy of around 10% with this method (Rambo and Tainer 2013), this value is consistent with a 1:1:1 stoichiometry for the CAF-1 complex (calculated MW 167kDa) (Figure S1e). In addition, the position of the maximum for the dimensionless Kratky plot was slightly shifted to higher values in the y and x axis compared to the position of the expected maximum of the curve for a fully globular protein (Figure S1f).

      This shows that the complex was globular with a significant flexibility.”
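The consistency argument in the quoted passage can be checked with a few lines of arithmetic. The sketch below takes the ~10% figure as a relative error on the measured value, which is an assumption about how the authors apply it. The 2:2:2 mass is a hypothetical comparison computed here by doubling the calculated 1:1:1 mass, and does not come from the manuscript.

```python
def within_relative_error(measured: float, expected: float, rel_error: float) -> bool:
    """Return True if expected lies within measured * (1 +/- rel_error)."""
    return abs(measured - expected) <= rel_error * measured


MEASURED_MW_KDA = 179.0    # SAXS estimate quoted in the response
CALCULATED_MW_KDA = 167.0  # calculated mass for a 1:1:1 complex, quoted in the response
REL_ERROR = 0.10           # approximate accuracy of the Vc method, as stated

low = MEASURED_MW_KDA * (1 - REL_ERROR)
high = MEASURED_MW_KDA * (1 + REL_ERROR)
print(f"Accepted range: {low:.1f} to {high:.1f} kDa")
print("1:1:1 consistent:", within_relative_error(MEASURED_MW_KDA, CALCULATED_MW_KDA, REL_ERROR))

# Hypothetical comparison: a 2:2:2 assembly would have twice the calculated mass.
print("2:2:2 consistent:", within_relative_error(MEASURED_MW_KDA, 2 * CALCULATED_MW_KDA, REL_ERROR))
```

Under these assumptions the calculated 167 kDa falls inside the 161.1 to 196.9 kDa window around the measurement, whereas a doubled mass does not.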

      • p. 6, lines 21-22: "In contrast, a large part of signals (338-396) did not vanish anymore upon addition of a histone complex preformed with two other histone chaperones known to compete with CAF-1 for histone binding..." Given the contrast made later with the 338-351 region which is insensitive to Asf1/Mcm2, it would be clearer for the reader to describe the Asf1/Mcm2-competed regions as residues 325-338 plus 352-396. Note that the numerical scale of residues doesn't line up perfectly with the data points in Figure 2d, and this should be fixed as well.

      We thank the reviewer for spotting this typographical error; we intended to write “In contrast, a large part of signals (348-396) did not vanish anymore…”. We modified the paragraph as suggested by the reviewer because we agree it is clearer for the reader: “In contrast, only a shorter fragment (338-347) vanished upon addition of Asf1-H3-H4-Mcm2(69-138), a histone complex preformed with two other histone chaperones, Asf1 and Mcm2, known to compete with CAF-1 for histone binding (Sauer et al. 2017) and whose histone binding modes are well established (Figure 2e) (Huang et al. 2015, Richet et al. 2015). This finding underscores a direct competition between residues (325-338) and (349-396) within the ED domain and Asf1/Mcm2 for histone binding.”

      The slight shift in the numerical scale Figure 2d was also corrected.

      • p. 8. Lines 22-24: "EMSAs with a double-stranded 40bp DNA fragment confirmed the homogeneity of the bound complex. When increasing the SpCAF-1 concentration, additional mobility shifts suggest, a cooperative DNA binding (Figure 3a)." I agree that the migration of the population is further retarded upon the addition of more protein. However, doesn't this negate the first sentence? That is, if multiple CAF-1 complexes can bind each dsDNA molecule, can these complexes be described as homogeneous?

      We fully agree with the reviewer's comment and have removed the notion of homogeneity from the first sentence. “EMSAs with a double-stranded 40bp DNA fragment showed the formation of a bound complex.”

      • Figure S2b Legend: "1H-15N HSQC spectra of Pcf1_ED (425-496)." The residue numbers should read 325-396.

      The typo has been corrected.

      • Is the title for Figure 5 correct?: "Figure 5: Rescue using Y340 and W348 in the ED domain, the intact KER DNA binding domain and the C-terminal WHD of Pcf1 in SpCAF-1 mediated nucleosome assembly." I don't see that any point mutation rescue experiments are done here.

      The title of Figure 5 has been changed to “Efficient nucleosome assembly by SpCAF-1 in vitro requires interactions with H3-H4, DNA and PCNA, and the C-terminal WHD domain”.

      • Figure S6C. I assume the top strain lacks the Pcf2-GFP but this should be stated explicitly.

      The following sentence has been added to the figure legend: “The top strain corresponds to a strain expressing wild-type and untagged Pcf2 as a negative control of GFP fluorescence”. Figure S6C has been modified accordingly to mention “Pcf2 (untagged)” so that this is stated explicitly.

      • Regarding point #3 in the public review, a simple initial test of this idea would be to determine if similar amounts of wt and mutant complexes can be immunoprecipitated at the endpoint of the assembly reactions.

      In the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. However, the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, reflecting defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe, as it was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), further supporting the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      • Foundational findings that should be cited: The role of PCNA in CAF-1 activity was first recognized by pioneering studies in the Stillman laboratory (PMID: 10052459, 11089978). The earliest recombinant studies of CAF-1 showed that the large subunit is the binding platform for the other two, showed that the KER and ED domains were required for histone deposition activity, and roughly mapped the p60-binding site on the large subunit (PMID: 7600578). Another early study roughly mapped the binding site for the third subunit and showed that biological effects of impairing the PCNA binding synergized with defects in the HIR pathway (PMID: 11756556), a genetic synergy first demonstrated in budding yeast (PMID: 9671489).

      We thank the reviewer for providing these important references, which are now cited in the manuscript. PMID: 10052459 and 11089978 are cited on page 2, lines 18 and 19; PMID: 7600578 on page 19, line 5; and PMID: 11756556 and 9671489 on page 18, line 2.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the structure-functional relationship of domains in S. pombe CAF-1, which promotes DNA replication-coupled deposition of histone H3-H4 dimer. The authors nicely showed that the ED domain with an intrinsically disordered structure binds to histone H3-H4, that the KER domain binds to DNA, and that, in addition to a PIP box, the KER domain also contributes to the PCNA binding. The ED and KER domains as well as the WHD domain are essential for nucleosome assembly in vitro. The ED, KER domains, and the PIP box are important for the maintenance of heterochromatin.

      Strengths:

      The combination of structural analysis using NMR and Alphafold2 modeling with biophysical and biochemical analysis provided strong evidence on the role of the different domain structures of the large subunit of SpCAF-1, spPCF-1 in the binding to histone H3-H4, DNA as well as PCNA. The conclusion was further supported by genetic analysis of the various pcf1 mutants. The large amounts of data provided in the paper support the authors' conclusion very well.

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ochesenbein describes the structural and functional analysis of S. pombe CAF-1 complex critical for DNA replication-coupled histone H3/H4 deposition. By using structural, biophysical, and biochemical analyses combined with genetic methods, the authors nicely showed that a large subunit of SpCAF1, SpPCF-1, consists of 5 structured domains with four connecting IDR domains. The ED domain with IDR nature binds to histone H3-H4 dimer with the conformational change of the other domain(s). SpCAF-1 binds to dsDNA by using the KER domain, but not the WHD domain. The experiments have been done with great care and a large amount of the data are highly reliable. Moreover, the results are clearly presented and convincingly written. The conclusion in the paper is very solid and will be useful for researchers who work in the field of chromosome biology.

      Major points:

      1. DNA binding of the KER mutant shown in Figures S3h and S3i, which was measured by the EMSA, looks similar to that of wild-type control in Figure S3f, which is different from the data in Figures 3b and 3e measured by the MST. The authors need a more precise description of the EMSA result of the KER mutant shown in Figures 3 and S3. The quantification of the EMSA result would resolve the point (should be provided).

      As proposed by this reviewer, we performed quantification of all EMSAs presented in Figure 3 and Figure S3. We quantified the signal of the free DNA band to calculate a percentage of bound DNA in each condition. All EMSA experiments were conducted in duplicate, allowing us to calculate an average value and standard deviation for each interaction. Representative curves and fitted values are reported below in the figure provided for the reviewer (panel a: data for the Pcf1_KER domain with two fitting models; panel b: the entire CAF-1 complexes and mutants; panel c: the isolated Pcf1_KER domains; panel d: all fitted values). Importantly, as illustrated in panel a, the complete model for a single interaction (complete KD model, dashed line curve) does not adequately fit the data. In contrast, a function incorporating cooperativity (Hill model) better accounts for the measured data (solid line curve). Consistently, we also used the Hill model to fit the binding curves measured with the MST technique. As now also specified in the text, the Hill model allows the determination of an EC50 value (concentration of protein resulting in the disappearance of half of the free DNA band intensity) and a Hill coefficient value (representing cooperativity during the interaction) for each curve.

      We measure a value of 3.4 ± 0.4 μM for the EC50 of SpCAF-1 WT, which is higher than the value measured by MST (0.7 ± 0.1 μM). Higher values were also calculated for all mutants and isolated Pcf1_KER domains compared to MST. These discrepancies could arise from the fact that the DNA concentrations used in the two techniques were very different (20 nM for MST experiments and 1 μM for EMSA). Unlike the complete KD model, which includes the DNA concentration (considered here as the "receptor") in the calculation, the Hill model is fitted independently of this value. This model assumes that the “receptor” concentration is low compared to the KD. Here we calculate EC50 values on the same order of magnitude as the DNA concentration (low micromolar). The quantification obtained by EMSA is thus challenging to interpret. In contrast, values fitted from the MST measurements are more reliable since the assumption of a low “receptor” concentration holds.
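
The effect of the DNA ("receptor") concentration discussed above can be illustrated with a minimal sketch. It is not the fitting procedure used for the EMSA or MST data: it assumes a single non-cooperative 1:1 site, the KD of 0.7 μM is a hypothetical value borrowed from the MST EC50 purely for illustration, and all function names are illustrative. Under these assumptions, the total protein concentration at which half of the DNA is bound equals KD + [DNA]/2, so it rises with the DNA concentration, whereas the Hill expression contains no DNA term at all.

```python
import math


def hill_fraction_bound(protein_total, ec50, h):
    """Hill model: fraction of DNA bound. The DNA concentration does not appear."""
    return protein_total**h / (ec50**h + protein_total**h)


def quadratic_fraction_bound(protein_total, dna_total, kd):
    """Complete 1:1 KD model: accounts for protein depleted by binding to DNA."""
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def apparent_ec50(dna_total, kd, lo=1e-6, hi=1e3, iters=200):
    """Total protein concentration giving half-bound DNA, found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if quadratic_fraction_bound(mid, dna_total, kd) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    KD_UM = 0.7  # hypothetical KD in micromolar (assumption, for illustration only)
    for label, dna_um in (("MST-like, 20 nM DNA", 0.02), ("EMSA-like, 1 uM DNA", 1.0)):
        print(label, "-> apparent EC50 (uM):", round(apparent_ec50(dna_um, KD_UM), 3))
```

With these assumed values, the apparent EC50 is 0.71 μM at 20 nM DNA and 1.2 μM at 1 μM DNA: the same underlying KD gives a higher EC50 once the DNA concentration approaches the KD, which is the regime of the EMSA experiments.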

      Therefore, although measurements of EC50 and Hill coefficient from EMSA are reproducible, using the EC50 to quantify apparent affinity may be misleading. Nevertheless, this quantitative analysis of EMSA, requested by the reviewer, has highlighted an interesting characteristic of the KER mutant that is consistent across both methods: even though the EMSAs pointed out by the reviewer (Figures S3h and S3i compared to the wild-type control in Figure 3d and Figure S3f) show similar EC50 values, the binding cooperativity is different. Binding by the KER mutants is no longer cooperative (Hill coefficient ~1), and this is observed for all KER curves (isolated Pcf1_KER domain and the entire SpCAF-1 complex) with both methods, EMSA and MST. We thus decided to emphasize this characteristic of the KER mutant in the text (page 9, lines 30-32): “Importantly, this mutant also shows a lower binding cooperativity for DNA binding, as estimated by the Hill coefficient value close to 1, compared to values around 3 for the WT and other mutants.”

      Since EMSA quantifications did not show a loss of “affinity” (as measured by the EC50 value) for the KER* mutants compared to the WT, contrary to MST measurements, and because the DNA concentration was close to the measured EC50, we consider that EC50 values calculated by EMSA do not represent a KD value. If we added this quantification, we would have to discuss this point in detail. Thus, for the sake of clarity, we prefer to present the EMSA measurements in the manuscript as illustrations and qualitative validations of the interaction, without including the quantification.

      Author response image 1.

      Quantitative analysis of interaction with DNA by EMSA. a: quantification of the amount of bound DNA for the Pcf1_KER domain (blue points with error bars). The fit with a KD model is shown as a dashed line, and the fit with a Hill model with a solid line. b: Examples of quantifications and fits (Hill model) for reconstituted SpCAF-1 WT and mutants. c: Examples of quantifications and fits (Hill model) for Pcf1_KER domains WT and mutant. d: EC50 values and Hill coefficients obtained for all EMSA experiments presented in Figure 3 and S3.

      2. As with the cooperative DNA binding of CAF-1, it is very important to show the stoichiometry of CAF-1 to the DNA or the site size. Given a long alpha-helix of the KER domain with biased charges, it is also interesting to show a model of how the dsDNA binds to the long helix with a cooperative binding property (this is not essential but would be helpful if the authors discuss it).

      We agree that having a molecular model for the binding of the KER helix to DNA would be especially interesting, but at this point, considering the accuracy of the tools currently at our disposal for predicting DNA-protein interactions, such a model would remain highly speculative.

      3. Figure 5 shows nucleosome assembly by SpCAF-1. SpCAF-1-PIP* mutant produced a product with faster mobility than the control at 2 h incubation. The amount of SpCAF-1 added to the reaction seems to be critical. At least a few different concentrations of proteins should be tested.

      The slightly faster migration of the SpCAF-1-PIP* product is not systematically reproduced: in several experiments, the band corresponding to supercoiled DNA migrated slightly above or below the one for the complementation by SpCAF-1-WT (see Author response image 2 below). This indicates that, after 2 hours of incubation, the supercoiling achieved with the SpCAF-1-PIP* mutant is comparable to that achieved with SpCAF-1-WT. To further document whether the WT and the PIP* mutant behave similarly, we compared their nucleosome assembly efficiency by testing their ability to produce supercoiled DNA over a shorter time, after 45 minutes of incubation. Under these conditions, we reproducibly detected supercoiled forms at earlier times with SpCAF-1-WT than with SpCAF-1-PIP* (see Figure 5 and Author response image 2). These observations indicate that mutation of the PIP motif of Pcf1 affects the rate of supercoiling, in a manner distinct from the other mutations, which dramatically impair the capacity of SpCAF-1 to promote supercoiling.

      Author response image 2.

      Minor points:

      1. Page 8, line 26 or Table 1 legend: Please explain what "EC50" is.

      The definition of EC50, together with a reference paper for the Hill model, has been added in the text on page 8, lines 23-26: “The curves were fitted with a Hill model (Tso et al. 2018) with an EC50 value of 0.7 ± 0.1 µM (effective concentration at which a 50% signal is observed) and a cooperativity (Hill coefficient, h) of 2.7 ± 0.2, in line with a cooperative DNA binding of SpCAF-1.” It has also been added to the Table 1 legend and to the method section (page 26).

      2. Page 13, lines 9, 11: "Xenopus" should be italicized.

      This has been corrected.

      3. Page 14, second half: In S. pombe, the pcf1 deletion mutant is not lethal. It is helpful to mention the phenotype of the deletion mutant a bit more when the authors described the genetic analysis of various pcf1 mutants.

      This point has been added on page 15, line 1.

      4. Figure 1d and Figure S2a: Captions and labels on the X and Y axes are overlapped or misplaced.

      This has been corrected.

      5. Figure 5: Please add a schematic figure of the assay to explain how one can check the nucleosome assembly by looking at the form I, supercoiled DNAs.

      A new panel has been added to Figure 5. This scheme depicts the supercoiling assay where supercoiled DNA (form I) is used as an indication of efficient nucleosome assembly. The figure legend has also been modified accordingly.

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Ouasti et al. is an elegant investigation of fission yeast CAF-1, employing a diverse array of technologies to dissect its functions and their interdependence. These functions play a critical role in specifying interactions vital for DNA replication, heterochromatin maintenance, and DNA damage repair, and their dynamics involve multiple interactions. The authors have extensively utilized various in vitro and in vivo tools to validate their model and emphasize the dynamic nature of this complex.

      Strengths:

      Their work is supported by robust experimental data from multiple techniques, including NMR and SAXS, which validate their molecular model. They conducted in vitro interactions using EMSA and isothermal microcalorimetry, in vitro histone deposition using Xenopus high-speed egg extract, and systematically generated and tested various genetic mutants for functionality in in vivo assays. They successfully delineated domain-specific functions using in vitro assays and could validate their roles to a large extent using genetic mutants. One significant revelation from this study is the unfolded nature of the acidic domain, observed to fold when binding to histones. Additionally, the authors also elucidated the role of the long KER helix in mediating DNA binding and enhancing the association of CAF-1 with PCNA. The paper effectively addresses its primary objective and is strong.

      Weaknesses:

      A few relatively minor unresolved aspects persist, which, if clarified or experimentally addressed by the authors, could further bolster the study.

      1. The precise function of the WHD domain remains elusive. Its deletion does not result in DNA damage accumulation or defects in heterochromatin maintenance. This raises questions about the biological significance of this domain and whether it is dispensable. While in vitro assays revealed defects in chromatin assembly using this mutant (Figure 5), confirming these phenotypes through in vivo assays would provide additional assurance that the lack of function is not simply due to the in vitro system lacking PTMs or other regulatory factors.

      Our work demonstrates that the WHD domain is important for CAF-1 function during DNA replication. Indeed, the deletion of this domain leads to synthetic lethality when combined with mutation of the HIRA complex, as observed for a null pcf1 mutant, indicating a severe loss of function in the absence of the WHD domain. We propose that these genetic interactions, previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), are indicative of defective histone deposition by CAF-1. Moreover, our work establishes that this domain is dispensable to prevent DNA damage accumulation and to maintain silencing at centromeric heterochromatin, indicating that the WHD domain specifies CAF-1 functions. Our work further demonstrates that, in contrast to the S. cerevisiae and human WHD domains, the S. pombe counterpart exhibits no DNA binding activity. We thus agree that the WHD domain may contribute to nucleosome assembly in vivo via PTMs or interactions with regulatory factors that may be lacking in in vitro systems. However, addressing these aspects deserves further investigation beyond the scope of this article.

      2. The observation of increased Pcf2-gfp foci in pcf1-ED cells, particularly in mono-nucleated (G2 phase) and bi-nucleated cells with septum marks (S-phase), might suggest the presence of replication stress. This could imply incomplete replication in specific regions, leading to the persistence of Caf1-ED-PCNA factories throughout the cell cycle. To further confirm this, detecting accumulated single-stranded DNA (ssDNA) regions outside of S-phase using RPA as an ssDNA marker could be informative.

      We cannot formally exclude that cells expressing the Pcf1-ED mutated form exhibit incomplete replication in specific regions, an aspect that would require careful investigation. However, the microscopy analysis (Fig. 6c and S6c) of this mutant showed no alteration in the cell morphology, including the absence of elongated cells compared to wild type, a hallmark of checkpoint activation caused by ssDNA (Enoch et al. Genes & Dev 1992). Therefore, investigating the consequences of the interplay between the binding of CAF-1 to PCNA and histones on the dynamics of DNA replication is of particular interest but beyond the scope of the current manuscript.

      3. Moreover, considering the authors' strong assertion of histone binding defects in ED through in vitro assays (Figure 2d and S2a), these claims could be further substantiated, especially considering that some degree of histone deposition might still persist in vivo in the ED mutant (Figure 7d, viable though growth defective double ED*+hip1D mutants). For example, the approach, akin to the one employed in Fig. 6a (FLAG-IPs of various Pcf1-FLAG-tagged mutants), could also enable a comparison of the association of different mutants with histones and PCNA, providing a more thorough validation of their findings.

      In the current manuscript, we have provided data establishing how Pcf1 mutated forms interact with PCNA (Fig. 6a, 6b). Regarding the interactions with histone H3-H4, the approach based on immunoprecipitation using various Pcf1-FLAG tagged mutants has been unsuccessful in our hands. Indeed, we were unable to obtain robust and reproducible interactions between Pcf1 or its various mutated forms and H3-H4. This is likely because Co-IP approaches do not probe for direct interactions. Indirect interactions between Pcf1 and H3-H4 are potentially bridged by additional factors, including the two other subunits of CAF-1, Pcf2 and Pcf3, or Asf1. Therefore, we are not in a position to address in vivo the direct interactions between Pcf1 and histone H3-H4.

      4. It would be valuable for the authors to speculate on the necessity of having disordered regions in CAF1. Specifically, exploring the overall distribution of these domains within disordered/unfolded structures could provide insightful perspectives. Additionally, it's intriguing to note that the significant disparities observed among mutants (ED, PIP, and KER*) in in vitro assays seem to become more generic in vivo, except for the indispensability of the WHD-domain. Could these disordered regions potentially play a crucial role in the phase separation of replication factories? Considering these questions could offer valuable insights into the underlying mechanisms at play.

      We agree that the potential mechanistic role of partial disorder in CAF-1 is particularly interesting. Disordered regions of human CAF-1 have been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. EMBO J. 2021). As suggested, this raises the question of how disordered domains of Pcf1 could promote phase separation for replication factories, if such a phenomenon happens in vivo. Moreover, numerous factors of the replisome also harbor disordered regions (Bedina, A. et al, 2013. Intrinsically Disordered Proteins in Replication Process. InTech. doi: 10.5772/51673), which makes such questions harder to disentangle experimentally. We have added these elements at the end of the discussion in the revised manuscript (page 20, lines 23-29): “Such plasticity and cross-talks provided by structurally disordered domains might be key for the multivalent CAF-1 functions. Human CAF-1 has been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. 2021). This raises the question of a potential role of the disordered domains of Pcf1, together with other replisome factors harbouring such disordered regions (Bedina 2013), in promoting phase separation of replication factories, if such a phenomenon happens in vivo. Further studies will be needed to tackle these questions.”

    2. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”. We also thank them for a careful reading and useful comments to improve the manuscript. We have built on these comments to provide an improved version of the manuscript, and address them point by point below.

      Reviewer #1 (Public Review):

      Summary:

      This paper makes important contributions to the structural analysis of the DNA replication-linked nucleosome assembly machine termed Chromatin Assembly Factor-1 (CAF-1). The authors focus on the interplay of domains that bind DNA, histones, and replication clamp protein PCNA.

      Strengths:

      The authors analyze soluble complexes containing full-length versions of all three fission yeast CAF-1 subunits, an important accomplishment given that many previous structural and biophysical studies have focused on truncated complexes. New data here supports previous experiments indicating that the KER domain is a long alpha helix that binds DNA. Via NMR, the authors discover structural changes at the histone binding site, defined here with high resolution. Most strikingly, the experiments here show that for the S. pombe CAF-1 complex, the WHD domain at the C-terminus of the large subunit lacks DNA binding activity observed in the human and budding yeast homologs, indicating a surprising divergence in the evolution of this complex. Together, these are important contributions to the understanding of how the CAF-1 complex works.

      Weaknesses:

      1. There are some aspects of the experimentation that are incompletely described: In the SEC data (Fig. S1C) it appears that Pcf1 in the absence of other proteins forms three major peaks. Two are labeled as "1a" (eluting at ~8 mL) and "1b" (~10-11 mL). It appears that Pcf1 alone or in complex with either or both of the other two subunits forms two different high molecular weight complexes (e.g. 4a/4b, 5a/5b, 6a/6b). There is also a third peak in the analysis of Pcf1 alone, which isn't named here, eluting at ~14 mL, overlapping the peaks labeled 2a, 4c, and 5c. The text describing these different macromolecular complexes seems incomplete (p. 3, lines 32-33): "When isolated, both Pcf2 and Pcf3 are monomeric while Pcf1 forms large soluble oligomers". Which of the three Pcf1-alone peaks are oligomers, and how do we know? What is the third peak? The gel analysis across these chromatograms should be shown.

      We thank the reviewer for his/her careful reading of the manuscript. Indeed, we plotted two curves in Figure S1C in a color that does not match the legend, leading to confusion. Curve 1, Pcf1 alone, depicted in red, should appear in pink as indicated in the legend and in the SDS-PAGE analysis below. Curve 1 exhibits two peaks, labeled as 1a and 1b. With an elution volume of 8.5mL close to the dead volume of the column, peak 1a corresponds to soluble oligomers, while peak 1b (10.4mL) likely corresponds to monomeric Pcf1. Curve 5 (Pcf1 + Pcf2 mixture) was in pink instead of purple as indicated in the legend. This curve consists of three distinct peaks (5a, 5b, and 5c). The SDS-PAGE analysis revealed the presence of oligomers of Pcf1-Pcf2 (5a, 8.3mL), the Pcf1-Pcf2 complex (5b, 9.8mL), and Pcf2 alone (5c, 13.6 mL).

      The color has now been corrected in the revised manuscript.

      More importantly, was a particular SEC peak of the three-subunit CAF-1 complex (i.e. 4a or 4b) characterized in the further experimentation, or were the data obtained from the input material prior to the separation of the different peaks? If the latter, how might this have affected the results? Do the forms inter-convert spontaneously?

      We conducted all structural analyses and DNA/PCNA interaction experiments (Figures 1-4, S1-S4) with freshly SEC-purified samples corresponding to the 4b peak (9.7 mL). Aliquots were flash-frozen with 50% glycerol for in vitro histone assembly assays (Figure 5).

      2. Given the strong structural prediction about the roles of residues L359 and F380 (Fig. 2f), these should be mutated to determine effects on histone binding.

      We are pleased that our structural predictions are considered strong. We agree that investigating the role of the L359 and F380 residues will be critical to further refine the binding interface between histone H3-H4 and CAF-1. An in vitro and in vivo analysis of such mutated forms, alongside the current Pcf1-ED mutant characterized in this article and additional potential mutated forms, has the potential to provide a better understanding of the dynamics of histone deposition by CAF-1. However, these additional approaches represent a further step in deciphering this enigmatic dynamic.

      3. Could it be that the apparent lack of histone deposition by the delta-WHD mutant complex occurs because this mutant complex is unstable when added to the Xenopus extract?

      We cannot formally exclude this possibility, and this could potentially apply to all mutated forms tested. However, in the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. Nevertheless, we feel reassured by the fact that the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, which reflects defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe and was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002). This further supports the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      Reviewer #1 (Recommendations For The Authors):

      • p. 4: "An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS), consistent with a 1:1:1 stoichiometry (Figure S1e). These data are in agreement with a globular complex with a significant flexibility (Figure S1f)." There needs to be more description of the precision of the molecular weight measurement, and what aspects of these data indicate the flexibility.

      The molecular weight was estimated using the correlation volume (Vc) defined by Rambo & Tainer (Nature 2013, 496, 477-481). The estimated error with this method is around 10%. We added this information together with supporting arguments for the existence of flexibility: “An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS). Assuming an accuracy of around 10% with this method (Rambo and Tainer 2013), this value is consistent with a 1:1:1 stoichiometry for the CAF-1 complex (calculated MW 167 kDa) (Figure S1e). In addition, the position of the maximum for the dimensionless Kratky plot was slightly shifted to higher values on the y and x axes compared to the position of the expected maximum of the curve for a fully globular protein (Figure S1f).

      This shows that the complex was globular with a significant flexibility.”

      • p. 6, lines 21-22: "In contrast, a large part of signals (338-396) did not vanish anymore upon addition of a histone complex preformed with two other histone chaperones known to compete with CAF-1 for histone binding..." Given the contrast made later with the 338-351 region which is insensitive to Asf1/Mcm2, it would be clearer for the reader to describe the Asf1/Mcm2-competed regions as residues 325-338 plus 352-396. Note that the numerical scale of residues doesn't line up perfectly with the data points in Figure 2d, and this should be fixed as well.

      We thank this reviewer for spotting this typographical error; we intended to write “In contrast, a large part of signals (348-396) did not vanish anymore…”. We modified the paragraph as suggested by the reviewer because we agree it is clearer for the reader: “In contrast, only a shorter fragment (338-347) vanished upon addition of Asf1-H3-H4-Mcm2(69-138), a histone complex preformed with two other histone chaperones, Asf1 and Mcm2, known to compete with CAF-1 for histone binding (Sauer et al. 2017) and whose histone binding modes are well established (Figure 2e) (Huang et al. 2015, Richet et al. 2015). This finding underscores a direct competition between residues (325-338) and (349-396) within the ED domain and Asf1/Mcm2 for histone binding.”

      The slight shift in the numerical scale of Figure 2d was also corrected.

      • p. 8. Lines 22-24: "EMSAs with a double-stranded 40bp DNA fragment confirmed the homogeneity of the bound complex. When increasing the SpCAF-1 concentration, additional mobility shifts suggest, a cooperative DNA binding (Figure 3a)." I agree that the migration of the population is further retarded upon the addition of more protein. However, doesn't this negate the first sentence? That is, if multiple CAF-1 complexes can bind each dsDNA molecule, can these complexes be described as homogeneous?

      We fully agree with the reviewer's comment and have removed the notion of homogeneity from the first sentence. “EMSAs with a double-stranded 40bp DNA fragment showed the formation of a bound complex.”

      • Figure S2b Legend: "1H-15N HSQC spectra of Pcf1_ED (425-496)." The residue numbers should read 325-396.

      The typo has been corrected.

      • Is the title for Figure 5 correct?: "Figure 5: Rescue using Y340 and W348 in the ED domain, the intact KER DNA binding domain and the C-terminal WHD of Pcf1 in SpCAF-1 mediated nucleosome assembly." I don't see that any point mutation rescue experiments are done here.

      The title of Figure 5 has been changed to “Efficient nucleosome assembly by SpCAF-1 in vitro requires interactions with H3-H4, DNA and PCNA, and the C-terminal WHD domain”.

      • Figure S6C. I assume the top strain lacks the Pcf2-GFP but this should be stated explicitly.

      The following sentence has been added to the figure legend: “The top strain corresponds to a strain expressing wild-type and untagged Pcf2 as a negative control of GFP fluorescence”. Figure S6C has been modified accordingly to label this strain explicitly as “Pcf2 (untagged)”.

      • Regarding point #3 in the public review, a simple initial test of this idea would be to determine if similar amounts of wt and mutant complexes can be immunoprecipitated at the endpoint of the assembly reactions.

      In the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. However, the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, reflecting defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe, as it was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), further supporting the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      • Foundational findings that should be cited: The role of PCNA in CAF-1 activity was first recognized by pioneering studies in the Stillman laboratory (PMID: 10052459, 11089978). The earliest recombinant studies of CAF-1 showed that the large subunit is the binding platform for the other two, showed that the KER and ED domains were required for histone deposition activity, and roughly mapped the p60-binding site on the large subunit (PMID: 7600578). Another early study roughly mapped the binding site for the third subunit and showed that biological effects of impairing the PCNA binding synergized with defects in the HIR pathway (PMID: 11756556), a genetic synergy first demonstrated in budding yeast (PMID: 9671489).

      We thank the reviewer for providing these important references, which are now cited in the manuscript. PMID: 10052459 and 11089978 are cited on page 2, lines 18 and 19; PMID: 7600578 on page 19, line 5; and PMID: 11756556 and 9671489 on page 18, line 2.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the structure-functional relationship of domains in S. pombe CAF-1, which promotes DNA replication-coupled deposition of histone H3-H4 dimer. The authors nicely showed that the ED domain with an intrinsically disordered structure binds to histone H3-H4, that the KER domain binds to DNA, and that, in addition to a PIP box, the KER domain also contributes to the PCNA binding. The ED and KER domains as well as the WHD domain are essential for nucleosome assembly in vitro. The ED, KER domains, and the PIP box are important for the maintenance of heterochromatin.

      Strengths:

      The combination of structural analysis using NMR and Alphafold2 modeling with biophysical and biochemical analysis provided strong evidence on the role of the different domain structures of the large subunit of SpCAF-1, spPCF-1 in the binding to histone H3-H4, DNA as well as PCNA. The conclusion was further supported by genetic analysis of the various pcf1 mutants. The large amounts of data provided in the paper support the authors' conclusion very well.

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ochesenbein describes the structural and functional analysis of S. pombe CAF-1 complex critical for DNA replication-coupled histone H3/H4 deposition. By using structural, biophysical, and biochemical analyses combined with genetic methods, the authors nicely showed that a large subunit of SpCAF1, SpPCF-1, consists of 5 structured domains with four connecting IDR domains. The ED domain with IDR nature binds to histone H3-H4 dimer with the conformational change of the other domain(s). SpCAF-1 binds to dsDNA by using the KER domain, but not the WHD domain. The experiments have been done with great care and a large amount of the data are highly reliable. Moreover, the results are clearly presented and convincingly written. The conclusion in the paper is very solid and will be useful for researchers who work in the field of chromosome biology.

      Major points:

      1. DNA binding of the KER mutant shown in Figures S3h and S3i, which was measured by the EMSA, looks similar to that of wild-type control in Figure S3f, which is different from the data in Figures 3b and 3e measured by the MST. The authors need a more precise description of the EMSA result of the KER mutant shown in Figures 3 and S3. The quantification of the EMSA result would resolve the point (should be provided).

      As proposed by this reviewer, we quantified all EMSAs presented in Figure 3 and Figure S3. We quantified the signal of the free DNA band to calculate a percentage of bound DNA in each condition. All EMSA experiments were conducted in duplicate, allowing us to calculate an average value and standard deviation for each interaction. Representative curves and fitted values are reported below in the figure provided for the reviewer (panel a: data for the Pcf1_KER domain with two fitting models; panel b: the entire CAF-1 complexes and mutants; panel c: the isolated Pcf1_KER domains; panel d: all fitted values). Importantly, as illustrated in panel a, the complete model for a single interaction (complete KD model, dashed line curve) does not adequately fit the data. In contrast, a function incorporating cooperativity (Hill model) better accounts for the measured data (solid line curve). Consistently, we also used the Hill model to fit the binding curves measured with the MST technique. As now also specified in the text, the Hill model allows us to determine an EC50 value (concentration of protein resulting in the disappearance of half of the free DNA band intensity) and a Hill coefficient value (representing cooperativity during the interaction) for each curve.
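
      The quantification and Hill model described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names are hypothetical, and the only numbers taken from the response are the MST values quoted for SpCAF-1 WT (EC50 = 0.7 μM, Hill coefficient = 2.7). The non-cooperative curve (Hill coefficient of 1) is included only for comparison.

```python
# Illustrative sketch only; function names are hypothetical.

def percent_bound(free_band_intensity, free_band_no_protein):
    """Percentage of bound DNA inferred from the loss of the free DNA band.

    free_band_no_protein is the free-band intensity in the lane without protein.
    """
    return 100.0 * (1.0 - free_band_intensity / free_band_no_protein)


def hill_fraction_bound(protein_conc, ec50, hill_coefficient):
    """Hill model: fraction of DNA bound at a given protein concentration.

    ec50 is the protein concentration giving half-maximal binding;
    hill_coefficient > 1 describes cooperative binding.
    """
    if protein_conc <= 0:
        return 0.0
    ratio = (protein_conc / ec50) ** hill_coefficient
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # EC50 and h = 2.7 are the MST values quoted for SpCAF-1 WT; h = 1.0 is a
    # hypothetical non-cooperative reference curve with the same EC50.
    for conc in (0.1, 0.35, 0.7, 1.4, 2.8):
        cooperative = hill_fraction_bound(conc, ec50=0.7, hill_coefficient=2.7)
        reference = hill_fraction_bound(conc, ec50=0.7, hill_coefficient=1.0)
        print(f"{conc:5.2f} uM  h=2.7: {cooperative:.2f}  h=1.0: {reference:.2f}")
```

      Both curves cross 50% bound at the EC50, but the cooperative curve is steeper around it, which is the feature the Hill coefficient captures.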

      We measure a value of 3.4 ± 0.4 μM for the EC50 of SpCAF-1 WT, which is higher than the value measured by MST (0.7 ± 0.1 μM). Higher values were also calculated for all mutants and isolated Pcf1_KER domains compared to MST. These discrepancies could arise from the fact that the DNA concentrations used in the two techniques were very different (20 nM for MST experiments and 1 μM for EMSA). Unlike the complete KD model, which includes the DNA concentration (considered here as the “receptor”) in the calculation, the Hill model is fitted independently of this value. This model assumes that the “receptor” concentration is low compared to the KD. Here we calculate EC50 values on the same order of magnitude as the DNA concentration (low micromolar). The quantification obtained by EMSA is thus challenging to interpret. In contrast, the values fitted from the MST measurements are more reliable, since the assumption of a low “receptor” concentration holds there.
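
      The effect of the “receptor” concentration can be illustrated with the standard quadratic solution for simple 1:1 binding. This is only a sketch under stated assumptions: it ignores cooperativity (which the authors report is needed to fit their data), the function names are hypothetical, and the KD of 0.7 μM is a placeholder chosen for illustration, not a measured dissociation constant.

```python
import math


def fraction_dna_bound(protein_total, dna_total, kd):
    """Fraction of DNA bound in a simple 1:1 model that accounts for depletion.

    All three arguments must be in the same concentration unit, and dna_total
    must be greater than zero.
    """
    b = protein_total + dna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def apparent_ec50(dna_total, kd):
    """Total protein concentration at which half of the DNA is bound (1:1 model).

    At half saturation the free protein equals kd and half of the DNA is in
    complex, so the total protein is kd + dna_total / 2.
    """
    return kd + dna_total / 2.0


if __name__ == "__main__":
    hypothetical_kd = 0.7  # uM, placeholder value for illustration only
    for dna in (0.02, 1.0):  # uM: the DNA concentrations quoted for MST and EMSA
        ec50 = apparent_ec50(dna, hypothetical_kd)
        print(f"DNA = {dna} uM -> apparent EC50 = {ec50:.2f} uM")
```

      In this simplified picture the apparent EC50 stays close to the KD at 20 nM DNA but is shifted upward by half the DNA concentration at 1 μM DNA, which is in the same direction as the discrepancy discussed above.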

      Therefore, although measurements of EC50 and Hill coefficient from EMSA are reproducible, they may be misleading when used to estimate apparent affinity through the EC50. Nevertheless, this quantitative analysis of EMSA, requested by the reviewer, has highlighted an interesting characteristic of the KER mutant that is consistent across both methods: even though the EMSAs pointed out by the reviewer (Figures S3h and S3i compared to the wild-type control in Figure 3d and Figure S3f) show similar EC50 values, the binding cooperativity is different. Binding of the KER mutants is no longer cooperative (Hill coefficient ~1), and this is observed for all KER curves (isolated Pcf1_KER domain and the entire SpCAF-1 complex) with both methods, EMSA and MST. We thus decided to emphasize this characteristic of the KER mutant in the text (page 9, lines 30-32): “Importantly, this mutant also shows a lower binding cooperativity for DNA binding, as estimated by the Hill coefficient value close to 1, compared to values around 3 for the WT and other mutants.”

      Since EMSA quantifications did not show a loss of “affinity” (as measured by the EC50 value) for the KER* mutants compared to the WT, contrary to the MST measurements, and because the DNA concentration was close to the measured EC50, we consider that EC50 values calculated by EMSA do not represent a KD value. If we added this quantification, we would need to discuss this point in detail. Thus, for the sake of clarity, we prefer to present the EMSA measurements in the manuscript as illustrations and qualitative validations of the interaction, without including the quantification.

      Author response image 1.

      Quantitative analysis of interaction with DNA by EMSA. a: quantification of the amount of bound DNA for the Pcf1_KER domain (blue points with error bars). The fit with a KD model is shown as a dashed line, and the fit with a Hill model with a solid line. b: Examples of quantifications and fits (Hill model) for reconstituted SpCAF-1 WT and mutants. c: Examples of quantifications and fits (Hill model) for Pcf1_KER domains WT and mutant. d: EC50 values and Hill coefficients obtained for all EMSA experiments presented in Figure 3 and S3.

      1. As with the cooperative DNA binding of CAF-1, it is very important to show the stoichiometry of CAF-1 to the DNA or the site size. Given a long alpha-helix of the KER domain with biased charges, it is also interesting to show a model of how the dsDNA binds to the long helix with a cooperative binding property (this is not essential but would be helpful if the authors discuss it).

      We agree that having a molecular model for the binding of the KER helix to DNA would be especially interesting, but at this point, considering the accuracy of the tools currently at our disposal for predicting DNA-protein interactions, such a model would remain highly speculative.

      1. Figure 5 shows nucleosome assembly by SpCAF-1. SpCAF-1-PIP* mutant produced a product with faster mobility than the control at 2 h incubation. How much amounts of SpCAF-1 was added in the reaction seems to be critical. At least a few different concentrations of proteins should be tested.

      The slightly faster migration of the SpCAF-1-PIP* product is not systematically reproduced, and we observed in several experiments that the band corresponding to supercoiled DNA migrated slightly above or below the one obtained with complementation by SpCAF-1-WT (see Author response image 2 below). This indicates that, after 2 hours of incubation, the supercoiling achieved with the SpCAF-1-PIP* mutant is comparable to that achieved with SpCAF-1-WT. To further document whether the WT and the PIP* mutant are similar or not, we monitored differences in their nucleosome assembly efficiency by testing their ability to produce supercoiled DNA over a shorter time, after 45 minutes of incubation. Under these conditions, we reproducibly detected supercoiled forms at earlier times with SpCAF-1-WT than with SpCAF-1-PIP* (see Figure 5 and Author response image 2). These observations indicate that mutation of the PIP motif of Pcf1 affects the rate of supercoiling in a distinct manner compared to the other mutations, which dramatically impair the capacity of SpCAF-1 to promote supercoiling.

      Author response image 2.

      Minor points:

      1. Page 8, line 26 or Table 1 legend: Please explain what "EC50" is.

      The definition of EC50, together with a reference paper for the Hill model, has been added to the text (page 8, lines 23-26): “The curves were fitted with a Hill model (Tso et al. 2018) with an EC50 value of 0.7 ± 0.1 µM (effective concentration at which a 50% signal is observed) and a cooperativity (Hill coefficient, h) of 2.7 ± 0.2, in line with a cooperative DNA binding of SpCAF-1.” It has also been added to the Table 1 legend and to the method section (page 26).

      1. Page 13, lines 9, 11: "Xenopus" should be italicized.

      This is corrected

      1. Page 14, second half: In S. pombe, the pcf1 deletion mutant is not lethal. It is helpful to mention the phenotype of the deletion mutant a bit more when the authors described the genetic analysis of various pcf1 mutants.

      This point has been added on page 15, line 1.

      1. Figure 1d and Figure S2a: Captions and labels on the X and Y axes are overlapped or misplaced.

      This is corrected

      1. Figure 5: Please add a schematic figure of the assay to explain how one can check the nucleosome assembly by looking at the form I, supercoiled DNAs.

      A new panel has been added to Figure 5. This scheme depicts the supercoiling assay where supercoiled DNA (form I) is used as an indication of efficient nucleosome assembly. The figure legend has also been modified accordingly.

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Ouasti et al. is an elegant investigation of fission yeast CAF-1, employing a diverse array of technologies to dissect its functions and their interdependence. These functions play a critical role in specifying interactions vital for DNA replication, heterochromatin maintenance, and DNA damage repair, and their dynamics involve multiple interactions. The authors have extensively utilized various in vitro and in vivo tools to validate their model and emphasize the dynamic nature of this complex.

      Strengths:

      Their work is supported by robust experimental data from multiple techniques, including NMR and SAXS, which validate their molecular model. They conducted in vitro interactions using EMSA and isothermal microcalorimetry, in vitro histone deposition using Xenopus high-speed egg extract, and systematically generated and tested various genetic mutants for functionality in in vivo assays. They successfully delineated domain-specific functions using in vitro assays and could validate their roles to large extent using genetic mutants. One significant revelation from this study is the unfolded nature of the acidic domain, observed to fold when binding to histones. Additionally, the authors also elucidated the role of the long KER helix in mediating DNA binding and enhancing the association of CAF-1 with PCNA. The paper effectively addresses its primary objective and is strong.

      Weaknesses:

      A few relatively minor unresolved aspects persist, which, if clarified or experimentally addressed by the authors, could further bolster the study.

      1. The precise function of the WHD domain remains elusive. Its deletion does not result in DNA damage accumulation or defects in heterochromatin maintenance. This raises questions about the biological significance of this domain and whether it is dispensable. While in vitro assays revealed defects in chromatin assembly using this mutant (Figure 5), confirming these phenotypes through in vivo assays would provide additional assurance that the lack of function is not simply due to the in vitro system lacking PTMs or other regulatory factors.

      Our work demonstrates that the WHD domain is important for CAF-1 function during DNA replication. Indeed, the deletion of this domain leads to synthetic lethality when combined with mutation of the HIRA complex, as observed for a null pcf1 mutant, indicating a severe loss of function in the absence of the WHD domain. We propose that these genetic interactions, previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), are indicative of defective histone deposition by CAF-1. Moreover, our work establishes that this domain is dispensable for preventing DNA damage accumulation and for maintaining silencing at centromeric heterochromatin, indicating that the WHD domain specifies CAF-1 functions. Our work further demonstrates that, in contrast to the S. cerevisiae and human WHD domains, the S. pombe counterpart exhibits no DNA binding activity. We thus agree that the WHD domain may contribute to nucleosome assembly in vivo via PTMs or interactions with regulatory factors that may be lacking in in vitro systems. However, addressing these aspects deserves further investigation beyond the scope of this article.

      1. The observation of increased Pcf2-gfp foci in pcf1-ED cells, particularly in mono-nucleated (G2phase) and bi-nucleated cells with septum marks (S-phase), might suggest the presence of replication stress. This could imply incomplete replication in specific regions, leading to the persistence of Caf1-ED-PCNA factories throughout the cell cycle. To further confirm this, detecting accumulated single-stranded DNA (ssDNA) regions outside of S-phase using RPA as an ssDNA marker could be informative.

      We cannot formally exclude that cells expressing the Pcf1-ED mutated form exhibit incomplete replication in specific regions, an aspect that would require careful investigation. However, the microscopy analysis (Fig. 6c and S6c) of this mutant showed no alteration in cell morphology, including the absence of elongated cells compared to wild type, a hallmark of checkpoint activation caused by ssDNA (Enoch et al. Genes & Dev 1992). Therefore, investigating the consequences of the interplay between the binding of CAF-1 to PCNA and histones on the dynamics of DNA replication is of particular interest but beyond the scope of the current manuscript.

      1. Moreover, considering the authors' strong assertion of histone binding defects in ED through in vitro assays (Figure 2d and S2a), these claims could be further substantiated, especially considering that some degree of histone deposition might still persist in vivo in the ED mutant (Figure 7d, viable though growth defective double ED*+hip1D mutants). For example, the approach, akin to the one employed in Fig. 6a (FLAG-IPs of various Pcf1-FLAG-tagged mutants), could also enable a comparison of the association of different mutants with histones and PCNA, providing a more thorough validation of their findings.

      In the current manuscript, we have provided data establishing how Pcf1 mutated forms interact with PCNA (Fig. 6a, 6b). Regarding the interactions with histone H3-H4, the approach based on immunoprecipitation using various Pcf1-FLAG tagged mutants has been unsuccessful in our hands. Indeed, we were unable to obtain robust and reproducible interactions between Pcf1, or its various mutated forms, and H3-H4. This is likely because Co-IP approaches do not probe for direct interactions. Indirect interactions between Pcf1 and H3-H4 are potentially bridged by additional factors, including the two other subunits of CAF-1, Pcf2 and Pcf3, or Asf1. Therefore, we are not in a position to address in vivo the direct interactions between Pcf1 and histone H3-H4.

      1. It would be valuable for the authors to speculate on the necessity of having disordered regions in CAF1. Specifically, exploring the overall distribution of these domains within disordered/unfolded structures could provide insightful perspectives. Additionally, it's intriguing to note that the significant disparities observed among mutants (ED, PIP, and KER*) in in vitro assays seem to become more generic in vivo, except for the indispensability of the WHD-domain. Could these disordered regions potentially play a crucial role in the phase separation of replication factories? Considering these questions could offer valuable insights into the underlying mechanisms at play.

      We agree that the potential mechanistic role of partial disorder in CAF-1 is particularly interesting. Disordered regions of human CAF-1 have been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. EMBO J. 2021). As suggested, this raises the question of how disordered domains of Pcf1 could promote phase separation for replication factories, if such a phenomenon happens in vivo. Moreover, numerous factors of the replisome also harbor disordered regions (Bedina, A. et al., 2013. Intrinsically Disordered Proteins in Replication Process. InTech. doi: 10.5772/51673), adding complexity to disentangling such questions experimentally. We have added these elements at the end of the discussion in the revised manuscript (page 20, lines 23-29): “Such plasticity and cross-talks provided by structurally disordered domains might be key for the multivalent CAF-1 functions. Human CAF-1 has been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. 2021). This raises the question of a potential role of the disordered domains of Pcf1, together with other replisome factors harbouring such disordered regions (Bedina 2013), in promoting phase separation of replication factories, if such a phenomenon happens in vivo. Further studies will be needed to tackle these questions.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      Ngoune et al. present compelling evidence that Slender cells are challenged to infect tsetse flies. They explore the experimental context of a recent important paper in the field, Schuster et al., that presents evidence suggesting the proliferative Slender bloodstream T. brucei can infect juvenile tsetse flies. Schuster et al. were disruptive to the widely accepted paradigm that the Stumpy bloodstream-form is solely responsible for tsetse infection and T. brucei transmission potential. Evidence presented here shows that in all cases, Stumpy form parasites are exponentially more capable of infecting tsetse flies. They further show that Slender cells do not infect mature flies.

      However, they raise questions of immature tsetse immunological potential and field transmission potential that their experiments do not address. Specifically, they do not show that teneral tsetse flies are immunocompromised, that tsetse flies must be immunocompromised for Slender infection nor that younger teneral tsetse infection is not pertinent to field transmission.

      Strengths:

      Experimental Design is precise and elegant, outcomes are convincing. Discussion is compelling and important to the field. This is a timely piece that adds important data to a critical discussion of host: parasite interactions, of relevance to all parasite transmission.

      Thank you

      Weaknesses:

      As above, the authors dispute the biological relevance of teneral tsetse infection in the wild, without offering evidence to the contrary. Statements need to be softened for claims regarding immunological competence or relevance to field transmission.

      We have modified the revised version to soften these claims (l.156 and l.159). Please note that the limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs. 18-21). Our comment on the relevance to field transmission is simply based on field observations of the fly biology.

      Reviewer #2:

      Summary:

      Contrary to findings recently reported by Schuster S et al., this short paper shows evidence that the stumpy form of T. brucei is probably the most pre-adapted form to progress with the life cycle of this parasite in the tsetse vector.

      Strengths:

      One of the most important pieces of experimental evidence is that they conduct all fly infection experiments in the absence of metabolites like GlcNAc or S-glutathione; by doing so, the infection rates in flies infected with slender trypanosomes seem very low or non-existent. This, on its own, is a piece of important experimental evidence that the Schuster S et al findings may need to be revisited.

      Thank you

      Weaknesses:

      I consider that the authors should have included their own experiments demonstrating that the addition of these chemicals enhances the infection rates in flies receiving bloodmeals containing slender trypanosomes.

      The main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies, not to reproduce the results obtained by Schuster et al. We think that the suggested experiment is not necessary, as L-glutathione is well known to enhance infection rates by reducing the efficiency of the fly immune response (Ref 24). For this reason, most of the experimental infections with procyclic or ST forms (even at low densities) published by our lab and others, especially for studying parasite stages in the salivary glands, were actually performed by complementing the infective meal with L-glutathione.

      Reviewer #3:

      The dogma in the Trypanosome field is that transmission by Tsetse flies is ensured by stumpy forms. This has been recently challenged by the Engstler lab (Schuster et al.), which showed that slender forms can also be transmitted by teneral flies. In this work, the authors aimed to test whether transmission by slender forms is possible and frequent.

      For this, the authors repeated Tsetse transmission experiments but with some key critical differences relative to Schuster et al. First, they infected teneral and adult flies. Second, their infective meals lacked two components (N-acetylglucosamine and glutathione), which could have boosted the infection rates in the Schuster et al. work. In these conditions, the authors observed that most stumpy form infections with teneral and adult flies were successful while only 1 out of 24 slender-form infections was successful. Adult flies showed a lower infection rate, which is probably because their immune system is more developed.

      Given that in Tsetse-infested areas most transmission is likely ensured by adult flies, the authors conclude that the parasite stage that will have a significant epidemiologic impact on transmission is the stumpy form.

      Strengths:

      • This work tackles an important question in the field.

      • The Rotureau laboratory has well-known expertise in Tsetse fly transmission experiments.

      • Experimental setup is robust and data is solid.

      • The paper is concise and clearly written.

      Thank you

      Weaknesses:

      • The reason(s) for why this work has lower infection rates with slender forms than Schuster et al. remain unknown. The authors suggested it could be because of the absence of N-acetylglucosamine and/or glutathione, but this was not formally tested. Could another source of variation be the clone of EATRO1125 AnTat1.1 (Paris versus Munich origin)? To reduce the workload, such additional experiments could be done with just one dose of parasites.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could indeed explain the differences in infection rates observed in the two studies. However, the main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies. Our study was designed to stand alone in providing a clear answer to this question, not to reproduce the results obtained by Schuster et al. Hence, we don’t think that any additional experiments are required here.

      • The characterization of what is slender and stumpy is critical. The authors used PAD1 protein expression as the sole reporter. While this is a robust assay to confirm stumpy, an analysis of the cell cycle would have been helpful to confirm that slender forms have not initiated differentiation (Larcombe S et al. 2023, preprint).

      In this study, ST are indeed defined by their general morphology and by the expression of PAD1 proteins at the cell membrane as assessed by IFA. This is the simplest and most accurate ST proxy accessible by IFA. We do not think that monitoring the cell cycle in more detail would provide key information here. If some SL forms had initiated differentiation in our experiments, then the low infection rates observed with SL would only reinforce the fact that mostly mature PAD1+ ST are infectious for flies.

      • Statistical analysis is missing. Is the difference between adult and teneral infections statistically significant?

      An ANOVA statistical analysis was performed and a dedicated section was added to the revised version.

      For all conditions, MG infection rate comparisons between adult and teneral flies were statistically significant.

      Recommendations for the authors:

      Reviewer #1:

      While some perceived outcomes pertaining to immunological competence and transmission relevance of teneral flies are overstated, the overall tone of the paper is inappropriately apologetic. The authors obviously don't want to offend their colleagues but the current writing style obscures meaning, making the paper a bit 'flowery' and difficult to read.

      Ngoune et al. have important outcomes that need to be stated more directly.

      Words such as 'unequivocally' are not appropriate to Schuster et al's outcomes. As your study shows, their findings are experimentally based, with inherent caveats, and are therefore suggestive, not demonstrated or proven.

      The word 'unequivocally' has been removed from the revision.

      Reviewer #3:

      The Engstler lab cultivates AnTat1.1 in methylcellulose (Munich clone, if I am not mistaken). The Rotureau lab uses the Paris AnTat1.1 clone and uses no methylcellulose. Given that methylcellulose helps stumpy formation, it seems important to show that the results of this paper are reproducible with the Munich clone grown in the presence of methylcellulose.

      Differences between the strain clones and culture conditions could indeed explain the differences in infection rates observed in the two studies. However, the main purpose of this study is to assess the intrinsic infectivity of SL vs. ST in teneral vs. adult flies. Our study was designed to stand alone in providing a clear answer to this question, not to reproduce the results obtained by Schuster et al. Hence, we don’t think that any additional experiments are required here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of the reviewers’ discussion:

      • The development of MSI-1 as a post-transcriptional regulator of gene expression in Escherichia coli represents a valuable addition to the synthetic biology toolkit. MSI-1 has advantages over transcriptional regulators because it has the potential to target single genes in operons. Allosteric control of MSI-1 by oleic acid increases its versatility.

      Authors’ response: We thank the reviewers and editor for this evaluation.

      • We recommend that authors add experiments to test the mechanism of regulation by MSI-1 or soften their claims about translational regulation. We also recommend that the authors expand their discussion of other natural and synthetic regulatory systems that target translation.

      Authors’ response: In this revision, we have added new experimental results from RT-qPCR, bulk fluorometry, and flow cytometry assays to further support our conclusions. We have also enlarged the Introduction and Discussion.

      • Adding an experiment to quantify the effect of oleic acid with the most strongly regulated reporter construct (i.e., flow cytometry with redesign-3) would substantially increase the impact of the work.

      Authors’ response: We have done this experimental quantification (see the new Fig. 5d).

      Reviewer #1 (Public Review):

      The authors develop reporter constructs in E. coli where gene expression, presumably translation, is repressed by MSI-1. This is a potentially useful tool for synthetic biologists, with the advantage over transcriptional regulation that one gene in an operon could be targeted. That being said, an important caveat of translational regulation that is not addressed in the manuscript is the potential for downstream effects on RNA stability and/or transcription termination. The authors' MSI-1-regulated reporter constructs could also be useful for mechanistic studies of MSI-1.

      Authors’ response: We thank the reviewer for this appreciation of our work. Regarding the potential effects on RNA stability or transcription termination, we would like to highlight our results with the sfGFP-mScarlet bicistron (Fig. 6c), which show specific regulation of sfGFP, but not of mScarlet, by MSI-1*. In addition, for this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2).

      The authors' initial construct design led to only weak regulation by MSI-1, presumably because the MSI-1 binding sites were not suitably positioned to repress translation initiation. A more rationally designed construct led to considerably greater repression. One weakness of the paper is that the authors did not use their redesigned construct that is more strongly repressed to demonstrate allosteric regulation by oleic acid using a comparable assay (e.g., flow cytometry) to that used in other experiments. The potential for allosteric regulation is a major strength of the MSI-1 system, so this is a significant gap. Similarly, the authors use the weakly regulated constructs to assess the effect of MSI-1 binding site mutations and for their mathematical modeling; these experiments would be better suited to the more strongly regulated construct.

      Authors’ response: For this revision, we have performed the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system (see the new Fig. 5d). Regarding the kinetic study, we focused on the reporter system with just one recognition motif for simplicity. A reporter system with two recognition motifs, which thereby recruits two different proteins, makes it harder to isolate the effect of point mutations.

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 5. Panels c-f look at colonies on plates, with numbers from these data being difficult to compare with either the bulk fluorescence or single-cell fluorescence values shown in other figures. Supplementary Figure 8 shows data for single cells; these data would be more appropriate in Figure 5, with the plate-based data moving to the supplement. Moreover, measuring the effect of oleic acid on the redesign-3 reporter using flow cytometry would assess the impact of oleic acid on the most strongly regulated reporter; this would be the most impactful analysis.

      Authors’ response: We have redone Fig. 5 to include flow cytometry data (also for the system implemented with the redesign-3 reporter).

      1. Paragraph starting line 438. The authors should briefly discuss the potential for translational repression leading to reduced RNA stability, and in the case of rapid repression that impacts transcription-coupled translation, its impact on Rho-dependent transcription termination. These factors could alter the expression of neighboring genes.

      Authors’ response: As we have shown with the RT-qPCR experiment, the mRNA level of the target gene does not change in response to protein binding. We agree that mRNA stability could potentially be changed by using other RNA-targeting proteins. But in our view, a reduction of RNA stability is not a regulation of translation. We have added the following sentence in the Discussion: “The additional use of RNA-binding proteins able to alter mRNA stability might lead to the implementation of more complex circuits at the posttranscriptional level.”

      1. Figure 1. It would be informative to include a control where cells have an empty plasmid rather than a plasmid expressing MSI-1, to address leakiness of MSI-1 expression.

      Authors’ response: We have constructed a void plasmid as suggested and performed new bulk fluorometry assays. The new Fig. S8 shows the tight control of MSI-1* expression with the PLlac promoter. No apparent leakage is observed.

      1. Line 132. Where were the two sequences positioned with respect to each other and the start codon? It would be helpful to show the sequence in Figure 1.

      Authors’ response: The precise sequence is shown in the inset of Fig. 1b. The motif is placed just after the start codon.

      1. Line 135. The authors' envisioned repression mechanism isn't clear from the text, specifically the meaning of "block the progression" and "initial phase". As far as I know, there is no precedent for RNA-binding proteins repressing translation in bacteria by preventing translation elongation. Presumably, repression in the context described here would be due to MSI-1 binding over the ribosome-binding site, although the predicted hairpin may also occlude binding of initiating 30S ribosomes in the absence of MSI-1 binding.

      Authors’ response: It is difficult to know the exact mode of action. On page 7, we have rewritten a sentence to read: “In this way, MSI-1* can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.”

      1. Figure 1e is overly complicated and hence is difficult to interpret. The key result is that mScarlet expression is unchanged as a function of lactose concentration. It is sufficient to show the inset graph as a supplementary figure panel and to conclude that regulation of sfGFP is at a post-transcriptional level. Similarly, the inset in Figure 4b is unnecessary.

      Authors’ response: The inset of Fig. 1e shows that the growth rate of the cells is almost constant when lactose varies. A change in growth rate will affect protein expression. The use of a two-reporter system, one regulated translationally and the other not, is instrumental to extract from fluorescence data estimates of transcription and translation rates. Of course, showing that mScarlet expression is almost constant when lactose varies would be sufficient, but we believe that performing a fine treatment of the data helps to better understand the regulatory system from a mathematical and mechanistic point of view. Therefore, despite increasing the complexity of the figure, we prefer to keep the representation of the Crick spaces (following Alon’s terminology, see our ref. 32). We have tried to carefully explain Fig. 1e in the text.

      1. Figure 1f and Figure 4c would be easier to interpret as two-dimensional plots.

      Authors’ response: We decided to use 3D plots to have more compact representations of the data in the main figures. The accompanying insets show the percentage of cells above the threshold, which helps to understand the regulatory effects. In any case, we have provided the corresponding 2D plots in Fig. S10.
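
      The inset statistic mentioned above, the percentage of cells above a fluorescence threshold, can be illustrated with a minimal sketch. The function name `percent_on`, the threshold, and the example values are hypothetical placeholders rather than quantities from the study, and the way the threshold was actually chosen is not specified here.

```python
def percent_on(fluorescences, threshold):
    """Return the percentage of cells whose fluorescence exceeds the threshold."""
    if not fluorescences:
        raise ValueError("no cells to classify")
    # A cell counts as ON only if it is strictly above the threshold.
    n_on = sum(1 for f in fluorescences if f > threshold)
    return 100.0 * n_on / len(fluorescences)


# Arbitrary placeholder values, not measurements from the study.
example_cells = [120.0, 80.0, 950.0, 40.0]
print(percent_on(example_cells, threshold=100.0))
```

      Reporting a single percentage per condition is what allows the 3D histograms and the 2D plots to be compared on the same footing.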

      1. I don't think Figure 2e is relevant. The key result is shown in Figure 2f, i.e., the effect of mutations on regulation by MSI-1.

      Authors’ response: We agree with the reviewer that the key result is shown in panel f. However, we prefer to keep panel e in Fig. 2 because, even if negative, this result may prompt further research. In addition, this avoids rearranging the whole figure.

      1. Lines 311-313. Without additional evidence that the mutants are toxic, I suggest removing this text.

      Authors’ response: As suggested, we have removed that claim.

      Reviewer #2 (Public Review):

      Summary:

      Dolcemascolo and colleagues describe the use of the mammalian RNA-binding protein Musashi-1 (MSI-1) to implement translational regulation systems in E. coli. They perform detailed in vitro studies of MSI-1 and its binding to different RNA sequences. They provide compelling evidence of the effectiveness of the regulatory system in multiple circuits using different mRNA sequence motifs. They harness allosteric inhibition of MSI-1 by omega-9 monounsaturated fatty acids to demonstrate a fatty-acid-responsive circuit in E. coli.

      Strengths:

      The experimental results are compelling and the characterization of the binding between MSI-1 and different RNA sequences is thorough and performed via multiple complementary techniques. Several new useful circuit components are demonstrated.

      Authors’ response: We thank the reviewer for such appreciation of our work.

      Weaknesses:

      MSI-1 provides 8.6-fold downregulation of sfGFP with an optimized mRNA sequence. In some applications, a larger degree of repression may be required.

      Authors’ response: We agree with the reviewer on this point. We expect to conduct further research in the future to optimize the dynamic range of the system. We have added the following sentence in the Discussion: “Further work should be conducted to enhance the fold change of the regulatory module and engineer complex circuits with it.”

      Reviewer #2 (Recommendations For The Authors):

      Overall, I think this paper is very well done and quite thorough. I only have minor suggestions:

      • For Figures 1f and 4c, it is quite hard to interpret the fraction of cells above the threshold with the 3d perspective. It would be clearer to use a more standard 2d plot where the histograms are offset along the y-axis and the threshold is indicated by a vertical line.

      Authors’ response: We decided to use 3D plots to have more compact representations of the data in the main figures. The accompanying insets show the percentage of cells above the threshold, which helps to understand the regulatory effects. In any case, we have provided the corresponding 2D plots in Fig. S10.

      • For Figure 4b, the highlighting of different sequence regions in red3 appears to be offset by one base (e.g. AAU is highlighted rather than AUG).

      Authors’ response: This has been corrected.

      • For line 504, it seems that MSI-1 is used for two different proteins. A different name should be assigned to this 200-residue protein to avoid confusion with the other MSI-1.

      Authors’ response: We now use the term MSI-1h* for the human version of the protein.

      • The note (Page S12) that A_0 + A_R = alpha/delta only applies in steady-state conditions, which should be stated.

      Authors’ response: We have specified that.
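
      The steady-state caveat can be made concrete with a minimal sketch. It assumes, purely for illustration, that A_0 and A_R are the free and repressor-bound forms of a species produced at a constant rate alpha and removed by first-order decay at rate delta regardless of binding, so that the total A = A_0 + A_R obeys dA/dt = alpha - delta*A. These definitions and the numerical values are assumptions, not taken from the note on Page S12.

```python
import math


def total_level(t, alpha, delta, a_init=0.0):
    """Closed-form solution of dA/dt = alpha - delta * A with A(0) = a_init."""
    steady = alpha / delta
    # The deviation from the steady state decays exponentially at rate delta.
    return steady + (a_init - steady) * math.exp(-delta * t)


# Hypothetical rates: alpha in arbitrary units per minute, delta in 1/min.
alpha, delta = 1.0, 0.14
for t in (0.0, 5.0, 60.0):
    print(t, total_level(t, alpha, delta), alpha / delta)
```

      Under these assumptions the total equals alpha/delta only once the exponential term has decayed, which is why the relation holds at steady state and not during transients.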

      • It seems that some authors work for the companies that sell some of the instruments/consumables used for the assays, specifically switchSENSE and LigandTracer. This may be something that should be declared under Competing Interests for the paper.

      Authors’ response: We are sorry for having missed this point. We have included a Competing Interests section to state that “RAHR and WFV work for Dynamic Biosensors. GPR and JB work for Ridgeview Instruments”.

      Reviewer #3 (Public Review):

      Summary:

      In this work, the authors co-opt the RRM-binding protein Musashi-1 to act as a translational repressor. The novelty of the work is in the adoption of the allosteric RRM protein Musashi-1 into a translational reporter and the demonstration that RRM proteins, which are ubiquitous in eukaryotic systems, but rare in prokaryotic ones, may act effectively as post-translational regulators in E. coli. The extent of repression achieved by the best design presented in this work is not substantially improved compared to other synthetic regulatory schemes developed for E. coli, even those that similarly regulate translation (eg. native PP7 repression is approximately 10-fold, Lim et al. J. Biol. Chem. 2001 276:22507-22513). Furthermore, the mechanism of regulation is not established due to missing key experiments. The work would be of broader interest if the allosteric properties of Musashi-1 were more effective in the context of regulation. Unfortunately, the authors do not demonstrate that fatty acids can completely de-repress expression in the experimental system used for most of their assays, nor do they use this ability in their provided application (NIMPLY gate).

      Authors’ response: For this revision, we have performed the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system, showing substantial de-repression of the system with the biochemical compound. We have redone Fig. 5 and modified the Results section accordingly. Aligned with the reviewers and editor, we believe that this new result helps to improve our manuscript.

      Strengths:

      The first major achievement of this work is the demonstration that a eukaryotic RRM protein may be used to posttranscriptionally regulate expression in bacteria. In my limited literature search, this appears to be the first engineering attempt to design an RBP to directly regulate translation in E. coli, although engineered control of translation via other approaches including alterations to RNA structure or via trans-acting sRNAs have been previously described (for review see Vigar and Wieden Biochim Biophys. Acta Gen. Subj. 2017, 1861:3060-3069). Additionally, several viral systems (e.g. MS2 and PP7) have been directly co-opted to work in a similar fashion in the past (utilized recently in Nguyen et al. ACS Synthetic Biol 2022, 11:1710-1718).

      Authors’ response: We thank the reviewer for such appreciation of our work.

      The second achievement of this work is the demonstration that the allosteric regulation of Musashi-1 binding can be utilized to modulate the regulatory activity. However, the liquid culture demonstration (Suppl. Fig 8) shows that this is not a very effective switch, with de-repressed reporter activity showing substantial change but not approaching un-repressed activity. This effect is stronger when colonies are grown on a solid medium (Fig. 5).

      Authors’ response: As we have previously indicated, the flow cytometric quantification of the allosteric regulation by oleic acid in the redesigned-3 system in liquid culture showed substantial de-repression with the biochemical compound. The text now states: “Nevertheless, the system implemented with the redesign-3 reporter displayed a better dynamic behavior in response to lactose and oleic acid. In particular, the percentage of cells in the ON state increased from 0 (with 1 mM lactose) to 71% upon addition of 20 mM oleic acid (Fig. 5d).” This new result helps to improve our manuscript.

      Weaknesses:

      In this work, the authors codon optimize the mouse Musashi-1 coding sequence for expression in E. coli and demonstrate using an sfGFP reporter that an engineered Musashi-1 binding site near the translational start site is sufficient to enable a modest reduction in reporter gene expression. The authors postulate that the reduction in expression is due to inhibition of ribosome translocation along the transcript (lines 134/135), as expression of a control transcript (mScarlet) driven by the same promoter (Plac) but without the Musashi-1 recognition site does not demonstrate the same repression. However, the situation could be more complex. Other possibilities include inhibition of translation initiation rather than elongation, as well as accelerated mRNA decay of transcripts that are not actively translated. The authors do not present any measurements of sfGFP mRNA levels.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1* can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” In addition, for this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2). As shown, there is no change in the mRNA level upon inducing the system with lactose.

      In subsequent sections of the work, the authors create a series of point mutations to assess RNA-protein binding and assess these via both a sfGFP reporter and in vitro binding assays (switchSENSE). Ultimately, it is difficult to fully rationalize and interpret the behavior of these mutants in the context provided. The authors do identify a relationship between equilibrium constant (1/KD) and fold-repression. However, it is not clear from the narrative why this relationship should exist. Fold-repression is one measure of regulator efficacy, but it is an indirect measure determined from unrepressed and repressed expression. It is not clear why unrepressed expression (in the absence of the protein) is expected to be a function of the equilibrium constant.

      Authors’ response: A mathematical derivation from mass action kinetics of why the fold change scales with 1/KD is provided in Note S2. It is the ratio between the unrepressed and repressed expression (i.e., the fold change) that scales with 1/KD, not the expression of a particular state. This kind of relationship has been previously established in the case of transcription regulation [see e.g. Garcia & Phillips, PNAS (2011), our ref. 39]. Our mathematical modeling results expand previous work by providing a single picture from which to analyze transcription and translation regulation.
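
      The scaling can be illustrated with a generic equilibrium-binding sketch, which is not a reproduction of Note S2. It assumes a single binding site, a repressor concentration R that is not depleted by binding, and expression proportional to the fraction of unbound transcripts plus an optional relative `leakage` from bound ones. The function and parameter names are hypothetical.

```python
def fold_change(repressor, kd, leakage=0.0):
    """Ratio of unrepressed to repressed expression under equilibrium binding.

    repressor and kd share the same concentration units; leakage is the
    relative expression of a repressor-bound transcript (0 = fully blocked).
    """
    x = repressor / kd  # occupancy weight of the bound state, R / KD
    # Free fraction is 1 / (1 + x) and bound fraction is x / (1 + x), so the
    # repressed expression is proportional to (1 + leakage * x) / (1 + x).
    return (1.0 + x) / (1.0 + leakage * x)


# With no leakage, fold - 1 is proportional to 1/KD at fixed repressor level.
for kd in (1.0, 10.0, 100.0):
    print(kd, fold_change(repressor=10.0, kd=kd))
```

      In this simplified picture the dependence on 1/KD enters only through the ratio of the two states, which matches the point that neither state on its own needs to depend on KD.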

      Subsequent rational redesign of the Musashi-1 binding sequence to produce three alternative designs shows that fold-repression may be improved to approximately 8.6-fold. However, the rationalization of why the best design (red3) achieves this increase based on either the extensive modelling or in vitro measured binding constants is not well articulated. Furthermore, this extent of regulation is approximately that which can be achieved from the PP7 system with its native components (Lim et al. J. Biol. Chem. 2001 276:22507-22513).

      Authors’ response: In the case of translation control, the regulation is more challenging because the target is quickly degraded, especially in bacteria (in contrast to transcription control, where the target is stable). This is acknowledged in the manuscript. Even so, it is possible to engineer synthetic circuits with sRNAs or RNA-binding proteins with sufficient dynamic range. We expect to conduct further research in the future to optimize the dynamic range of the system. We have added the following sentence in the Discussion: “Further work should be conducted to enhance the fold change of the regulatory module and engineer complex circuits with it.” Regarding the articulation of the results for the mutants and mathematical model, see our responses to the following questions.

      The application provided for this regulator (NIMPLY gate), is not an inherently novel regulatory paradigm, and it does not capitalize on the allosteric properties of Musashi-1, but rather treats Musashi-1 as a non-allosteric component of a regulatory circuit.

      Authors’ response: The NIMPLY gate refers to lactose and aTC as inputs. Considering oleic acid as an additional input will lead to a more complex logic. In the last Results section, we wanted to show that the post-transcriptional mechanism engineered with Musashi-1 can be useful to specifically regulate a gene within an operon, to implement combinatorial regulation (i.e., coupling transcription and translation control), and to reduce protein expression noise. For these purposes, the allosteric ability of Musashi-1 was not decisive. In this regard, it is true that such fine regulatory effects might also be achieved with non-allosteric RNA-binding proteins, such as MS2CP or PP7CP.

      Reviewer #3 (Recommendations For The Authors):

      1. In the introduction the authors should adequately address the native bacterial mechanisms that allow posttranscriptional regulation in bacteria as well as better discuss previous examples of translational repressors.

      Authors’ response: We have added the following paragraph in the Introduction: “Even though bacteria do not appear to exploit proteins to regulate translation in a gene-specific manner, it is worth noting that some bacteriophages do follow this mechanism to modulate their infection cycle. These are the cases, e.g., of the coat proteins of the phages MS2 (infecting Escherichia coli) or PP7 (infecting Pseudomonas aeruginosa), which regulate the expression of the cognate phage replicases through protein-RNA interactions [18]. However, one limitation for synthetic biology developments is that such phage proteins are not allosteric. At the post-transcriptional level, bacteria mostly rely on a large palette of cis- and trans-acting non-coding RNAs to either activate or repress protein expression, resulting in the regulation of translation initiation, mRNA stability, or transcription termination, and even allowing sensing small molecules [1,15]. Thus, there should be efforts to replicate this functional versatility with proteins in bacteria.”

      1. Given the location of the Musashi-1 binding site in the sfGFP reporter, it may be blocking translation initiation, rather than blocking the progression of the ribosome once attached (line 134/135). The schematic in Fig 1a. is also not overly clear in describing the differences in mechanisms between eukaryotic and prokaryotic systems described in the text.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1 can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” On page 14, we have added the following sentence: “In this way, MSI-1 can also block the RNA component of the 30S ribosomal subunit.”

      1. The authors did not directly examine mRNA levels of their reporter to establish translational regulation. In many cases, inhibition of translation is accompanied by an increased degradation rate in bacterial systems. The authors do not seem to recognize this as a possible amplifier in their system, relying exclusively on normalization via another transcript produced from the same promoter (mScarlet).

      Authors’ response: For this revision we have conducted an RT-qPCR experiment to quantify the mRNA level of sfGFP to further support our conclusions (see the new Fig. S2). As shown, there is no change in the mRNA level upon inducing the system with lactose.

      1. The results presented for mutations 1-5 are not consistent with the author's models for what is occurring. In particular, mutant 1 displays a reduction in reporter production in the absence of Musashi-1, but the production in the presence does not change from the unaltered sequence. The claim that mutation 1 (in the UAG binding site) results in less binding and ultimately in less regulation is not substantiated since this loss of regulation is due to a reduction in unrepressed expression rather than an increase in expression when Musashi-1 is present.

      Authors’ response: We respectfully disagree with this assessment. In the case of mutant 1, if the Musashi protein recognized the target mRNA with the same affinity as in the original scenario, the red bar would be much lower. Because the Musashi protein hardly recognizes the mutant-1 mRNA, the blue and red bars are quite similar. To clarify this point, we have added the following text in the manuscript: “Despite that mutation substantially reduced sfGFP expression in absence of MSI-1*, the presumed repressed state upon addition of lactose did not change much, suggesting the difficulty of the protein for targeting the mutated mRNA.”

      1. Given point 5 above, it is not clear to me why one would expect the 1/KD to be predictive of fold-repression in the presence and absence of the repressor. I would rather see the relationship described as predictive in Fig. 2f (fold change vs. 1/KD) rather than the non-linear relationship. It is difficult to qualitatively evaluate the fit quality with the way the data are currently presented.

      Authors’ response: Note S2 provides a mathematical derivation from mass action kinetics of why the fold change scales with 1/KD. The R² value that we provide for the fitting corresponds to the linear regression between fold and 1/KD, as specified in the figure legend. However, we think that the representation of fold vs. KD in log scale is more illustrative in this case.

      1. It is not clear what conclusion is determined from the computational modeling, or how this work contributes to the narrative presented. It does not seem like what is learned from these experiments is utilized for novel designs. Furthermore, several of the assumptions within the model may be problematic including the high rate of "elongation leakage" described and the lack of justification for RNA degradation rates utilized.

      Authors’ response: The mathematical modeling was performed to rationalize our experimental data. Our aim was to recapitulate the observed dynamics rather than to guide the design of new systems. Our model might be exploited to this end in further research, as the reviewer suggests. Besides, elongation leakage is a concept that applies to both transcription and translation regulation systems, and it is nothing more than the ability of the RNA polymerase or ribosome to elongate even if there is a protein bound to the nucleic acid. This parameter can be set to 0 in the model if appropriate. Moreover, we cite the paper by Bernstein et al., PNAS (2002), our ref. 38, to justify that in E. coli the average mRNA half-life is about 5 min (i.e., degradation rate of 0.14 min⁻¹).
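
      The conversion from half-life to degradation rate quoted above follows from first-order decay, where the rate constant equals ln(2) divided by the half-life. A minimal sketch of that arithmetic (the function name is illustrative):

```python
import math


def degradation_rate(half_life_min):
    """First-order decay constant (1/min) for a half-life given in minutes."""
    return math.log(2) / half_life_min


# A 5 min half-life gives ln(2)/5, about 0.139 per minute.
print(round(degradation_rate(5.0), 2))  # 0.14
```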

      1. The data presented in Figure 4 are not presented in a consistent way. While it would be somewhat redundant, including the 0 and 1 mM lactose data for red3 in Figure 4a would be helpful for comparison purposes.

      Authors’ response: We have added the requested bar plot in Fig. 4a.

      1. The presence of additional Musashi-1 sites upstream of the start codon in red3, and their impact on the fold-repression may support an inhibition of the translation initiation model rather than an inhibition of elongation.

      Authors’ response: On page 7, we have rewritten a sentence to read: “In this way, MSI-1 can repress translation by blocking the binding of the ribosome, presumably by imposing a steric hindrance for the 30S ribosomal subunit.” On page 14, we have added the following sentence: “In this way, MSI-1 can also block the RNA component of the 30S ribosomal subunit.”

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      1. Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      2. Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      1. Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      The novelty of this work is primarily manifested in four key aspects. Firstly, although we agree with the reviewer that we did employ several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques were hardly explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with “less accurate” predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus makes long sequences challenging to process. Besides, these methods are sensitive to details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a Tools and Resources article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      1. Benchmark Discrepancies. The benchmark results vary, especially between the initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. The discrepancy suggests potential issues with the benchmark set or the stability of the method. This inconsistency needs to be addressed to validate the reliability of the results.

      We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets were disparate, it was difficult to compare or rank all these methods fairly using a single test set. As described in the “GPSite outperforms state-of-the-art methods” section, 358 out of 375 proteins in our protein-protein binding site test set share >30% sequence identity with the training sequences of PeSTo. To address this, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation. This is quite common in this field. For instance, in the study of PeSTo [Nat Commun 2023], the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set. Based on the reviewer’s suggestion, in the revised version of the manuscript, we intend to include other comparative methods alongside PeSTo on the new test set or retrain our model directly on PeSTo's training set for comparison, which should enhance the completeness of our results.

      1. Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      We thank the reviewer for the comments. The precise definition of ligand-binding sites is elucidated in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as a binding residue if the smallest atomic distance between the target residue and the ligand is <0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods regarding these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
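
      The distance criterion described above can be sketched as follows. This is an illustrative reading of the stated rule, not BioLiP's own code: the function name, the atom representation, and the use of Bondi van der Waals radii for a handful of elements are assumptions.

```python
import math

# Bondi van der Waals radii in angstroms for a few common elements
# (an assumed radius table, not necessarily the one used by BioLiP).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def is_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """Apply the nearest-atom-pair rule to one residue and one ligand.

    Atoms are (element, x, y, z) tuples with coordinates in angstroms.
    """
    best = None
    for elem_r, *xyz_r in residue_atoms:
        for elem_l, *xyz_l in ligand_atoms:
            d = math.dist(xyz_r, xyz_l)
            if best is None or d < best[0]:
                best = (d, elem_r, elem_l)
    if best is None:
        return False  # no atoms to compare
    d, elem_r, elem_l = best
    # Contact if the closest pair is nearer than margin + sum of the radii.
    return d < margin + VDW_RADII[elem_r] + VDW_RADII[elem_l]


residue = [("N", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0)]
print(is_binding_residue(residue, [("O", 4.0, 0.0, 0.0)]))
```

      Because the cutoff depends on the elements of the closest atom pair, the effective distance threshold varies from contact to contact, unlike the fixed thresholds used by some protein-protein predictors.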

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet [Nat Methods 2022] and PeSTo [Nat Commun 2023] directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors. In our revised manuscript, we will provide more details for the binding site definitions to avoid any potential ambiguity.
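
      The two families of label definitions discussed above can be contrasted with a small sketch. It assumes that solvent accessibility values and minimum inter-chain atom distances have already been computed by some external tool; the function names and the strict inequalities are assumptions for illustration, not the code of any of the cited methods.

```python
def is_protein_binding_residue(rsa_unbound, asa_unbound, asa_bound,
                               rsa_cutoff=0.05, asa_loss_cutoff=1.0):
    """Surface-based label (PSIVER-style, as described in the text).

    rsa_unbound: relative solvent accessibility in the unbound chain (0 to 1).
    asa_unbound, asa_bound: absolute accessibility in square angstroms,
    before and after complex formation.
    """
    is_surface = rsa_unbound > rsa_cutoff
    asa_lost = asa_unbound - asa_bound
    return is_surface and asa_lost > asa_loss_cutoff


def is_contact_by_distance(min_atom_distance, threshold):
    """Distance-based label: the text cites 4 angstroms (ScanNet) and
    5 angstroms (PeSTo) as thresholds. The strict '<' is an assumption."""
    return min_atom_distance < threshold
```

      Under these assumptions, a residue whose closest partner atom lies 4.5 Å away is a contact with the 5 Å threshold but not with the 4 Å threshold, which illustrates how the definitions can disagree at the edge of an interface.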

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we will carry out the corresponding revisions in the next version of the manuscript to enhance the completeness and clearness of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPSite", to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ion binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structures from ESMFold for training and prediction (instead of native structures as in similar works) and a high-quality protein language model representation. The other major contribution is that it provides the public with a new lightweight framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet [Nat Methods 2022] and PeSTo [Nat Commun 2023], tend to directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we will improve our manuscript by incorporating additional ablation studies regarding the effects of predicted structures and language model representations. Besides, we will refine the Discussion section to focus more on the achievements of this work and its potential applications including protein engineering. A comprehensive point-by-point response to the reviewer’s recommendations will be provided alongside the revised manuscript. This will ensure that all concerns and suggestions are adequately addressed.

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      1. The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      2. Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      3. GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      4. The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      1. One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      We thank the reviewer for the valuable suggestion. Empirically, it takes about 30 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes less than 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we will include comprehensive runtime comparisons between the GPSite webserver and other existing servers in the revision to illustrate the practicality and efficiency of our method.

      1. Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 2 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets. Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind when both use predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind given native structures as input. This result further demonstrates the robustness of GPSite.

      1. To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions.

      1. To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      We thank the reviewer for the valuable recommendation. We will perform such analysis in the revised manuscript.

      1. The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from a further investigation into this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predictive binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge.
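
      The protein-level score described above can be sketched as follows. The value of k is not stated in this response, so the default used here is purely a placeholder, and the function name is hypothetical.

```python
def protein_level_score(residue_scores, k=10):
    """Average the k highest residue-level binding scores of one protein.

    residue_scores: iterable of per-residue scores for a single ligand type.
    k: number of top residues to average (placeholder value, an assumption).
    Proteins shorter than k residues are averaged over all their residues.
    """
    scores = sorted(residue_scores, reverse=True)
    if not scores:
        raise ValueError("at least one residue score is required")
    if k < 1:
        raise ValueError("k must be a positive integer")
    top = scores[:k]
    return sum(top) / len(top)
```

      Averaging only the top-scoring residues keeps a short, strongly predicted binding patch from being diluted by the many non-binding residues of a long protein; ranking proteins by this value then gives the per-ligand lists mentioned in the response.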

      As for the cross-interaction-type performance raised by the reviewer, we will include such analysis in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We would like to thank the reviewers for their feedback. Below we address their comments and have indicated the associated changes in our point-by-point response (blue: answers, red: changes in manuscript).

      Reviewer #1:

      Overall, the hypotheses and results are clearly presented and supported by high quality figures. The study is presented in a didactic way, making it easy for a broad audience to understand the significance of the results. The study does present some weaknesses that could easily be addressed by the authors.

      We thank the reviewer for appreciating our work and providing useful suggestions for improvement.

      1) First, there are some anatomical inaccuracies: in line 129 and Fig. 1C, the authors omit medial septum projections to area CA1 (in addition to the entorhinal cortex). Moreover, in addition to CA1, CA3 also provides monosynaptic feedback projections to the medial septum. Finally, an indirect projection from CA1/3 excitatory neurons to the lateral septum, which in turn sends inhibitory projections to the medial septum, could be included or mentioned by the authors. This could be of particular relevance to support claims related to effects of neurostimulations, whereby meticulous implementation of anatomical data could be key.

      If not updating their model, the authors could add this point to their limitation section, where they already do a good job of mentioning some limitations of using the EC as a sole oscillatory input to CA1.

      We acknowledge that our current model strongly simplifies the interconnections between the medial septum and the hippocampal formation, but including more anatomical details is beyond the scope of this manuscript and would be a topic for future work. Nevertheless, we followed the reviewer’s advice to stress this point in our manuscript. First, we moved a paragraph that was initially in the “methods” section to the “results” section (L.141-150 of the revised manuscript):

      “Biologically, GABAergic neurons from the medial septum project to the EC, CA3, and CA1 fields of the hippocampus (Toth et al., 1993; Hajós et al., 2004; Manseau et al., 2008; Hangya et al., 2009; Unal et al., 2015; Müller and Remy, 2018). Although the respective roles of these different projections are not fully understood, previous computational studies have suggested that the direct projection from the medial septum to CA1 is not essential for the production of theta in CA1 microcircuits (Mysin et al., 2019). Since our modeling of the medial septum is only used to generate a dynamic theta rhythm, we opted for a simplified representation where the medial septum projects only to the EC, which in turn drives the different fields of the hippocampus. In our model, Kuramoto oscillators are therefore connected to the EC neurons and they receive projections from CA1 neurons (see methods for more details).”
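
      For readers unfamiliar with Kuramoto oscillators, the sketch below shows the textbook form of the model and the order parameter that measures synchronization. It is not the implementation used in the manuscript: it omits the CA1 feedback and the output gain, and the population size, coupling strength, theta frequency and integration step are all assumed values chosen for illustration.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (r, psi): synchrony level r in [0, 1] and mean phase psi."""
    z = sum(cmath.exp(1j * th) for th in phases) / len(phases)
    return abs(z), cmath.phase(z)


def simulate_kuramoto(n=50, coupling=20.0, f_mean=6.0, f_sd=0.5,
                      dt=1e-3, t_end=2.0, seed=0):
    """Euler integration of d(theta_i)/dt = w_i + K * r * sin(psi - theta_i).

    This mean-field form is equivalent to the usual all-to-all coupling
    (K / N) * sum_j sin(theta_j - theta_i). Frequencies are drawn around
    f_mean (Hz), an assumed theta-band value. Returns the final r.
    """
    rng = random.Random(seed)
    omegas = [2 * math.pi * rng.gauss(f_mean, f_sd) for _ in range(n)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    for _ in range(int(t_end / dt)):
        r, psi = order_parameter(phases)
        phases = [
            (th + dt * (w + coupling * r * math.sin(psi - th))) % (2 * math.pi)
            for th, w in zip(phases, omegas)
        ]
    return order_parameter(phases)[0]
```

      With strong coupling the phases lock and r approaches 1, so the summed output behaves like a single theta rhythm; with zero coupling the phases drift apart and r stays low.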

      Second, we expanded the corresponding paragraph in the limitation section to discuss this point further (L.398-415 of the revised manuscript):

      “We decided to model septal pacemaker neurons projecting to the EC as the main source of hippocampal theta as reported in multiple experimental studies (Buzsáki, 2002; Buzsáki et al., 2003; Hangya et al., 2009). However, experimental findings and previous models have also proposed that direct septal inputs are not essential for theta generation (Wang, 2002; Colgin et al., 2013; Mysin et al., 2019), but play an important role in phase synchronization of hippocampal neurons. Furthermore, the model does not account for the connections between the lateral and medial septum and the hippocampus (Takeuchi et al., 2021). These connections include the inhibitory projections from the lateral to the medial septum and the monosynaptic projections from the hippocampal CA3 field to the lateral septum. An experimental study has highlighted the importance of the lateral septum in regulating the hippocampal theta rhythm (Bender et al., 2015), an area that has not been included in the model. Specifically, theta-rhythmic optogenetic stimulation of the axonal projections from the lateral septum to the hippocampus was shown to entrain theta oscillations and lead to behavioral changes during exploration in transgenic mice. To account for these discrepancies, our model could be extended by considering more realistic connectivity patterns between the medial / lateral septum and the hippocampal formation, including glutamatergic, cholinergic, and GABAergic reciprocal connections (Müller and Remy, 2018), or by considering multiple sets of oscillators each representing one theta generator.”

      1. The authors test conditions of low theta inputs, which they liken to pathological states (line 112). It is not clear what pathology the authors are referring to, especially since a large amount of 'oscillopathies' in the septohippocampal system are associated with decreased gamma/PAC, but not theta oscillations (e.g. Alzheimer's disease conditions).

      In the manuscript, we referred to “oscillopathies” in a broad sense, as we did not want to overstate the biological implications of the model or the way we modeled pathological states. To our knowledge, several studies have yielded inconsistent results regarding the specific changes in theta or gamma power in Alzheimer’s disease, and the most convincing alteration seems to be the theta-gamma phase-amplitude coupling (PAC) (for review see e.g., Kitchigina, V. F. Alterations of Coherent Theta and Gamma Network Oscillations as an Early Biomarker of Temporal Lobe Epilepsy and Alzheimer’s Disease. Front Integr Neurosci 12, 36 (2018)), as also mentioned by the reviewer.

      In this study, the most straightforward way to reduce theta-gamma PAC was to reduce the amplitude of the oscillators’ gain, which affected theta power, gamma power, and theta-gamma PAC (Figure 5 of the revised manuscript). Affecting their synchronization level (i.e., the order parameter) did not affect any of these variables (Figure 5 – Figure Supplement 4).

      In order to alter theta-gamma PAC without affecting theta or gamma power, we believe that more complex changes should be performed in the model, likely at the level of individual neurons in the hippocampal formation. For example, cholinergic deprivation has been previously used in a multi-compartment model of the hippocampal CA3 to mimic Alzheimer’s disease and to draw functional implications on the slowing of theta oscillations and the storage of new information (Menschik, E. D. & Finkel, L. H. Neuromodulatory control of hippocampal function: towards a model of Alzheimer’s disease. Artif Intell Med 13, 99–121 (1998)).

      This has now been added to the limitations section (L.458-465 of the revised manuscript):

      “Finally, we likened conditions of low theta input to pathological states characteristic of oscillopathies such as Alzheimer’s disease, as these conditions disrupted all aspects of theta-gamma oscillations in our model: theta power, gamma power, and theta-gamma PAC (Figure 5). However, it should be noted that changes in theta or gamma power in these pathologies are often unclear, and that the most consistent alteration that has been reported in Alzheimer’s disease is a reduction of theta-gamma PAC (for review, see Kitchigina, 2018). Future work should explore the effects of cellular alterations intrinsic to the hippocampal formation and their impact on theta-gamma oscillations.”

      1. While relevant for the clinical field, there is overall a missed opportunity to explain many experimental accounts with this novel model. Although to this day, clinical use of DBS is mostly restricted to electrical (and thus cell-type agnostic) stimulation, recent studies focusing on mechanisms of neurostimulations have manipulated specific subtypes in the medial septum and observed effects on hippocampal oscillations (e.g. see Muller & Remy, 2017 for review). Focusing stimulations in CA1 is of course relevant for clinical studies but testing mechanistic hypotheses by focusing stimulation on specific cell types could be highly informative. For instance, could the author reproduce recent optogenetic studies (e.g. Bender et al. 2015 for stimulation of fornix fibers; Etter et al., 2019 & Zutshi et al. 2018 for stimulation of septal inhibitory neurons)? Cell specific manipulations should at least be discussed by the authors.

      We acknowledge the importance of cell-type-specific manipulation in the septo-hippocampal circuitry. However, our model was designed to study neurostimulation protocols that affect the hippocampal formation, not the medial septum, which is why only the hippocampal formation is composed of biophysically realistic (i.e., conductance-based) neuronal models. To replicate the various studies mentioned by the reviewer (which are all very relevant), we would need to implement a biophysical model of the medial septum, which would be an entirely new project.

      Nevertheless, we can use the existing model to replicate optogenetic studies that induced gamma oscillations in excitatory-inhibitory circuits, using either ramped photostimulation targeting excitatory neurons (Adesnik et al., 2010; Akam et al., 2012; Lu et al., 2015), or pulsed stimulation driving inhibitory cells in the gamma range (Cardin et al., 2009; Iaccarino et al., 2016). In fact, such approaches have been demonstrated not just in the hippocampus but also in the neocortex, and represent a hallmark of local excitatory-inhibitory circuits. To account for these experimental results and replicate them, we have added 4 new figures (Figure 2 and its 3 figure supplements) and an extensive section in the results part (L.151-217 of the revised manuscript):

      “From a conceptual point of view, our model is thus composed of excitatory-inhibitory (E-I) circuits connected in series, with a feedback loop going through a population of coupled phase oscillators. In the next sections, we first describe the generation of gamma oscillations by individual E-I circuits (Figure 2), and illustrate their behavior when driven by an oscillatory input such as theta oscillations (Figure 3). We then present a thorough characterization of the effects of theta input and stimulation amplitude on theta-nested gamma oscillations (Figure 4 and Figure 5). Finally, we present some results on the effects of neurostimulation protocols for restoring theta-nested gamma oscillations in pathological states (Figure 6 and Figure 7).

      Generation of gamma oscillations by E-I circuits

      It is well-established that a network of interconnected pyramidal neurons and interneurons can give rise to oscillations in the gamma range, a mechanism termed pyramidal-interneuronal network gamma (PING) (Traub et al., 2004; Onslow et al., 2014; Segneri et al., 2020). This mechanism has been observed in several optogenetic studies with gradually increasing light intensity (i.e., under a ramp input) affecting multiple different circuits, such as layer 2-3 pyramidal neurons of the mouse somatosensory cortex (Adesnik et al., 2010), the CA3 field of the hippocampus in rat in vitro slices (Akam et al., 2012), and in the non-human primate motor cortex (Lu et al., 2015). In all cases, gamma oscillations emerged above a certain threshold in terms of photostimulation intensity, and the frequency of these oscillations was either stable or slightly increased when increasing the intensity further. We sought to replicate these findings with our elementary E-I circuits composed of single-compartment conductance-based neurons driven by a ramping input current (Figure 2 and Figure S2). As an example, all the results in this section will be shown for an E-I circuit that has similar connectivity parameters as the CA1 field of the hippocampus in our complete model (see section “Hippocampal formation: inputs and connectivity” in the methods).

      For low input currents provided to both neuronal populations, only the highly-excitable interneurons were activated (Figure 2A). For a sufficiently high input current (i.e., a strong input that could overcome the inhibition from the fast-spiking interneurons), the pyramidal neurons started spiking as well. As the amplitude of the input increased, the activity of both neuronal populations became synchronized in the gamma range, asymptotically reaching a frequency of about 60 Hz (Figure 2A bottom panel). Decoupling the populations led to the abolition of gamma oscillations (Figure 2B), as neuronal activity was determined solely by the intrinsic properties of each cell. Interestingly, when the ramp input was provided solely to the excitatory population, we observed that the activity of the pyramidal neurons preceded the activity of the inhibitory neurons, while still preserving the emergence of gamma oscillations (Figure S2A). As expected, decoupling the populations also abolished gamma oscillations, with the excitatory neurons spiking at a frequency determined by their intrinsic properties and the inhibitory population remaining silent (Figure S2B).

      To further characterize the intrinsic properties of individual inhibitory and excitatory neurons, we derived their input-frequency (I-F) curves, which represent the firing rate of individual neurons in response to a tonic input (Figure S3A). We observed that for certain input amplitudes, the firing rates of both types of neurons were within the gamma range. Interestingly, in the absence of noise, each population could generate by itself gamma oscillations that were purely driven by the input and determined by the intrinsic properties of the neurons (Figure S3B). Adding stochastic Gaussian noise in the membrane potential disrupted these artificial oscillations in decoupled populations (Figure S3C). All subsequent simulations were run with similar noise levels to prevent the emergence of artificial gamma oscillations.

      Another potent way to induce gamma oscillations is to drive fast-spiking inhibitory neurons using pulsed optogenetic stimulation at gamma frequencies, a strategy that has been used both in the neocortex (Cardin et al., 2009) and hippocampal CA1 (Iaccarino et al., 2016). In particular, Cardin and colleagues systematically investigated the effect of driving either excitatory or fast-spiking inhibitory neocortical neurons at frequencies between 10 and 200 Hz (Cardin et al., 2009). They showed that fast-spiking interneurons are preferentially entrained around 40-50 Hz, while excitatory neurons respond better to lower frequencies. To verify the behavior of our model against these experimental data, we simulated pulsed optogenetic stimulation as an intracellular current provided to our reduced model of a single E-I circuit. Stimulation was applied at frequencies between 10 and 200 Hz to excitatory cells only, to inhibitory cells only, or to both at the same time (Figure S4). The population firing rates were used as a proxy for the local field potentials (LFP), and we computed the relative power in a 10-Hz band centered around the stimulation frequency, similar to the method proposed by Cardin et al. (2009). When presented with continuous stimulation across a range of frequencies in the gamma range, interneurons showed the greatest degree of gamma power modulation (Figure S4). Furthermore, when the stimulation was delivered to the excitatory population, the relative power around the stimulation frequency dropped significantly at frequencies above 10 Hz, similar to the reported experimental data (Cardin et al., 2009). The main difference between our simulation results and these experimental data is the specific frequencies at which fast-spiking interneurons showed resonance, which was slow gamma around 40 Hz in the mouse barrel cortex and fast gamma around 90 Hz in our model. This could be attributed to several factors, such as differences in the cellular properties between cortical and hippocampal fast-spiking interneurons, or the differences between the size of the populations and their relevant connectivity in the cortex and the hippocampus.”
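
      The relative-power measure mentioned in the quoted passage can be sketched as follows. This is a minimal illustration, not the analysis code of the manuscript or of Cardin et al. (2009): it uses a plain discrete Fourier transform on a mean-subtracted signal, and normalizing by the total power up to an assumed upper limit of 200 Hz is a choice made for this example.

```python
import cmath
import math


def power_spectrum(signal, fs, f_max):
    """Power at each DFT bin from the first non-zero bin up to f_max (Hz).

    Direct O(N * bins) evaluation, adequate for short signals.
    Returns a dict mapping frequency in Hz to power.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # drop the DC component
    df = fs / n                            # frequency resolution in Hz
    spectrum = {}
    k = 1
    while k * df <= f_max:
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        spectrum[k * df] = abs(coeff) ** 2
        k += 1
    return spectrum


def relative_band_power(signal, fs, f_stim, half_width=5.0, f_max=200.0):
    """Fraction of total power (up to f_max) that falls in a band of
    width 2 * half_width centered on the stimulation frequency."""
    spectrum = power_spectrum(signal, fs, f_max)
    total = sum(spectrum.values())
    if total == 0.0:
        return 0.0  # flat signal: no power anywhere
    band = sum(p for f, p in spectrum.items() if abs(f - f_stim) <= half_width)
    return band / total
```

      Applied to a population firing rate, a value near 1 means that almost all of the power sits at the stimulation frequency, whereas a value near 0 means the population did not follow the stimulation.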

      Author response image 1.

      Figure 2. Emergence of gamma oscillations in coupled excitatory-inhibitory populations under ramping input to both populations. A. Two coupled populations of excitatory pyramidal neurons (NE = 1000) and inhibitory interneurons (NI = 100) are driven by a ramping current input (0 nA to 1 nA) for 5 s. As the input becomes stronger, oscillations start to emerge (shaded green area), driven by the interactions between excitatory and inhibitory populations. The green inset shows the raster plot (neuronal spikes across time) of the two populations during the green shaded period (red for inhibitory; blue for excitatory). When the input becomes sufficiently strong (shaded magenta area), the populations become highly synchronized and produce oscillations in the gamma range (at approximately 50 Hz). The spectrogram (bottom panel) shows the power of the instantaneous firing rate of the pyramidal population as a function of time and frequency. It reveals the presence of gamma oscillations that emerge around 2 s and increase in frequency until 4 s, when they settle at approximately 60 Hz. B. Similar depiction as in panel A. with the pyramidal-interneuronal populations decoupled. The absence of coupling leads to the abolition of gamma oscillations, with each cell's spiking activity being driven by its own inputs and intrinsic properties.

      Author response image 2.

      Figure S2 (Figure 2 – Figure Supplement 1). Emergence of gamma oscillations in coupled excitatory-inhibitory populations under ramping input to the excitatory population. Similar representation as in Figure 2, but with the input provided only to the excitatory population. All conclusions remain the same. In addition, the inhibitory population does not show any spiking activity in the decoupled case.

      Author response image 3.

      Figure S3 (Figure 2 – Figure Supplement 2). Cell-intrinsic spiking activity in decoupled excitatory and inhibitory populations under ramping input. A. Input-Frequency (I-F) curves for excitatory cells (left panel; pyramidal neurons with ICAN) and inhibitory cells (right panel; interneurons, fast-spiking) used in the model. Above a certain tonic input (around 0.35 nA for excitatory and 0.1 nA for inhibitory neurons), neurons can spike in the gamma range. B. Raster plot showing the spiking activity of excitatory (blue, NE = 1000) and inhibitory (red, NI = 100) neurons in decoupled populations under ramping input (top trace) and in the absence of noise in the membrane potential. Despite random initial conditions across neurons, oscillations emerge in both populations due to the intrinsic properties of the cells, with a frequency that is predicted by the respective I-F curves (panel A.). C. Similar representation as panel B. but with the addition of stochastic noise in the membrane potential of each neuron. The presence of noise disrupts the emergence of oscillations in these decoupled populations.

      Author response image 4.

      Beyond these weaknesses, this study has strong utility for researchers wanting to explore hypotheses in the field of neurostimulation. In particular, I see value in such models for exploring more intricate, phase-specific effects of continuous as well as closed-loop stimulation, which is on the rise in systems neuroscience.

      We thank the reviewer for this appreciation of our work and its future perspectives.

      Recommendations For The Authors:

      Line 144, the authors mention that their MI values are erroneous in absence of additive noise - could this be due to the non-sinusoidal nature of the phase signal recorded, and be fixed by upscaling model size?

      We thank the reviewer for this question and suggestion. The main reason behind the errors in the computation of the MI lies in the complete absence of oscillations at specific frequencies. Filtered signals within specific bands produced a power of 0 (or extremely low values), as seen in the power spectral densities. In such cases, the phase signal was not mathematically defined, but the toolbox we used to compute it still returned a numerical result that was inaccurate (for more details on the computation of the MI see Tort et al., 2010). To mitigate this numerical artefact, we decided to add uniform noise to the computed firing rates. This strategy is illustrated in Figure S6 (Figure 3 – Figure Supplement 2), which we have copied below for reference. Alternative approaches could probably have been used, such as increasing the noise in the membrane potential so that neurons would start spiking with firing rates that show more realistic power spectra, even in the absence of external inputs.

      Author response image 5.

      Figure S6 (Figure 3 – Figure Supplement 2). Quantification of PAC with and without noise. A. Quantifying PAC in the absence of noise produced inaccurate identification of the coupled frequency bands, due to the complete absence of oscillations at some frequencies. All analyses are based on the CA1 firing rates (top traces) during a representative simulation. Power spectral densities of these firing rates (left) indicate that some frequencies have a power of 0. PAC of the excitatory population was assessed using two graphical representations, the polar plot (middle) and comodulogram (right), and quantified using the MI. The comodulogram was calculated by computing the MI across 80% overlapping 1-Hz frequency bands in the theta range and across 90% overlapping 10-Hz frequency bands in the gamma range and subsequently plotted as a heat map. In the absence of noise, a slow theta frequency centered around 5 Hz is found to modulate a broad range of gamma frequencies between 40 and 100 Hz. The value indicated on the comodulogram indicates the average MI in the 3-9 Hz theta range and 40-80 Hz gamma range. As in Figure 2, the polar plot represents the amplitude of gamma oscillations (averaged across all theta cycles) at each phase of theta (theta range: 3-9 Hz, phase indicated as angular coordinate) and for different gamma frequencies (radial coordinate, binned in 1-Hz ranges). B. Adding uniform noise to the firing rate (with an amplitude ranging between 15 and 25% of the maximum firing rate) improved the identification of the coupled frequency bands. In this case, the slower theta frequency centered around 5 Hz modulates a gamma band located between 45 and 75 Hz.
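
      For readers unfamiliar with the MI, the following is a minimal sketch of the computation described by Tort et al. (2010); it is not the toolbox used in the manuscript. It assumes that the theta phase and gamma amplitude have already been extracted (for example by band-pass filtering followed by a Hilbert transform). The function name `modulation_index`, the choice of 18 phase bins, and the synthetic signals are illustrative assumptions. The guard on zero total amplitude corresponds to the degenerate case discussed above, in which a band with no power leaves the MI undefined.

```python
import numpy as np


def modulation_index(phase, amplitude, n_bins=18):
    """Tort-style MI (illustrative sketch).

    MI = KL(P || U) / log(n_bins), where P is the normalized mean
    amplitude per phase bin and U is the uniform distribution.
    """
    phase = np.asarray(phase, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    # Map each sample to a phase bin index in 0 .. n_bins - 1.
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mean_amp = np.array([
        amplitude[idx == k].mean() if np.any(idx == k) else 0.0
        for k in range(n_bins)
    ])
    total = mean_amp.sum()
    if total <= 0:
        # No amplitude in this band: the MI is undefined, so report NaN
        # instead of returning a misleading number.
        return float("nan")
    p = mean_amp / total
    nonzero = p > 0
    kl = np.sum(p[nonzero] * np.log(p[nonzero] * n_bins))
    return kl / np.log(n_bins)


# Synthetic example (assumed values): a 6 Hz phase sampled at 1 kHz for 10 s.
t = np.arange(0, 10, 1e-3)
phase = np.angle(np.exp(1j * 2 * np.pi * 6 * t))

mi_coupled = modulation_index(phase, 1.0 + 0.8 * np.cos(phase))  # amplitude locked to phase
mi_flat = modulation_index(phase, np.ones_like(phase))           # no coupling
mi_silent = modulation_index(phase, np.zeros_like(phase))        # band with zero power
```

      In this sketch the phase-locked amplitude gives a clearly positive MI, the flat amplitude gives an MI of essentially zero, and the silent band returns NaN rather than a spurious value.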

      Reviewer #2:

      The main strength of this model is its use of a fairly physiologically detailed model of the hippocampus. The cells are single-compartment models but do include multiple ion channels and are spatially arranged in accordance with the hippocampal structure. This allows the understanding of how ion channels (possibly modifiable by pharmacological agents) interact with system-level oscillations and neurostimulation. The model also includes all the main hippocampal subfields. The other strength is its attention to an important topic, which may be relevant for dementia treatment or prevention, which few modeling studies have addressed. The work has several weaknesses.

      We thank the reviewer for appreciating our detailed description of the hippocampal formation and the focus on neurostimulation applications that aim at treating oscillopathies, especially dementia.

      1. First, while investigations of hippocampal neurostimulation are important there are few experimental studies from which one could judge the validity of the model findings. All its findings are therefore predictions. It would be much more convincing to first show the model is able to reproduce some measured empirical neurostimulation effect before proceeding to make predictions.

      We acknowledge that the results presented in Figures 4-7 of the revised manuscript cannot be compared to existing experimental data, and are therefore purely predictive. Future experimental work is needed to verify these predictions.

      Yet, we would also like to stress that the motivation behind this project was the inadequacy of previous models of theta-nested gamma oscillations (Onslow et al., 2014; Aussel et al., 2018; Segneri et al., 2020) to account for the mechanism of theta phase reset that occurs during electrical stimulation of the fornix or perforant path (Williams and Givens, 2003). Since we could not use these previous models to study the effects of neurostimulation on theta-nested gamma oscillations, we had to modify them to account for a dynamical theta input, which is the main methodological novelty that is reported in our manuscript (Figures 1 and 3 of the revised manuscript).

      Despite the scarcity of experimental studies that could confirm the full model, we sought to replicate a few experimental findings that employed optogenetic stimulation to induce gamma oscillations in individual excitatory-inhibitory circuits. Although not specific to the hippocampus, these studies have shown that gamma oscillations can be induced using either ramped photostimulation targeting excitatory neurons (Adesnik et al., 2010; Akam et al., 2012; Lu et al., 2015), or pulsed stimulation driving inhibitory cells in the gamma range (Cardin et al., 2009; Iaccarino et al., 2016). To account for these experimental results and replicate them, we have added 4 new figures (Figure 2 and its 3 figure supplements) and an extensive section in the results part (L.141-217 of the revised manuscript). The added section and related figures are indicated in our response to reviewer 1, comment 3 (p 2-7).

      2.1. Second, the model is very specific. Or if its behavior is to be considered general it has not been explained why.

      Although the spatial organization and cellular details of the model are indeed very specific, its general behavior, i.e., the production of theta-nested gamma oscillations and theta phase reset, is common to any excitatory-inhibitory circuit interconnected with Kuramoto oscillators. To illustrate this point, we have generalized our approach to the neural mass model developed by Onslow and colleagues (Onslow ACE, Jones MW, Bogacz R. A Canonical Circuit for Generating Phase-Amplitude Coupling. PLoS ONE. 2014 Aug; 9(8):e102591). These results are represented in a new supplementary figure (Figure 3 – Figure Supplement 4), and briefly described in a new paragraph of the results section (L.262-268 of the revised manuscript):

      “Importantly, our approach is generalizable and can be applied to other models producing theta-nested gamma oscillations. For instance, we adapted the neural mass model by Onslow and colleagues (Onslow et al., 2014), replaced the fixed theta input by a set of Kuramoto oscillators, and demonstrated that it could also generate theta phase reset in response to single-pulse stimulation (Figure S8). These results illustrate that the general behavior of our model is not specific to the tuning of individual parameters in the conductance-based neurons, but follows general rules that are captured by the level of abstraction of the Kuramoto formalism.”

      Author response image 6.

      Figure S8 (Figure 3 – Figure Supplement 4). A neural mass model of coupled excitatory and inhibitory neurons driven by Kuramoto oscillators generates theta-nested gamma oscillations and theta phase reset. A. Two coupled neural masses (one excitatory and one inhibitory) driven by Kuramoto oscillators, which represent a dynamical oscillatory drive in the theta range, were used to implement a neural mass equivalent to our conductance-based model represented in Figure 1. Neural masses were modeled using the Wilson-Cowan formalism, with parameters adapted from Onslow et al. (2014) (𝑊𝐸𝐸 = 4.8, 𝑊𝐸𝐼 = 𝑊𝐼𝐸 = 4, 𝑊𝐼𝐼 = 0). B. The normalized population firing rates exhibit theta-nested gamma oscillations (middle and bottom panels) in response to the dynamic theta rhythm (top panel). A stimulation pulse delivered at the descending phase of the rhythm to both populations (marked by the inverted red triangle) produces a robust theta phase reset, similarly to Figure 3A.
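
      As a rough illustration of the Wilson-Cowan formalism mentioned in this caption, the sketch below integrates one excitatory and one inhibitory neural mass with the coupling weights quoted above. Everything else is an assumption made for the example and is not taken from the manuscript: the sigmoid slope and threshold, the time constants, the drive amplitudes, the forward-Euler scheme, and the function names. For simplicity, the theta drive here is a fixed 6 Hz sinusoid standing in for the rhythm that the authors generate with Kuramoto oscillators.

```python
import numpy as np


def sigmoid(x, beta=4.0, thresh=1.0):
    """Population activation function (slope and threshold are assumed)."""
    return 1.0 / (1.0 + np.exp(-beta * (x - thresh)))


def simulate_wilson_cowan(drive_e, drive_i, dt=1e-4, tau_e=3.2e-3, tau_i=3.2e-3,
                          w_ee=4.8, w_ei=4.0, w_ie=4.0, w_ii=0.0):
    """Forward-Euler integration of a two-population Wilson-Cowan model.

    Weights follow the caption (W_EE = 4.8, W_EI = W_IE = 4, W_II = 0);
    the time constants and step size are illustrative assumptions.
    """
    n = len(drive_e)
    e = np.zeros(n)
    i = np.zeros(n)
    for k in range(n - 1):
        # Net input to each population: recurrent terms plus external drive.
        in_e = w_ee * e[k] - w_ie * i[k] + drive_e[k]
        in_i = w_ei * e[k] - w_ii * i[k] + drive_i[k]
        e[k + 1] = e[k] + dt / tau_e * (-e[k] + sigmoid(in_e))
        i[k + 1] = i[k] + dt / tau_i * (-i[k] + sigmoid(in_i))
    return e, i


# One second of simulated time with an assumed 6 Hz theta drive in [0, 1].
t = np.arange(0, 1.0, 1e-4)
theta = 0.5 * (1.0 + np.cos(2 * np.pi * 6 * t))
e_rate, i_rate = simulate_wilson_cowan(drive_e=1.0 * theta, drive_i=0.5 * theta)
```

      Because each Euler step is a convex combination of the current rate and a sigmoid output, the normalized rates stay within [0, 1]. Whether this particular parameter choice produces gamma bursts nested in the theta cycle would need to be checked numerically.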

      This simplified model is described in more detail in the methods (L.694-710 of the revised manuscript). Additionally, the generation of gamma oscillations by individual excitatory-inhibitory circuits is now described in detail in the added section “Generation of gamma oscillations by E-I circuits” (L.159-217 of the revised manuscript), which has already been discussed in our response to reviewer 1, comment 3 (p 2-7).

      2.2. For example, the model shows bistability between quiescence and TNGO, however what aspect of the model underlies this, be it some particular network structure or particular ion channel, for example, is not addressed.

      We thank the reviewer for mentioning this point, which we have now addressed. The “bistable” behavior that we reported occurs for values of the theta input that are just below the threshold to induce self-sustained theta-gamma oscillations (Figure 5 of the revised manuscript, point B). Moreover, the presence of the Calcium-Activated-Nonspecific (CAN) cationic channel, which is expressed by pyramidal neurons in the entorhinal cortex, CA3, and CA1 fields of the hippocampus, is necessary for this behavior to occur. Indeed, abolishing CAN channels in all areas of the model suppresses this behavior. We have now addressed this point in a new supplementary figure (Figure 5 – Figure Supplement 4) and a short description in the text (L.287-303 of the revised manuscript).

      “In the presence of dynamic theta input, the effects of single-pulse stimulation depended both on theta input amplitude and stimulation amplitude, highlighting different regimes of network activity (Figure 5 and Figure S9, Figure S10, Figure S11). For low theta input, theta-nested gamma oscillations were initially absent and could not be induced by stimulation (Figure 5A). At most, the stimulation could only elicit a few bursts of spiking activity that faded away after approximately 250 ms, similar to the rebound of activity seen in the absence of theta drive. For increasing theta input, the network switched to an intermediate regime: upon initialization at a state with no spiking activity, it could be kicked to a state with self-sustained theta-nested gamma oscillations by a single stimulation pulse of sufficiently high amplitude (Figure 5B). This regime existed for a range of septal theta inputs located just below the threshold to induce self-sustained theta-gamma oscillations without additional stimulation, as characterized by the post-stimulation theta power, gamma power, and theta-gamma PAC (Figure 5D). Removing CAN currents from all areas of the model abolished this behavior (Figure S12), which is interesting given the role of this current in the multistability of EC neurons (Egorov et al., 2002; Fransen et al., 2006) and in the intrinsic ability of the hippocampus to generate theta-nested gamma oscillations (Giovannini et al., 2017). For the highest theta input, the network became able to spontaneously generate theta-nested gamma oscillations, even when initialized at a state with no spiking activity and without additional neurostimulation (Figure 5C).”

      Author response image 7.

      Figure S12 (Figure 5 – Figure Supplement 4). CAN currents are necessary for the production of self-sustained theta-gamma oscillations in response to single-pulse stimulation. A. Same as Figure 5B. B. Similar simulation as panel A., but without the presence of CAN currents in the EC, CA3 and CA1 fields of the hippocampus. Removing CAN currents from the model abolishes self-sustained theta-nested gamma oscillations in response to a single stimulation pulse (for the parameters represented in Figure 5, point B).

      Furthermore, we realized that the terminology “bistable” may not be justified as we could not perform a systematic bifurcation analysis, which is typically carried out in simpler neural mass models (e.g., Onslow et al., 2014; Segneri et al., 2020). Therefore, we decided to rephrase the sentences about “bistability” to keep a more general terminology. The following sentences were revised:

      L.20-23: “We showed that, for theta inputs just below the threshold to induce self-sustained theta-nested gamma oscillations, a single stimulation pulse could switch the network behavior from non-oscillatory to a state producing sustained oscillations.”

      L.305-309: “Based on the above analyses, we considered two pathological states: one with a moderate theta input (i.e., moderately weak projections from the medial septum to the EC) that allowed the initiation of self-sustained oscillations by single stimulation pulses (Figure 5, point B), and one with a weaker theta input characterized by the complete absence of self-sustained oscillations even following transient stimulation (Figure 5, point A).”

      L.316-317: “In the case of a moderate theta input and in the presence of phase reset, delivering a pulse at either the peak or trough of theta could induce theta-nested gamma oscillations (Figure 6A and 6C).”

      L.353-357: “A very interesting finding concerns the behavior of the model in response to single-pulse stimulation for certain values of the theta amplitude (Figure 5). For low theta amplitudes, a single stimulation pulse was capable of switching the network behavior from a state with no spiking activity to one with prominent theta-nested gamma oscillations. Whether such an effect can be induced in vivo in the context of memory processes remains an open question.”

      2.3. Similarly for the various phase reset behaviors that are found.

      We would like to clarify that the observed phase reset curves (reported in Figure 3D) are a direct consequence of the choice of an appropriate phase response function for the Kuramoto oscillators representing the medial septum. This choice is inspired by experimentally measured phase response curves from CA3 neurons. These aspects are described briefly in the introduction and in more detail in the methods, as indicated below:

      L.101: “This new hybrid dynamical model could generate both theta-nested gamma oscillations and theta phase reset, following a particular phase response curve (PRC) inspired by experimental literature (Lengyel et al., 2005; Akam et al., 2012; Torben-Nielsen et al., 2010).”

      L.528-537: “Hereafter, we call the term 𝑍(𝜃) the phase response function, to distinguish it from the PRC obtained from experimental data or simulations (see section below "Data Analysis", "Phase Response Curve"). Briefly, the PRC of an oscillatory system indicates the phase delay or advancement that follows a single pulse, as a function of the phase at which this input is delivered. The phase response function 𝑍(𝜃) was chosen to mimic as well as possible experimental PRCs reported in the literature (Lengyel et al., 2005; Kwag and Paulsen, 2009; Akam et al., 2012). These PRCs appear biphasic and show a phase advancement (respectively delay) for stimuli delivered in the ascending (respectively descending) slope of theta. To accurately model this behavior, we used the following equation for the phase response function, where 𝜃𝑝𝑒𝑎𝑘 represents the phase at which the theta rhythm reaches its maximum and the parameter 𝜙𝑜𝑓𝑓𝑠𝑒𝑡 controls the desired phase offset from the peak:

      Author response image 8.

      In the figure below, we illustrate the phase response curves of CA3 neurons measured by Lengyel et al., 2005 (panel A.), and compare them with our simulated phase response curves (panel B.). Note that the conventions for phase advance and phase delay are reversed between the two panels.

      Finally, we would like to acknowledge that the model “is not derived from experimental phase response curves of septal neurons of which there is no direct measurement”, as mentioned by the reviewer in their comment 4 below. Despite the lack of experimental data specific to medial septum neurons, we argue that this phase response function is the only one that mathematically supports the generation of self-sustained theta-nested gamma oscillations in our current model. This statement is illustrated by Figure S7 (Figure 3 – Figure Supplement 3) and is mentioned in the results (L.249-261 of the revised manuscript):

      We modeled this behavior by a specific term (which we called the phase response function) in the general equation of the Kuramoto oscillators (see methods, Equation 1). Importantly, introducing a phase offset in the phase response function disrupted theta-nested gamma oscillations (Figure S7), which suggests that the septohippocampal circuitry must be critically tuned to be able to generate such oscillations. The strength of phase reset could also be adjusted by a gain that was manually tuned. In the presence of the physiological phase response function and of a sufficiently high reset gain, a single stimulation pulse delivered to all excitatory and inhibitory CA1 neurons could reset the phase of theta to a value close to its peaks (Figure 3A). We computed the PRC of our simulated data for different stimulation amplitudes and validated that our neuronal network behaved according to the phase response function set in our Kuramoto oscillators (Figure 3D). It should be noted that including this phase reset mechanism affected the generated theta rhythm even in the absence of stimulation, extending the duration of the theta peak and thereby slowing down the frequency of the generated theta rhythm.

      Author response image 9.

      Figure S7 (Figure 3 – Figure Supplement 3). Network behavior generated by Kuramoto oscillators with nonphysiological phase response functions. Each panel is similar to Figure 3A, but with a different offset added to the phase response function of the Kuramoto oscillators (see methods, Equation 4). The center frequency was set to 6 Hz in all of these simulations. Overall, theta oscillations in these cases are less sinusoidal and show more abrupt phase changes than in the physiological case. A. A phase offset of −𝜋∕2 leads to an overall theta oscillation of 4 Hz, with a second peak following the main theta peak. B. A phase offset of +𝜋∕2 reduces the peak of theta, resetting the rhythm to the middle of the ascending phase. C. A phase offset of 𝜋 or -𝜋 leads to the CA1 output resetting the theta rhythm to the trough of theta.

      2.4. We may wonder whether a different hippocampal model of TNGO, of which there are many published (for example [1-6]) would show the same effect under neurostimulation. This seems very unlikely […]

      [1] Hyafil A, Giraud AL, Fontolan L, Gutkin B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends in neurosciences. 2015 Nov 1;38(11):725-40.

      [2] Tort AB, Rotstein HG, Dugladze T, Gloveli T, Kopell NJ. On the formation of gamma-coherent cell assemblies by oriens lacunosum-moleculare interneurons in the hippocampus. Proceedings of the National Academy of Sciences. 2007 Aug 14;104(33):13490-5.

      [3] Neymotin SA, Lazarewicz MT, Sherif M, Contreras D, Finkel LH, Lytton WW. Ketamine disrupts theta modulation of gamma in a computer model of hippocampus. Journal of Neuroscience. 2011 Aug 10;31(32):11733-43.

      [4] Ponzi A, Dura-Bernal S, Migliore M. Theta-gamma phase-amplitude coupling in a hippocampal CA1 microcircuit. PLOS Computational Biology. 2023 Mar 23;19(3):e1010942.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

      The highlighted publications, while very important in their findings regarding theta-gamma phase-amplitude coupling, focused on specific subfields of the hippocampus. In our work, we aimed to develop a model that includes the different anatomical divisions of the hippocampal formation, while still exhibiting theta-nested gamma oscillations, which is why we decided to expand the model by Aussel et al. (2018). Exploring the behavior of all these different hippocampal models under neurostimulation is beyond the scope of the current manuscript.

      Nevertheless, we have added a new figure (Figure 3 – Figure Supplement 4) showing an adaptation of our modeling approach to a generic neural mass model of theta-nested gamma oscillations (Onslow et al., 2014), which illustrates the generalizability of our findings and is described in detail in our response to comment 2.1. Moreover, we have further addressed the comments of the reviewers regarding bistability and phase response curves in our responses to comments 2.2 and 2.3.

      Furthermore, we have added references to all 6 of these publications in the revised version of the manuscript:

      L.43-50: “Moreover, the modulation of gamma oscillations by the phase of theta oscillations in hippocampal circuits, a phenomenon termed theta-gamma phase-amplitude coupling (PAC), correlates with the efficacy of memory encoding and retrieval (Jensen and Colgin, 2007; Tort et al., 2009; Canolty and Knight, 2010; Axmacher et al., 2010; Fell and Axmacher, 2011; Lisman and Jensen, 2013; Lega et al., 2016). Experimental and computational work on the coupling between oscillatory rhythms has indicated that it originates from different neural architectures and correlates with a range of behavioral and cognitive functions, enabling the long-range synchronization of cortical areas and facilitating multi-item encoding in the context of memory (Hyafil et al., 2015).”

      L.415-426: “In terms of neuronal cell types, we also made an important simplification by considering only basket cells as the main class of inhibitory interneuron in the whole hippocampal formation. However, it should be noted that many other types of interneurons exist in the hippocampus and have been modeled in various works with higher computational complexity (e.g., Bezaire et al., 2016; Chatzikalymniou et al., 2021). Among these various interneurons, oriens-lacunosum moleculare (OLM) neurons in the CA1 field have been shown to play a crucial role in synchronizing the activity of pyramidal neurons at gamma frequencies (Tort et al., 2007), and in generating theta-gamma PAC (e.g., Neymotin et al., 2011; Ponzi et al., 2023). Additionally, these cells may contribute to the formation of specific phase relationships within CA1 neuronal populations, through the integration between inputs from the medial septum, the EC, and CA3 (Mysin et al., 2019). Future work is needed to include more diverse cell types and detailed morphologies modeled through multiple compartments.”

      2.5. […] and indeed the quiescent state itself shown by this model seems quite artificial.

      We would like to clarify that the “quiescent state” mentioned by the reviewer is simply a state where the theta input is too low to induce theta-nested gamma oscillations. In this regime, neurons are active only due to the noise term in the membrane potential, which was adjusted based on Figure S3 (Figure 2 – Figure Supplement 2, shown below), at the minimal level needed to disrupt artificial synchronization in decoupled populations. For an input of 0 nA, we acknowledge that this network is indeed fully quiescent (i.e., does not show any spiking activity). However, as soon as the input increases, spontaneous spiking activity starts to appear with an average firing rate that depends on the input amplitude and is characterized by the input-frequency curves (panel A.). Please note that adding more noise could eliminate the observed quiescence in the absence of any input, but that it would not qualitatively affect the reported results.

      Author response image 10.

      Figure S3 (Figure 2 – Figure Supplement 2). Cell-intrinsic spiking activity in decoupled excitatory and inhibitory populations under ramping input. A. Input-Frequency (I-F) curves for excitatory cells (left panel; pyramidal neurons with ICAN) and inhibitory cells (right panel; interneurons, fast-spiking) used in the model. Above a certain tonic input (around 0.35 nA for excitatory and 0.1 nA for inhibitory neurons), neurons can spike in the gamma range. B. Raster plot showing the spiking activity of excitatory (blue, NE = 1000) and inhibitory (red, NI = 100) neurons in decoupled populations under ramping input (top trace) and in the absence of noise in the membrane potential. Despite random initial conditions across neurons, oscillations emerge in both populations due to the intrinsic properties of the cells, with a frequency that is predicted by the respective I-F curves (panel A.). C. Similar representation as panel B. but with the addition of stochastic noise in the membrane potential of each neuron. The presence of noise disrupts the emergence of oscillations in these decoupled populations.

      2.6. Some indication that particular ion channels, CAN and M are relevant is briefly provided and the work would be much improved by examining this aspect in more detail.

      We thank the reviewer for acknowledging the importance of these ion channels. We have now added a new supplementary figure (Figure 5 – Figure Supplement 4), which is described in more detail in our response to comment 2.2 and illustrates the role of the CAN current in the generation of theta-nested gamma oscillations following a single stimulation pulse. Moreover, we would like to stress that the impact of CAN currents on the ability of the hippocampus to generate theta-nested gamma oscillations intrinsically, i.e., in the absence of persistent external input, has already been investigated in detail by a previous computational study cited in our manuscript (Giovannini F, Knauer B, Yoshida M, Buhry L. The CAN-In network: A biologically inspired model for self-sustained theta oscillations and memory maintenance in the hippocampus. Hippocampus. 2017 Apr; 27(4):450–463).

      2.7. In summary, the work would benefit from an intuitive analysis of the basic model ingredients underlying its neurostimulation response properties.

      We thank the reviewer for this suggestion. By addressing the reviewer’s previous comments (reviewer 2, comments 2.1 and 2.2), which overlap partly with the first reviewer (reviewer 1, comment 3), we believe we have improved the manuscript and have provided key information related to the way the model responds to neurostimulation.

      3.1. Third, while the model is fairly realistic, considerable important factors are not included and in fact, there are much more detailed hippocampal models out there (for example [5,6]). In particular, it includes only excitatory cells and a single type of inhibitory cell. This is particularly important since there are many models and experimental studies where specific cell types, for example, OLM and VIP cells, are strongly implicated in TNGO.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

      We thank the reviewer for pointing out these interesting avenues for future studies. As indicated in previous responses (reviewer 1, comment 1; reviewer 2, comment 2.4), we have added several paragraphs to discuss these limitations, the rationale behind our simplifications, and potential improvements. In particular, we have added the following paragraphs to discuss our simplifications in terms of connectivity and cell types:

      Anatomical connectivity:

      L.141-150: “Biologically, GABAergic neurons from the medial septum project to the EC, CA3, and CA1 fields of the hippocampus (Toth et al., 1993; Hajós et al., 2004; Manseau et al., 2008; Hangya et al., 2009; Unal et al., 2015; Müller and Remy, 2018). Although the respective roles of these different projections are not fully understood, previous computational studies have suggested that the direct projection from the medial septum to CA1 is not essential for the production of theta in CA1 microcircuits (Mysin et al., 2019). Since our modeling of the medial septum is only used to generate a dynamic theta rhythm, we opted for a simplified representation where the medial septum projects only to the EC, which in turn drives the different subfields of the hippocampus. In our model, Kuramoto oscillators are therefore connected to the EC neurons and they receive projections from CA1 neurons (see methods for more details).”

      Cell types:

      L.415-426: “In terms of neuronal cell types, we also made an important simplification by considering only basket cells as the main class of inhibitory interneuron in the whole hippocampal formation. However, it should be noted that many other types of interneurons exist in the hippocampus and have been modeled in various works with higher computational complexity (e.g., Bezaire et al., 2016; Chatzikalymniou et al., 2021). Among these various interneurons, oriens-lacunosum moleculare (OLM) neurons in the CA1 field have been shown to play a crucial role in synchronizing the activity of pyramidal neurons at gamma frequencies (Tort et al., 2007), and in generating theta-gamma PAC (e.g., Neymotin et al., 2011; Ponzi et al., 2023). Additionally, these cells may contribute to the formation of specific phase relationships within CA1 neuronal populations, through the integration between inputs from the medial septum, the EC, and CA3 (Mysin et al., 2019). Future work is needed to include more diverse cell types and detailed morphologies modeled through multiple compartments.”

      3.2. Other missing ingredients one may think might have a strong impact on model response to neurostimulation (in particular stimulation trains) include the well-known short-term plasticity between different hippocampal cell types and active dendritic properties.

      We agree with the reviewer that plasticity mechanisms are important to include in future work, which we had already mentioned in the limitations section of the manuscript:

      L.436-443: “Importantly, we did not consider learning through synaptic plasticity, even though such mechanisms could drastically modify synaptic conduction for the whole network (Borges et al., 2017). Even more interestingly, the inclusion of spike-timing-dependent plasticity would enable the investigation of stimulation protocols aimed at promoting LTP, such as theta-burst stimulation (Larson et al., 2015). This aspect would be of utmost importance to make a link with memory encoding and retrieval processes (Axmacher et al., 2006; Tsanov et al., 2009; Jutras et al., 2013) and with neurostimulation studies for memory improvement (Titiz et al., 2017; Solomon et al., 2021).”

      1. Fourth, the MS model seems somewhat unsupported. It is modeled as a set of coupled oscillators that synchronize. However, there is also a phase reset mechanism included. This mechanism is important because it underlies several of the phase reset behaviors shown by the full model. However, it is not derived from experimental phase response curves of septal neurons, for which there is no direct measurement. The work would benefit from the use of a more biologically validated MS model.

      We would like to confirm that the phase reset mechanism is indeed at the core of using Kuramoto oscillators to model a particular system. For more details about our choice of a phase response function and the obtained results in terms of phase response curves, we refer the reader to our response to comment 2.3.

      Generally speaking, we chose to use Kuramoto oscillators as it is the simplest model that can provide an oscillatory input to another system while including a phase reset mechanism. This set of oscillators was used to replace the fixed sinusoidal wave that represented theta inputs in previous models (Onslow et al., 2014; Aussel et al., 2018; Segneri et al., 2020). Kuramoto oscillators are a well-established model of synchronization in various fields of physics. They have also been used in neuroscience to model the phase reset of collective rhythms (Levnajić et al. 2010), and the effects of DBS on the basal ganglia network in Parkinson’s disease (Tass et al. 2003, Ebert et al. 2014, Weerasinghe et al. 2019).
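
      As a minimal sketch of the idea described above, the following Python code simulates a set of Kuramoto oscillators (in mean-field form) with an added phase reset term. The phase response function (-sin θ), the parameter values and all function names are illustrative assumptions and are not taken from the manuscript.

```python
import math
import random


def order_parameter(phases):
    """Return (r, psi): synchrony magnitude and mean phase of the ensemble."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im), math.atan2(im, re)


def simulate(n=50, f_theta=6.0, coupling=20.0, reset_gain=0.0,
             pulse_time=1.0, pulse_width=0.05, t_end=1.5, dt=1e-3, seed=0):
    """Euler integration of Kuramoto oscillators with an optional reset pulse.

    d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i)
                    - reset_gain * pulse(t) * sin(theta_i)

    The last term is an assumed phase response function: while the pulse is
    on, it pulls every phase toward theta = 0 (modulo 2*pi).
    """
    rng = random.Random(seed)
    # Natural frequencies scattered around the theta band (rad/s).
    omegas = [2.0 * math.pi * rng.gauss(f_theta, 0.5) for _ in range(n)]
    phases = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    trace = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        r, psi = order_parameter(phases)
        pulse = 1.0 if pulse_time <= t < pulse_time + pulse_width else 0.0
        phases = [
            p + dt * (w + coupling * r * math.sin(psi - p)
                      - reset_gain * pulse * math.sin(p))
            for p, w in zip(phases, omegas)
        ]
        # r * cos(psi) is the collective rhythm that would drive a downstream area.
        trace.append((t, r, r * math.cos(psi)))
    return trace, phases
```

      In this sketch, a large `reset_gain` makes a brief pulse pull all phases toward a common value, whereas `reset_gain=0` leaves the ensemble as a free-running rhythm generator.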

      More detailed models of the medial septum exist in the literature (e.g., Wang et al. 2002, Hajós et al. 2004) and model the GABAergic effects of the septal projections onto the hippocampal formation. However, it is not trivial to infer the connectivity parameters and the degree of innervation between the hippocampus and the medial septum. Furthermore, the claims made in our study do not necessarily depend on the nature of the projections between the two areas. Therefore, we decided to represent the medial septum in a conceptual way and focus mostly on the effects of these projections rather than replicating them in detail.

      Aussel, Amélie, Laure Buhry, Louise Tyvaert, and Radu Ranta. “A Detailed Anatomical and Mathematical Model of the Hippocampal Formation for the Generation of Sharp-Wave Ripples and Theta-Nested Gamma Oscillations.” Journal of Computational Neuroscience 45, no. 3 (December 2018): 207–21. https://doi.org/10.1007/s10827-018-0704-x.

      Ebert, Martin, Christian Hauptmann, and Peter A. Tass. “Coordinated Reset Stimulation in a Large-Scale Model of the STN-GPe Circuit.” Frontiers in Computational Neuroscience 8 (2014): 154. https://doi.org/10.3389/fncom.2014.00154.

      Hajós, M., W.E. Hoffmann, G. Orbán, T. Kiss, and P. Érdi. “Modulation of Septo-Hippocampal θ Activity by GABAA Receptors: An Experimental and Computational Approach.” Neuroscience 126, no. 3 (January 2004): 599–610. https://doi.org/10.1016/j.neuroscience.2004.03.043.

      Levnajić, Zoran, and Arkady Pikovsky. “Phase Resetting of Collective Rhythm in Ensembles of Oscillators.” Physical Review E 82, no. 5 (November 3, 2010): 056202. https://doi.org/10.1103/PhysRevE.82.056202.

      Onslow, Angela C. E., Matthew W. Jones, and Rafal Bogacz. “A Canonical Circuit for Generating Phase-Amplitude Coupling.” Edited by Adriano B. L. Tort. PLoS ONE 9, no. 8 (August 19, 2014): e102591. https://doi.org/10.1371/journal.pone.0102591.

      Segneri, Marco, Hongjie Bi, Simona Olmi, and Alessandro Torcini. “Theta-Nested Gamma Oscillations in Next Generation Neural Mass Models.” Frontiers in Computational Neuroscience 14 (2020). https://doi.org/10.3389/fncom.2020.00047.

      Tass, Peter A. “A Model of Desynchronizing Deep Brain Stimulation with a Demand-Controlled Coordinated Reset of Neural Subpopulations.” Biological Cybernetics 89, no. 2 (August 1, 2003): 81–88. https://doi.org/10.1007/s00422-003-0425-7.

      Wang, Xiao-Jing. “Pacemaker Neurons for the Theta Rhythm and Their Synchronization in the Septohippocampal Reciprocal Loop.” Journal of Neurophysiology 87, no. 2 (February 1, 2002): 889–900. https://doi.org/10.1152/jn.00135.2001.

      Weerasinghe, Gihan, Benoit Duchet, Hayriye Cagnan, Peter Brown, Christian Bick, and Rafal Bogacz. “Predicting the Effects of Deep Brain Stimulation Using a Reduced Coupled Oscillator Model.” PLoS Computational Biology 15, no. 8 (August 8, 2019): e1006575. https://doi.org/10.1371/journal.pcbi.1006575.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Wagstyl et al. describes an extensive analysis of gene expression in the human cerebral cortex and the association with a large variety of maps capturing many of its microscopic and macroscopic properties. The core methodological contribution is the computation of continuous maps of gene expression for >20k genes, which are being shared with the community. The manuscript is a demonstration of several ways in which these maps can be used to relate gene expression with histological features of the human cortex, cytoarchitecture, folding, function, development and disease risk. The main scientific contribution is to provide data and tools to help substantiate the idea of the genetic regulation of multi-scale aspects of the organisation of the human brain. The manuscript is dense, but clearly written and beautifully illustrated.

      Main comments

      The starting point for the manuscript is the construction of continuous maps of gene expression for most human genes. These maps are based on the microarray data from 6 left human brain hemispheres made available by the Allen Brain Institute. By technological necessity, the microarray data is very sparse: only 1304 samples to map all the cortex after all subjects were combined (a single individual's hemisphere has ~400 samples). Sampling is also inhomogeneous due to the coronal slicing of the tissue. To obtain continuous maps on a mesh, the authors filled the gaps using nearest-neighbour interpolation followed by strong smoothing. This may have two potentially important consequences that the authors may want to discuss further: (a) the intrinsic geometry of the mesh used for smoothing will introduce structure in the expression map, and (b) strong smoothing will produce substantial, spatially heterogeneous, autocorrelations in the signal, which are known to lead to a significant increase in the false positive rate (FPR) in the spin tests they used.

      Many thanks to the reviewer for their considered feedback. We have addressed these primary concerns in point-by-point responses below. The key conclusions from our new analyses are: (i) while the intrinsic geometry of the mesh had not originally been accounted for in sufficient detail, the findings presented in this manuscript are not driven by mesh-induced structure, and (ii) the spin test null models used in this manuscript (including a modified version introduced in response to (i)) are currently the most appropriate way to mitigate inflated false positive rates when making statistical inferences on smooth, surface-based data.

      a. Structured smoothing

      A brain surface has intrinsic curvature (Gaussian curvature, which cannot be flattened away without tearing). The size of the neighbourhood around each surface vertex will be determined by this curvature. During surface smoothing, the weight of each vertex will therefore also be modulated by the local curvature, i.e., by large geometric structures such as poles, fissures and folds. The article by Ciantar et al (2022, https://doi.org/10.1007/s00429-022-02536-4) provides a clear illustration of this effect: even the mapping of a volume of pure noise into a brain mesh will produce a pattern over the surface strikingly similar to that obtained by mapping resting state functional data or functional data related to a motor task.

      Comment 1

      It may be important to make the readers aware of this possible limitation, which is in large part a consequence of the sparsity of the microarray sampling and the necessity to map that to a mesh. This may confound the assessments of reproducibility (results, p4). Reproducibility was assessed by comparing pairs of subgroups split from the total 6. But if the mesh is introducing structure into the data, and if the same mesh was used for both groups, then what's being reproduced could be a combination of signal from the expression data and signal induced by the mesh structure.

      Response 1

      The reviewer raises an important question regarding the potential for interpolation and smoothing on a cortical mesh to induce a common/correlated signal due to the intrinsic mesh structure. We have now generated a new null model to test this idea which indicates that intrinsic mesh structure is not inflating reproducibility in interpolated expression maps. This new null model spins the original samples prior to interpolation, smoothing and comparison between triplet splits of the six donors, with independent spins shared across the triplet. For computational tractability we took one pair of triplets and regenerated the dataset for each triplet using 10 independent spins. We used these to estimate gene-gene null reproducibility for 90 independent pairwise combinations of these 10 spins. Across these 90 permutations, the average median gene-gene correlation was R=0.03, whereas in the unspun triplet comparisons this was R=0.36. These results indicate that the primary source of the gene-level triplet reproducibility is the underlying shared gene expression pattern rather than interpolation-induced structure.

      In Methods 2a: "An additional null dataset was generated to test the influence of the intrinsic geometry of the cortical mesh, and its impact on interpolation, on benchmarking analyses of DEMs and gradients (Fig S1d, Fig S2d, Fig S3c). In these analyses, the original samples were rotated on the spherical surface prior to subsequent interpolation, smoothing and gradient calculation. Due to computational constraints the full dataset was recreated only for 10 independent spins. These are referred to as the “spun+interpolated null”."
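
      The logic of this null can be sketched as follows. This is an illustrative Python sketch only: it assumes unit-sphere coordinates, draws the rotation from a random unit quaternion, omits the smoothing and gradient steps, and uses hypothetical function names rather than the code in the authors' repository.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a point uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def random_rotation(rng):
    """Draw a uniformly random 3D rotation matrix via a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))


def nearest_neighbour_interpolate(vertices, sample_xyz, sample_values):
    """Give each mesh vertex the value of its closest sample on the unit sphere."""
    dense = []
    for v in vertices:
        # On a unit sphere, the largest dot product is the smallest geodesic distance.
        best = max(range(len(sample_xyz)),
                   key=lambda i: sum(a * b for a, b in zip(v, sample_xyz[i])))
        dense.append(sample_values[best])
    return dense


def spun_interpolated_null(vertices, sample_xyz, sample_values, rng):
    """Rotate the sparse samples first, then interpolate onto the fixed mesh."""
    rot = random_rotation(rng)
    spun_xyz = [rotate(rot, p) for p in sample_xyz]
    return nearest_neighbour_interpolate(vertices, spun_xyz, sample_values)
```

      Because the mesh itself stays fixed while the samples move, any structure that the mesh geometry imposes on interpolation is retained in the null, while the correspondence between expression values and cortical location is broken.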

      Author response image 1.

      Figure S1d. Gene predictability was higher across all triplet-triplet pairs than in the spun+interpolated null.

      Comment 2

      It's also possible that mesh-induced structure is responsible in part for the "signal boost" observed when comparing raw expression data and interpolated data (fig S1a). How do you explain the signal boost of the smooth data compared with the raw data otherwise?

      Response 2

      We thank the reviewer for highlighting this issue of mesh-induced structure. We first sought to quantify the impact of mesh-induced structure through the new null model, in which the data are spun prior to interpolation. New Figures S1d, S2d and S3c all show that the main findings are not driven by interpolation over a common mesh structure, but rather originate in the underlying expression data.

      Specifically, for the original Figure S1a, the reviewer highlights a limitation: we compared intersubject predictability of raw-sample to raw-sample and interpolated-to-interpolated. In this original formulation, improved prediction scores for interpolated-to-interpolated (the “signal boost”) could be driven by mesh-induced structure being applied to both the input and predicted maps. We have updated this so that we are now comparing raw-to-raw and interpolated-to-raw, i.e. whether interpolated values are better estimates of the measured expression values. The new Fig S1a&b (see below) shows a signal boost in gene-level and vertex-level prediction scores (delta R = +0.05) and we attribute this to the minimisation of location and measurement noise in the raw data, improving the intersubject predictability of expression levels.

      In Methods 2b: "To assess the effect of data interpolation in DEM generation we compared gene-level and vertex-level reproducibility of DEMs against a “ground truth” estimate of these reproducibility metrics based on uninterpolated expression data. To achieve a strict comparison of gene expression values between different individuals at identical spatial locations we focused these analyses on the subset of AHBA samples where a sample from one subject was within 3 mm geodesic distance of another. This resulted in 1097 instances (spatial locations) with measures of raw gene expression of one donor, and predicted values from the second donor’s un-interpolated AHBA expression data and interpolated DEM. We computed gene-level and vertex-level reproducibility of expression using the paired donor data at each of these sample points for both DEM and uninterpolated AHBA expression values. By comparing DEM reproducibility estimates with those for uninterpolated AHBA expression data, we were able to quantify the combined effect of interpolation and smoothing steps in DEM generation. We used gene-level reproducibility values from DEMs and uninterpolated AHBA expression data to compute a gene-level difference in reproducibility, and we then visualized the distribution of these difference values across genes (Fig S1a). We used gene-rank correlation to compare vertex-level reproducibility values between DEMs and uninterpolated AHBA expression data (Fig S1b)."
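
      The two reproducibility metrics described here can be illustrated with a short Python sketch. It assumes that expression is stored as a genes x locations matrix for each donor, uses Pearson correlation for the gene-level metric and a rank (Spearman-type) correlation for the vertex-level metric, and runs on made-up toy values; the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def gene_level_reproducibility(donor_a, donor_b):
    """For each gene (row), correlate expression across the paired locations."""
    return [pearson(row_a, row_b) for row_a, row_b in zip(donor_a, donor_b)]


def vertex_level_reproducibility(donor_a, donor_b):
    """For each location (column), rank-correlate expression across genes."""
    cols_a, cols_b = list(zip(*donor_a)), list(zip(*donor_b))
    return [pearson(ranks(ca), ranks(cb)) for ca, cb in zip(cols_a, cols_b)]


# Toy data: 3 genes (rows) x 4 paired sample locations (columns).
raw_donor = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [2.0, 2.5, 5.0, 1.0]]
dem_other = [[1.1, 2.2, 2.9, 4.3], [3.8, 3.1, 2.2, 0.9], [2.4, 2.0, 4.6, 1.2]]
print(gene_level_reproducibility(raw_donor, dem_other))
print(vertex_level_reproducibility(raw_donor, dem_other))
```

      Running the same two functions once with the second donor's uninterpolated values and once with that donor's interpolated DEM values, and then differencing the results, would give the kind of "signal boost" comparison described in the text.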

      Author response image 2.

      Figure S1. Reproducibility of Dense Expression Maps (DEMs) interpolated from spatially sparse postmortem measures of cortical gene expression. a, Signal boost in the interpolated DEM dataset vs. spatially sparse expression data. Restricting to samples taken from approximately the same cortical location in pairs of individuals (within 3mm geodesic distance), there was an overall improvement in intersubject spatial predictability in the interpolated maps. Furthermore, genes with lower predictability in the interpolated maps were less predictable in the raw dataset, suggesting these regions exhibit higher underlying biological variability rather than methodologically introduced bias. b, Similarly at the paired sample locations, gene-rank predictability was generally improved in DEMs vs. sparse expression data (median change in R from sparse samples to interpolated for each pair of subjects, +0.5).

      Comment 3

      How do you explain that despite the difference in absolute value the combined expression maps of genes with and without cortical expression look similar? (fig S1e: in both cases there are high values in the dorsal part of the central sulcus, in the occipital pole, in the temporal pole, and low values in the precuneus and close to the angular gyrus). Could this also reflect mesh-smoothing-induced structure?

      Response 3

      As with comment 1, this is an interesting perspective that we had not fully considered. We would first like to clarify that non-cortical expression is defined from the independent datasets including the “cortex” tissue class of the human protein atlas and genes identified as markers for cortical layers or cortical cells in previous studies. This is still likely an underestimate of true cortically expressed genes as some of these “non-cortical genes” had high intersubject reproducibility scores. Nevertheless we think it appropriate to use a measure of brain expression independent of anything included in other analyses for this paper. These considerations are part of the reason we provide all gene maps with accompanying uncertainty scores for user discretion rather than simply filtering them out.

      In terms of the spatially consistent pattern of the gene ranks of Fig S1f, this consistent spatial pattern mirrors Transcriptomic Distinctiveness (r=0.52 for non-cortical genes, r=0.75 for cortical genes), so we think that as the differences in expression signatures become more extreme, the relative ranks of genes in that region are more reproducible/easier to predict.

      To assess whether mesh-smoothing-induced structure is playing a role, we applied the new null model introduced in response to comment 1, and asked if the per-vertex gene rank reproducibility of independently spun subgroup triplets showed a similar structure to that in our original analyses. Across the 90 permutations, the median correlation between vertex reproducibility and TD was R=0.10. We also recalculated the TD maps for the 10 spun datasets and the mean correlation with the original TD did not significantly differ from zero (mean R = 0.01, p=0.2, nspins =10). These results indicate that folding morphology is not the major driver of local or large scale patterning in the dataset. We have included this as a new Figure S3c.

      We have updated the text as follows:

      In Methods 3a: "Third, to assess whether the covariance in spatial patterning across genes could be a result of mesh-associated structure introduced through interpolation and smoothing, TD maps were recomputed for the spun+interpolated null datasets and compared to the original TD map (Fig S3c)."

      In Results: "The TD map observed from the full DEMs library was highly stable between all disjoint triplets of donors (Methods, Fig S3a, median cross-vertex correlation in TD scores between triplets r=0.77) and across library subsets at all deciles of DEM reproducibility (Methods, Fig S3b, cross-vertex correlation in TD scores r>0.8 for the 3rd-10th deciles), but was not recapitulated in spun null datasets (Fig S3c)."

      Author response image 3.

      Figure S3c, Correlations between TD and TD maps regenerated on datasets spun using two independent nulls, one where the rotation is applied prior to interpolation and smoothing (spun+interpolated) and one where it is applied to the already-created DEMs. In each null, the same rotation matrix is applied to all genes.

      Comment 4

      Could you provide more information about the way in which the nearest-neighbours were identified (results p4). Were they nearest in Euclidean space? Geodesic? If geodesic, geodesic over the native brain surface? over the spherically deformed brain? (Methods cite Moresi & Mather's Stripy toolbox, which seems to be meant to be used on spheres). If the distance was geodesic over the sphere, could the distortions introduced by mapping (due to brain anatomy) influence the geometry of the expression maps?

      Response 4

      We have clarified in the Methods that the mapping is to nearest neighbors on the spherically-inflated surface.

      The new null model we have introduced in response to comments 1 & 3 preserves any mesh-induced structure alongside any smoothing-induced spatial autocorrelations, and the additional analyses above indicate that the main results are not induced by a systematic mesh-related interpolation signal. In response to an additional suggestion from the reviewer (Comment 13), we also assessed whether local distortions due to the mesh could be creating apparent border effects in the data, for instance at the V1-V2 boundary. At the V1-V2 border, which coincides anatomically with the calcarine sulcus, we identified the 10 genes with the highest expression gradient along this boundary in the actual dataset and the spun+interpolated null. The median expression gradient along this border in the actual dataset was higher than in any of the spun datasets, indicating that these boundary effects are not explained by the effects of interpolation and cortical geometry on the data (new Fig S2d). The text has been updated as follows:

      In Methods 1: "For cortical vertices with no directly sampled expression, expression values were interpolated from their nearest sampled neighbor vertex on the spherical surface (Moresi and Mather, 2019) (Fig 1b)."

      In Methods 2: "We used the spun+interpolated null to test whether high gene gradients could be driven by non-uniform interpolation across cortical folds. We quantified the average gradient for all genes along the V1-V2 border in the atlas, as well as for 10 iterations of the atlas where the samples were spun prior to interpolation. We computed the median gradient magnitude for the 20 top-ranked genes for each (Fig S2d)."

      Author response image 4.

      Figure S2d. Mean of gradient magnitudes for 20 genes with largest gradients along V1-V2 border, compared to values along the same boundary on the spun+interpolated null atlas. Gradients were higher in the actual dataset than in all spun versions, indicating that this high gradient feature is not primarily due to the effects of calcarine sulcus morphology on interpolation.

      Comment 5

      Could you provide more information about the smoothing algorithm? Volumetric, geodesic over the native mesh, geodesic over the sphere, averaging of values in neighbouring vertices, cotangent-weighted laplacian smoothing, something else?

      Response 5

      We are using surface-based geodesic smoothing over the white surface, as described in Glasser et al., 2013 and implemented in the HCP Connectome Workbench toolbox (https://www.humanconnectome.org/software/connectome-workbench). We have updated the methods to clarify this.

      In Methods 1: "Surface expression maps were smoothed using the Connectome Workbench toolbox (Glasser et al. 2013) with a 20mm full-width at half maximum Gaussian kernel, selected to be consistent with this sampling density (Fig 1c)."
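
      For readers less familiar with kernel widths, the short Python sketch below converts a full-width at half maximum (FWHM) into the standard deviation of the corresponding Gaussian and evaluates the resulting weight as a function of geodesic distance. It uses only the standard FWHM definition; it makes no claim about how Connectome Workbench parameterises or normalises its kernel, and the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian with the given full-width at half maximum."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, fwhm_mm=20.0):
    """Unnormalised Gaussian weight of a vertex at a given geodesic distance."""
    sigma = fwhm_to_sigma(fwhm_mm)
    return math.exp(-0.5 * (distance_mm / sigma) ** 2)


# A 20 mm FWHM kernel corresponds to a standard deviation of roughly 8.5 mm.
print(round(fwhm_to_sigma(20.0), 2))
# By definition, the weight falls to one half at a distance of FWHM / 2.
print(gaussian_weight(10.0))
```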

      Comment 6

      Could you provide more information about the method used for computing the gradient of the expression maps (p6)? The gradient and the laplacian operator are related (the laplacian is the divergence of the gradient), which could also be responsible in part for the relationships observed between expression transitions and brain geometry.

      Response 6

      We are using Connectome Workbench’s metric gradient command for this (Glasser et al., 2013), which is also used in the HCP workbench pipeline. The source code for gradient calculation can be found here: https://github.com/Washington-University/workbench/blob/131e84f7b885d82af76ebe21adf2fa97795e2484/src/Algorithms/AlgorithmMetricGradient.cxx

      In Methods 2: "For each of the resulting 20,781 gene-level expression maps, the orientation and magnitude of gene expression change at each vertex (i.e. the gradient) was calculated for folded, inflated, spherical and flattened mesh representations of the cortical sheet using Connectome Workbench’s metric gradient command (Glasser et al. 2013)."

      b. Potentially inflated FPR for spin tests on autocorrelated data

      Spin tests are extensively used in this work and it would be useful to make the readers aware of their limitations, which may confound some of the results presented. Spin tests aim at establishing if two brain maps are similar by comparing a measure of their similarity over a spherical deformation of the brains against a distribution of similarities obtained by randomly spinning one of the spheres. It is not clear which specific variety of spin test was used, but the original spin test has well known limitations, such as the violation of the assumption of spatial stationarity of the covariance structure (not all positions of the spinning sphere are equivalent, some are contracted, some are expanded), or the treatment of the medial wall (a big hole with no data is introduced when hemispheres are isolated).

      Another important limitation results from the comparison of maps showing autocorrelation. This problem has been extensively described by Markello & Misic (2021). The strong smoothing used to make a continuous map out of just ~1300 samples introduces large, geometry-dependent autocorrelations. Indeed, the expression maps presented in the manuscript look similar to those with the highest degree of autocorrelation studied by Markello & Misic (alpha=3). In this case, naive permutations should lead to a false positive rate ~46% when comparing pairs of random maps, and even the most sophisticated methods have FPR>10%.

      Comment 7

      There are currently several researchers working on testing spatial similarity, and the readers would benefit from being made aware of the problem of the spin test and potential solutions. There are also packages providing alternative implementations of spin tests, such as BrainSMASH and BrainSpace, which could be mentioned.

      Response 7

      We thank the reviewer for raising the issue of null models. First, with reference to the false positive rate of 46% when maps exhibit spatial autocorrelation, we absolutely agree that this is an issue that must be accounted for and we address this using the spin test. We acknowledge there has been other work on nulls such as BrainSMASH and BrainSpace. Nevertheless, in the Markello and Misic paper to which the reviewer refers, the BrainSMASH null models perform worse with smoother maps (with false positive rates approaching 30% in panel e below), whereas the spin test maintains false positive rates below 10%.

      Author response image 5.

      We have added a brief description of the challenge and our use of the spin test.

      In Methods 2a: "Cortical maps exhibit spatial autocorrelation that can inflate the False Positive Rate, for which a number of methods have been proposed (Alexander-Bloch et al. 2018; Burt et al. 2020; Vos de Wael et al. 2020). At higher degrees of spatial smoothness, this high False Positive Rate is most effectively mitigated using the spin test (Alexander-Bloch et al. 2018; Markello and Misic 2021; Vos de Wael et al. 2020). In the following analyses when generating a test statistic comparing two spatial maps, to generate a null distribution, we computed 1000 independent spins of the cortical surface using https://netneurotools.readthedocs.io, and applied them to the first map whilst keeping the second map unchanged. The test statistic was then recomputed 1000 times to generate a null distribution for values one might observe by chance if the maps shared no common organizational features. This is referred to throughout as the “spin test” and the derived p-values as pspin."
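
      The procedure described above can be sketched in a few lines of Python. This sketch does not call netneurotools: it assumes vertices on a unit sphere, re-assigns values after each rotation by nearest neighbour, uses Pearson correlation as the test statistic, ignores the medial wall, and computes a two-sided p-value with a +1 correction. All of these choices, and the function names, are assumptions made for illustration.

```python
import math
import random


def random_rotation(rng):
    """Draw a uniformly random 3D rotation matrix via a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0


def spin_map(vertices, values, rot):
    """Rotate the sphere and give each vertex the value that lands closest to it."""
    spun = [tuple(sum(rot[i][j] * v[j] for j in range(3)) for i in range(3))
            for v in vertices]
    out = []
    for v in vertices:
        nearest = max(range(len(spun)),
                      key=lambda k: sum(a * b for a, b in zip(v, spun[k])))
        out.append(values[nearest])
    return out


def spin_test(vertices, map_a, map_b, n_spins=1000, seed=0):
    """Spin map_a, keep map_b fixed, and compare the observed statistic to the null."""
    rng = random.Random(seed)
    observed = pearson(map_a, map_b)
    null = [pearson(spin_map(vertices, map_a, random_rotation(rng)), map_b)
            for _ in range(n_spins)]
    exceed = sum(abs(r) >= abs(observed) for r in null)
    # The +1 correction keeps the estimated p-value strictly positive.
    p_spin = (1 + exceed) / (1 + n_spins)
    return observed, p_spin, null
```

      Because every spin is a rigid rotation, each null map keeps the spatial autocorrelation of the original map, which is the property that naive vertex-wise permutation destroys.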

      Comment 8

      Could it be possible to measure the degree of spatial autocorrelation?

      Response 8

      We agree this could be a useful metric to generate for spatial cortical maps. However, there are multiple potential metrics to choose from and each of the DEMs would have its own value. To address this properly would require the creation of a set of validated tools and it is not clear how we could summarize this variety of potential metrics for 20k genes. Moreover, as discussed above, the spin method is an adequate null across a range of spatial autocorrelation degrees. Thus, while we agree that in general estimation of spatial smoothness could be a useful imaging metric to report, we consider that it is beyond the scope of the current manuscript.

      Comment 9

      Could you clarify which version of the spin test was used? Does the implementation come from a package or was it coded from scratch?

      Response 9

      As Markello & Misic note, at the vertex level, the various implementations of the spin test become roughly equivalent to the ‘original’ Alexander-Bloch et al. implementation. We used the code for the ‘original’ version implemented in Python here: https://netneurotools.readthedocs.io/en/latest/_modules/netneurotools/stats.html#gen_spinsamples.

      This has been updated in the methods (see Response 7).

      Comment 10

      Cortex and non-cortex vertex-level gene rank predictability maps (fig S1e) are strikingly similar. Would the spin test come up statistically significant? What would be the meaning of that, if the cortical map of genes not expressed in the cortex appeared to be statistically significantly similar to that of genes expressed in the cortex?

      Response 10

      Please see response to comment 3, which also addresses this observation.

      Reviewer #2 (Public Review):

      The authors convert the AHBA dataset into a dense cortical map and conduct an impressively large number of analyses demonstrating the value of having such data.

      I only have comments on the methodology.

      Comment 1

      First, the authors create dense maps by simply using nearest neighbour interpolation followed by smoothing. Since one of the main points of the paper is the use of a dense map, I find it quite light in assessing the validity of this dense map. The reproducibility values they calculate by taking subsets of subjects are hugely under-powered, given that there are only 6 brains (and they don't inform on local, vertex-wise uncertainties). I wonder if the authors would consider using Gaussian process interpolation. It is really tailored to this kind of problem and can give local estimates of uncertainty in the interpolated values. For hyperparameter tuning, they could use leave-one-brain-out.

      I know it is a lot to ask to change the base method, as that means re-doing all the analyses. But I think it would strengthen the paper if the authors put as much effort in the dense mapping as they did in their downstream analyses of the data.

      Response 1

      We thank the reviewer for the suggestion to explore Gaussian process interpolation. We have implemented this for our dataset and attempted to compare this with our original method with the 3 following tests: i) intertriplet reproducibility of individual gene maps, ii) microscale validations: area markers, iii) macroscale validations: bio patterns.

      Overall, compared to our original nearest-neighbor interpolation method, GP regression (i) did not substantially improve gene-level reproducibility of expression maps (median correlation increase of R=0.07, which was greater for genes without documented protein expression in cortex); (ii) substantially worsened performance in predicting areal marker genes; and (iii) showed similar but slightly worse performance at predicting macroscale patterns from Figure 1.

      Given the significantly poorer performance on one of our key tests (ii) we have opted not to replace our original database, but we do now include code for the alternative GP regression methodology in the github repository so others can reproduce/further develop these methods.

      Author response image 6.

      ii) Genes ranked by mean expression gradient from current DEMs (left) and Gaussian process-derived interpolation maps (right). Established Human and macaque markers are consistently higher-ranked in DEM maps. iii) Figure 1 Interpolated vs GP regression

      Author response table 1.

      Comment 2

      It is nice that the authors share some code and a notebook, but I think it is rather light. It would be good if the code was better documented, and if the user could have access to the non-smoothed data, in case they want to produce their own dense maps. I was only wondering why the authors didn't share the code that reproduces the many analyses/results in the paper.

      Response 2

      We thank the reviewer for this suggestion. In response we have updated the shared github repository (https://github.com/kwagstyl/magicc). This now includes code and notebooks to reproduce the main analyses and figures.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Comment 11

      p4 mentions Fig S1h, but the supp figures only go from S1a to S1g

      Response 11

      We thank the reviewer for capturing this error. It was in fact referring to what is now Fig S1h and has been updated.

      Comment 12

      It would be important that the authors share all the code used to produce the results in the paper in addition to the maps. The core methodological contribution of the work is a series of continuous maps of gene expression, which could become an important tool for annotation in neuroimaging research. Many arbitrary (reasonable) decisions were made, it would be important to enable users to evaluate their influence on the results.

      Response 12

      We thank both reviewers for this suggestion. We have updated the github to be able to reproduce the dense maps and key figures with our methods.

      Comment 13

      p5: Could the sharp border reflect the effect of the geometry of the calcarine sulcus on map smoothing? More generally, could there be an effect of folds on TD?

      Response 13

      Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These new null models - where original source data were spun prior to interpolation - suggest that neither the sharp V1/2 border nor the TD map is an effect of mesh geometry. Specifically: (i) the magnitudes of gradients along the V1/2 boundary from null models were notably smaller than those in our original analyses (see new figure S2d), and (ii) TD maps computed from the new null models showed no correlation with TD maps from our original analyses (new Figure S3c, mean R = 0.01, p=0.2, nspins =10).

      Comment 14

      p5: Similar for the matching with the areas in Glasser's parcellation: the definition of these areas involves alignment through folds (based on freesurfer 'sulc' map, see Glasser et al 2016). If folds influence the geometry of TDs, could that influence the match?

      Response 14

      We note that Fig S3c provided evidence that folding was not the primary driver of the TD patterning. However, it is true that Glasser et al. use both neuroanatomy (folding, thickness and myelin) and fMRI-derived maps to delineate their cortical areas. As such Figure 2 f & g aren’t fully independent assessments. Nevertheless the reason that these features are used is that many of the sulci in question have been shown to reliably delineate cytoarchitectonic boundaries (Fischl et al., 2008).

      In Results: "A similar alignment was seen when comparing gradients of transcriptional change with the spatial orientation of putative cortical areas defined by multimodal functional and structural in vivo neuroimaging(Glasser et al., 2016) (expression change running perpendicular to area long-axis, pspin<0.01, Fig 2g, Methods)."

      Comment 15

      p6: TD peaks are said to overlap with functionally-specialised regions. A comment on why audition is not there, nor language, but ba 9-46d is? Would that suggest a lesser genetic regulation of those functions?

      Response 15

      The reviewer raises a valid point and this was a result that we were also surprised by. The finding that the auditory cortex is not as microstructurally distinctive as, say, V1 is consistent with other studies applying dimensionality-reduction techniques to multimodal microstructural receptor data (e.g. Zilles et al., 2017, Goulas et al., 2020). These studies found that the auditory microstructure is not as extreme as that of either visual or somatomotor areas. From a methodological viewpoint, the primary auditory cortex is significantly smaller than both visual and somatomotor areas, and is therefore captured by fewer independent samples, which could reduce the detail in which its structure is being mapped in our dataset.

      For the frontal areas, we would note that i) the frontal peak is the smallest of all peaks found and was more strongly characterised by low z-score genes than high z-score. ii) the anatomical areas in the frontal cortex are much more highly variable with respect to folding morphology (e.g. Rajkowska 1995). The anatomical label of ba9-46d (and indeed all other labels) were automatically generated as localisers rather than strict area labels. We have clarified this in the text as follows:

      In Methods 3a: "Automated labels to localize TD peaks were generated based on their intersection with a reference multimodal neuroimaging parcellation of the human cortex(Glasser et al., 2016). Each TD was given the label of the multimodal parcel that showed greatest overlap (Fig 2b)."
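      The labelling rule quoted above (each TD peak takes the label of the parcel with greatest overlap) can be illustrated with a minimal sketch. This is not the code from the shared repository: the function name, the vertex-to-parcel mapping and the toy values below are hypothetical, and overlap is assumed here to be counted in vertices rather than surface area.

```python
from collections import Counter


def label_peak_by_overlap(peak_vertices, parcel_of_vertex):
    """Return the parcel label covering the most vertices of one peak.

    peak_vertices: iterable of vertex indices in the peak (hypothetical input).
    parcel_of_vertex: dict mapping vertex index -> parcel label (hypothetical input).
    Returns None when no peak vertex has a parcel label.
    """
    counts = Counter(
        parcel_of_vertex[v] for v in peak_vertices if v in parcel_of_vertex
    )
    if not counts:
        return None
    # Take the single most frequent parcel label among the peak's vertices.
    return counts.most_common(1)[0][0]


# Toy example: three of the four peak vertices fall inside parcel "V1".
toy_parcels = {0: "V1", 1: "V1", 2: "V2", 3: "V1", 4: "V2"}
toy_label = label_peak_by_overlap([0, 1, 2, 3], toy_parcels)
```

      Because the label is chosen purely by overlap, it acts as a localiser for the peak and does not imply that the peak coincides with the parcel boundary.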

      Comment 16

      p7: The proposition that "there is a tendency for cortical sulci to run perpendicular to the direction of fastest transcriptional change", could also be "there is a tendency for the direction of fastest transcriptional change to run perpendicular to cortical sulci"? More pragmatically, this could result from the geometry of transcriptional maps being influenced by sulcal geometry in their construction.

      Response 16

      Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These models indicate that the topography of interpolated gene expression maps does not reflect influences of sulcal geometry on their construction.

      Comment 17

      p7: TD transitions are indicated to precede folding. This is based on a consideration of folding development based on the article by Chi et al 1977, which is quite an old reference. In that paper, the authors estimated the tempo of human folding development based on the inspection of photographs, which may not be sufficient for detecting the first changes in curvature leading to folds. The work of the Developing Human Connectome consortium may provide a more recent indication for timing. In their data, by PCW 21 there's already central sulcus, pre-central, post-central, intra-parietal, superior temporal, superior frontal which can be detected by computing the mean curvature of the pial surface (I can only provide a tweet for reference: https://twitter.com/R3RT0/status/1617119196617261056). Even by PCW 9-13 the callosal sulcus, sylvian fissure, parieto-occipital fissure, olfactory sulcus, cingulate sulcus and calcarine fissure have been reported to be present (Kostovic & Vasung 2009).

      Response 17

      Our field lacks the data necessary to provide a comprehensive empirical test for the temporal ordering of regional transcriptional profiles and emergence of folding. Our results show that transcriptional identities of V1 and TGd are - at least - present at the very earliest stages of sulcation in these regions. In response to the reviewer's comment we have updated the text to cite a similar fetal mapping project, which likewise shows evidence of folds between weeks 17-21, and made the language around directionality more cautious.

      In Results: "The observed distribution of these angles across vertices was significantly skewed relative to a null based on random alignment between angles (pspin<0.01, Fig 2f, Methods) - indicating that there is indeed a tendency for cortical sulci and the direction of fastest transcriptional change to run perpendicular to each other (pspin<0.01, Fig 2f).

      As a preliminary probe for causality, we examined the developmental ordering of regional folding and regional transcriptional identity. Mapping the expression of high-ranking TD genes in fetal cortical laser dissection microarray data(Miller et al., 2014) from 21 PCW (Post Conception Weeks) (Methods) showed that the localized transcriptional identity of V1 and TGd regions in adulthood is apparent during the fetal periods when folding topology begins to emerge (Chi et al. 1977; Xu et al. 2022) (Fig S2d)."

      In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship."

      Comment 18

      p7: In my supplemental figures (obtained from biorxiv, because I didn't find them among the files submitted to eLife) there's no S2j (only S2a-S2i).

      Response 18

      We apologize, this figure refers to S3k (formerly S3j), rather than S2j. We have updated the main text.

      Comment 19

      p7: It is not clear from the methods (section 3b) how the adult and fetal brains were compared. Maybe using MSM (Robinson et al 2014)?

      Response 19

      We have now clarified this in Methods text as reproduced below.

      In Methods 3b: "We averaged scaled regional gene expression values between donors per gene, and filtered for genes in the fetal LDM dataset that were also represented in the adult DEM dataset - yielding a single final 20,476*235 gene-by-sample matrix of expression values for the human cortex at 21 PCW. Each TD peak region was then paired with the closest matching cortical label within the fetal regions. This matrix was then used to test if each TD expression signature discovered in the adult DEM dataset (Fig 2, Table 3) was already present in similar cortical regions at 21 PCW."
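      The two preprocessing steps described in this Methods passage (averaging across donors per gene, then keeping only genes present in both datasets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the function names and the dictionary-based data layout are hypothetical, and all donors are assumed to share the same region ordering.

```python
def average_across_donors(donor_matrices):
    """Average regional expression per gene across donors.

    donor_matrices: list of dicts, one per donor, mapping gene name to a list
    of regional expression values (hypothetical layout, same region order).
    Only genes measured in every donor are kept.
    """
    shared = set.intersection(*(set(m) for m in donor_matrices))
    averaged = {}
    for gene in shared:
        rows = [m[gene] for m in donor_matrices]
        # zip(*rows) groups the values of one region across all donors.
        averaged[gene] = [sum(vals) / len(vals) for vals in zip(*rows)]
    return averaged


def keep_shared_genes(fetal_expression, adult_genes):
    """Keep fetal genes that are also represented in the adult gene set."""
    return {g: v for g, v in fetal_expression.items() if g in adult_genes}


# Toy example with two donors, two genes and two regions.
toy_donors = [
    {"GENE_A": [1.0, 3.0], "GENE_B": [0.0, 2.0]},
    {"GENE_A": [3.0, 5.0], "GENE_B": [2.0, 4.0]},
]
toy_average = average_across_donors(toy_donors)
toy_filtered = keep_shared_genes(toy_average, {"GENE_A", "GENE_C"})
```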

      Comment 20

      p7: WGCNA is used prominently, could you provide a brief introduction to its objectives? The gene coexpression networks are produced after adjusting the weight of the network edges to follow a scale-free topology, which is meant to reflect the nature of protein-protein interactions. Soft thresholding increases contrast, but doesn't this decrease a potential role of infinitesimal regulatory signals?

      Response 20

      We agree with the reviewer that the introduction to WGCNA needed additional details and have amended the Results (see below). One limitation of WGCNA-derived associations is that the method downweights smaller relationships, including potentially important regulatory signals. WGCNA methods have been titrated to capture strong relationships. This is an inherent limitation of all co-expression driven methods, which leads to an incomplete characterisation of the molecular biology. Nevertheless we feel these stronger relationships are still worth capturing and interrogating. We have updated the text to introduce WGCNA and acknowledge this potential weakness in the approach.

      In Results: "Briefly, WGCNA constructs a connectivity matrix by quantifying pairwise co-expression between genes, raising the correlations to a power (here 6) to emphasize strong correlations while penalizing weaker ones, and creating a Topological Overlap Matrix (TOM) to capture both pairwise similarities in expression and connectivity. Modules of highly interconnected genes are identified through hierarchical clustering. The resultant WGCNA modules enable topographic and genetic integration because they each exist as both (i) a single expression map (eigenmap) for spatial comparison with neuroimaging data (Fig 3a,b, Methods) and, (ii) a unique gene set for enrichment analysis against marker genes systematically capturing multiple scales of cortical organization, namely: cortical layers, cell types, cell compartments, protein-protein interactions (PPI) and GO terms (Methods, Table S2 and S4)."
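      The soft-thresholding and topological overlap steps described above can be sketched in a few lines. This is a plain-Python illustration, not the WGCNA package itself: the function names are hypothetical, an unsigned network (absolute correlations) is assumed, and the standard topological overlap formula is used, TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the summed adjacency of gene i.

```python
def soft_threshold_adjacency(corr, power=6):
    """Raise absolute correlations to a power; the diagonal is set to 0."""
    n = len(corr)
    return [
        [0.0 if i == j else abs(corr[i][j]) ** power for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Topological overlap matrix for an unsigned adjacency matrix."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal contributes 0
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0
                continue
            # Connection strength shared through every other gene u.
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Toy correlation matrix: genes 0 and 1 are strongly co-expressed.
toy_corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
toy_adj = soft_threshold_adjacency(toy_corr, power=6)
toy_tom = topological_overlap(toy_adj)
```

      With a power of 6, a correlation of 0.9 keeps an adjacency of about 0.53 while a correlation of 0.1 drops to about 0.000001, which is the contrast enhancement (and the loss of weak signals) discussed in this response.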

      Comment 21

      WGCNA modules look even more smooth than the gene expression maps. Are these maps comparable to low frequency eigenvectors? Autocorrelation in that case should be very strong?

      Response 21

      These modules are smooth as they are indeed eigenvectors which likely smooth out some of the more detailed but less common features seen in individual gene maps. These do exhibit high degrees of autocorrelation, nevertheless we are applying the spin test which is currently the appropriate null model for spatially autocorrelated cortical maps (Response 7).
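      For readers unfamiliar with the spin test, its logic can be sketched as below. This is a simplified illustration on a toy sphere, not the implementation used for the analyses: all function names are hypothetical, the vertices are assumed to lie on a unit sphere, and the nearest-vertex search is brute force. Each spin applies a random rotation to the sphere, reassigns map values by nearest rotated vertex (which preserves spatial autocorrelation), and recomputes the correlation to build the null distribution.

```python
import math
import random


def random_rotation(rng):
    """Random 3x3 rotation matrix built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def spin_test(coords, map_a, map_b, n_spins=100, seed=0):
    """Return (observed r, two-sided spin p-value) for two spherical maps."""
    rng = random.Random(seed)
    observed = pearson(map_a, map_b)
    exceed = 0
    for _ in range(n_spins):
        rot = random_rotation(rng)
        rotated = [
            tuple(sum(rot[i][k] * p[k] for k in range(3)) for i in range(3))
            for p in coords
        ]
        spun = []
        for p in coords:
            # On a unit sphere the nearest point has the largest dot product.
            nearest = max(
                range(len(rotated)),
                key=lambda j: sum(u * v for u, v in zip(p, rotated[j])),
            )
            spun.append(map_a[nearest])
        if abs(pearson(spun, map_b)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_spins + 1)


def fibonacci_sphere(n):
    """Roughly evenly spaced points on a unit sphere (toy mesh)."""
    golden = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(1 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


# Toy example: a smooth map compared with itself.
toy_coords = fibonacci_sphere(60)
toy_map = [p[2] for p in toy_coords]
observed_r, p_spin = spin_test(toy_coords, toy_map, toy_map, n_spins=50)
```

      Because the spun maps keep the smoothness of the original, smooth maps such as module eigenmaps face a correspondingly stricter null than they would under a simple vertex shuffle.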

      Comment 22

      If the WGCNA modules provide an orthogonal basis for surface data, is it completely unexpected that some of them will correlate with low-frequency patterns? What would happen if random low frequency patterns were generated? Would they also show correlations with some of the 16 WGCNA modules?

      Response 22

      We agree with the reviewer that if we used a generative model like BrainSMASH, we would likely see similar low frequency patterns. However, the inserted figure in Response 7 from Markello & Misic provides evidence that such a model is not as conservative a null as the spin test when data exhibit high spatial autocorrelation. The spatial enrichment tests on the WGCNA modules are all carried out using the spin test.

      Comment 23

      In part (a) I commented on the possibility that brain anatomy may introduce artifactual structure into the data that's being mapped. But what if the relationship between brain geometry and brain organisation were deeper than just the introduction of artefacts? The work of Lefèvre et al (2014, https://doi.org/10.1109/ICPR.2014.107; 2018, https://doi.org/10.3389/fnins.2018.00354) shows that clustering based on the 3 lowest frequency eigenvectors of the Laplacian of a brain hemisphere mesh produces an almost perfect parcellation into lobes, with remarkable coincidences between parcel boundaries and primary folds and fissures. The work of Pang et al (https://doi.org/10.1101/2022.10.04.510897) suggests that the geometry of the brain plays a critical role in constraining its dynamics: they analyse >10k task-evoked brain maps and show that the eigenvectors of the brain Laplacian parsimoniously explain the activity patterns. Could brain anatomy have a downward effect on brain organisation?

      Response 23

      The reviewer raises a fascinating extension of our work identifying spatial modes of gene expression. We agree that these are low frequency in nature, but would first like to note that the newly introduced null model indicates that the overlaps with salient neuroanatomical features are inherent in the expression data and not purely driven by anatomy in a methodological sense.

      Nevertheless we absolutely agree there is likely to be a complex multidirectional interplay between genetic expression patterns through development, developing morphology and the “final” adult topography of expression, neuroanatomical and functional patterns.

      We think that the current manuscript already contains many in-depth analyses of these expression data, but agree that a more extensive modeling analysis of how expression might pattern or explain functional activation would be a fascinating follow-on, especially in light of these studies from Pang and Lefèvre. Nevertheless we think that this must be left for a future modeling paper integrating these modes of microscale, macroscale and functional anatomy.

      In Discussion: "Indeed, future work might find direct links between these module eigenvectors and similar low-frequency eigenvectors of cortical geometry, which have been used as basis functions to segment the cortex (Lefèvre et al. 2018) and explain complex functional activation patterns(Pang et al. 2023)."

      Comment 24

      On p11: ASD related to rare, deleterious mutations of strong effect is often associated with intellectual disability (where the social interaction component of ASD is more challenging to assess). Was there some indication of a relationship with that type of cognitive phenotype?

      Response 24

      Across the two ABIDE cohorts, the total number of those with ASD and IQ <70 (the clinical threshold for intellectual disability) was n=10, which unfortunately did not allow us to conduct a meaningful test of whether ID impacts the relationship between imaging changes in ASD and the expression maps of genes implicated in ASD by rare variants.

      Comment 25

      Could you clarify if the 6 donors were aligned using the folding-based method in freesurfer?

      Response 25

      The 6 donors were aligned using MSMsulc (Robinson et al., 2014), which is a folding based method from the HCP group. This is now clarified in the methods.

      In Methods 1: "Cortical surfaces were reconstructed for each AHBA donor MRI using FreeSurfer(Fischl, 2012), and coregistered between donors using surface matching of individuals’ folding morphology (MSMSulc) (Robinson et al., 2018)."

      Comment 26

      The authors make available a rich resource and a series of tools to facilitate their use. They have paid attention to encode their data in standard formats, and their code was made in Python using freely accessible packages instead of proprietary alternatives such as matlab. All this should greatly facilitate the adoption of the approach. I think it would be important to state more explicitly the conceptual assumptions that the methodology brings. In the same way that a GWAS approach relies on a Mendelian idea that individual alleles encode for phenotypes, what is the idea about the organisation of the brain implied by the orthogonal gene expression modules? Is it that phenotypes - micro and macro - are encoded by linear combinations of a reduced number of gene expression patterns? What would be the role of the environment? The role of non-genic regulatory regions? Some modalities of functional organisation do not seem to be encoded by the expression of any module. Is it just for lack of data or should this be seen as the sign for a different organisational principle? Likewise, what about the aspects of disorders that are not captured by expression modules? Would that hint, for example, to stronger environmental effects? What about linear combinations of modules? Nonlinear? Overall, the authors adopt implicitly, en passant, a gene-centric conceptual standpoint, which would benefit from being more clearly identified and articulated. There are citations to Rakic's protomap idea (I would also cite the original 1988 paper, and O'Leary's 1989 "protocortex" paper stressing the role of plasticity), which proposes that a basic version of brain cytoarchitecture is genetically determined and transposed from the proliferative ventricular zone regions to the cortical plate through radial migration. In p13 the authors indicate that their results support Rakic's protomap. 
      Additionally, in p7 the authors suggest that their results support a causal arrow going from gene expression to sulcal anatomy. The reviews by O'Leary et al (2007), Ronan & Fletcher (2014, already cited), Llinares-Benadero & Borrell (2019) could be considered, which also advocate for a similar perspective. For nuances on the idea that molecular signals provide positional information for brain development, the article by Sharpe (2019, DOI: 10.1242/dev.185967) is interesting. For nuances on the gene-centric approach of the paper the articles by Rockman (2012, DOI: 10.1111/j.1558-5646.2011.01486.x) but also from the ENCODE consortium showing the importance of non-genic regions of the genome ("Perspectives on ENCODE" 2020 DOI: 10.1038/s41586-021-04213-8) could be considered. I wouldn't ask to cite ideas from the extended evolutionary synthesis about different inheritance systems (as reviewed by Jablonka & Lamb, DOI: 10.1017/9781108685412) or the idea of inherency (Newman 2017, DOI: 10.1007/978-3-319-33038-9_78-1), but the authors may find them interesting. Same goes for our own work on mechanical morphogenesis which expands on the idea of a downward causality (Heuer and Toro 2019, DOI: 10.1016/j.plrev.2019.01.012)

      Response 26

      We thank the reviewer for recommending these papers, which we enjoyed reading and have deepened our thinking on the topic. In addition to toning down some of the language with respect to causality that our data cannot directly address, we have included additional discussion and references as follows:

      In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship."

      Overall, the manuscript is very interesting and a great contribution. The amount of work involved is impressive, and the presentation of the results very clear. My comments indicate some aspects that could be made more clear, for example, providing additional methodological information in the supplemental material. Also, making aware the readers and future users of MAGICC of the methodological and conceptual challenges that remain to be addressed in the future for this field of research.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1

      The supplementary figures seem to be missing from the eLife submission (although I was able to find them on europepmc)

      Response 1

      We apologize that these were not included in the documents sent to reviewers. The up-to-date supplementary figures are included in this resubmission and again on biorxiv.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines genetically barcoded rabies viruses with spatial transcriptomics in vivo in the mouse brain to decode connectivity of neural circuits. The data generated by the combination of these approaches in this new way is mostly convincing as the authors provide validation and proof-of-concept that the approach can be successful. While this new combination of established techniques has promise for elucidating brain connectivity, there are still some nuances and caveats to the interpretations of the results that are lacking especially with regards to noting unexpected barcodes either due to unexpected/novel connections or unexpected rabies spread.

      In this revised manuscript, we added a new control experiment and additional analyses to address two main questions from the reviewers: (1) How the threshold of glycoprotein transcript counts used to identify source cells was determined, and (2) whether the limited long-range labeling was expected in the trans-synaptic experiment. The new experiments and analyses validated the distribution of source cells and presynaptic cells observed in the original barcoded transsynaptic tracing experiment and validated the choice of the threshold of glycoprotein transcripts. As the reviewers suggested, we also included additional discussion on how future experiments can improve upon this study, including strategies to improve source cell survival and minimizing viral infection caused by leaky expression of TVA. We also provided additional clarification on the analyses for both the retrograde labeling experiment and the trans-synaptic tracing experiment. We modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers. Detailed changes to address specific comments by reviewers are included below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this preprint, Zhang et al. describe a new tool for mapping the connectivity of mouse neurons. Essentially, the tool leverages the known peculiar infection capabilities of Rabies virus: once injected into a specific site in the brain, this virus has the capability to "walk upstream" the neural circuits, both within cells and across cells: on one hand, the virus can enter from a nerve terminal and infect retrogradely the cell body of the same cell (retrograde transport). On the other hand, the virus can also spread to the presynaptic partners of the initial target cells, via retrograde viral transmission.

      Similarly to previously published approaches with other viruses, the authors engineer a complex library of viral variants, each carrying a unique sequence ('barcode'), so they can uniquely label and distinguish independent infection events and their specific presynaptic connections, and show that it is possible to read these barcodes in-situ, producing spatial connectivity maps. They also show that it is possible to read these barcodes together with endogenous mRNAs, and that this allows spatial mapping of cell types together with anatomical connectivity.

      The main novelty of this work lies in the combined use of rabies virus for retrograde labeling together with barcoding and in-situ readout. Previous studies had used rabies virus for retrograde labeling, albeit with low multiplexing capabilities, so only a handful of circuits could be traced at the same time. Other studies had instead used barcoded viral libraries for connectivity mapping, but mostly focused on the use of different viruses for labeling individual projections (anterograde tracing) and never used a retrograde-infective virus.

      The authors creatively merge these two bits of technology into a powerful genetic tool, and extensively and convincingly validate its performance against known anatomical knowledge. The authors also do a very good job at highlighting and discussing potential points of failure in the methods.

      We thank the reviewer for the enthusiastic comments.

      Unresolved questions, which more broadly affect also other viral-labeling methods, are for example how to deal with uneven tropism (ie. if the virus is unable or inefficient in infecting some specific parts of the brain), or how to prevent the cytotoxicity induced by the high levels of viral replication and expression, which will tend to produce "no source networks", neural circuits whose initial cell can't be identified because it's dead. This last point is particularly relevant for in-situ based approaches: while high expression levels are desirable for the particular barcode detection chemistry the authors chose to use (gap-filling), they are also potentially detrimental for cell survival, and risk producing extensive cell death (which indeed the authors single out as a detectable pitfall in their analysis). This is likely to be one of the major optimisation challenges for future implementations of these types of barcoding approaches.

      As the reviewer suggested, we included additional discussion about tropism and cytotoxicity in the revised Discussion. Our sensitivity for barcode detection is sufficient, since we estimated (based on manual proofreading) that most barcoded neurons had more than ten counts of a barcode in the trans-synaptic tracing experiment. The high sensitivity may potentially allow us to adapt next-generation rabies virus with low replication, such as the third generation ΔL rabies virus (Jin et al, 2022, biorxiv) in future optimizations.

      Overall the paper is well balanced, the data are well presented and the conclusions are strongly supported by the data. Impact-wise, the method is definitely going to be useful for the neurobiology research community.

      We thank the reviewer for her/his enthusiasm.

      Reviewer #2 (Public Review):

      Although the trans-synaptic tracing method mediated by the rabies virus (RV) has been widely utilized to infer input connectivity across the brain to a genetically defined population in mice, the analysis of labeled pre-synaptic neurons in terms of cell-type has been primarily reliant on classical low-throughput histochemical techniques. In this study, the authors made a significant advance toward high-throughput transcriptomic (TC) cell typing by both dissociated single-cell RNAseq and the spatial TC method known as BARseq to decode a vast array of molecularly labeled ("barcoded") RV vector library. First, they demonstrated that a barcoded-RV vector can be employed as a simple retrograde tracer akin to AAVretro. Second, they provided a theoretical classification of neural networks at the single-cell resolution that can be attained through barcoded-RV and concluded that the identification of the vast majority (ideally 100%) of starter cells (the origin of RV-based trans-synaptic tracing) is essential for the inference of single-cell resolution neural connectivity. Taking this into consideration, the authors opted for the BARseq-based spatial TC that could, in principle, capture all the starter cells. Finally, they demonstrated the proof-of-concept in the somatosensory cortex, including inferred connectivity from 381 putative pre-synaptic partners to 31 uniquely barcoded-starter cells, as well as many insightful estimations of input convergence at the cell-type resolution in vivo. While the manuscript encompasses significant technical and theoretical advances, it may be challenging for the general readers of eLife to comprehend. The following comments are offered to enhance the manuscript's clarity and readability.

      We modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers. We separated out the theoretical discussion about barcode sharing networks as a separate subsection, explicitly stated the rationale of how different barcode sharing networks are distinguished in the in situ trans-synaptic tracing experiment, and added additional discussion on future optimizations. Detailed descriptions are provided below.

      Major points:

      1. I find it difficult to comprehend the rationale behind labeling inhibitory neurons in the VISp through long-distance retrograde labeling from the VISal or Thalamus (Fig. 2F, I and Fig. S3) since long-distance projectors in the cortex are nearly 100% excitatory neurons. It is also unclear why such a large number of inhibitory neurons was labeled at a long distance through RV vector injections into the RSP/SC or VISal (Fig. 3K). Furthermore, a significant number of inhibitory starter cells in the somatosensory cortex was generated based on their projection to the striatum (Fig. 5H), which is unexpected given our current understanding of the cortico-striatum projections.

      The labeling of inhibitory neurons can be explained by several factors in the three different experiments.

      (1) In the scRNAseq-based retrograde labeling experiment (Fig. 2 and Fig. S3), the injection site VISal is adjacent to VISp. Because we dissected VISp for single-cell RNAseq, we may find labeled inhibitory neurons at the VISp border that extend short axons into VISal. We explained this in the revised Results.

      (2) In the in situ sequencing-based retrograde labeling experiment (Fig. 3,4), the proximity between the two injection sites VISal and RSP/SC, and the sequenced areas (which included not only VISp but also RSP) could also contribute to labeling through local axons of inhibitory neurons. Furthermore, because we also sequenced midbrain regions, inhibitory neurons in the superior colliculus could pick up the barcodes through local axons. We included an explanation of this in the revised Results.

      (3) In the trans-synaptic tracing experiment, we speculate that low-level leaky expression from the TREtight promoter led to non-Cre-dependent expression in many neurons. To test this hypothesis, we first performed a control injection in which we saw that the fluorescent protein expression was indeed restricted to layer 5, as expected from corticostriatal labeling. Based on the labeling pattern, we estimated that about 12 copies of the glycoprotein transcript per cell would likely be needed to achieve fluorescent protein expression. Since many source cells in our experiment were below this threshold, these results support the hypothesis that the majority of source cells with low-level expression of the glycoprotein were likely Cre-independent. Because these cells could still contribute to barcode sharing networks, we could not exclude them as in a conventional bulk trans-synaptic tracing experiment. In future experiments, we can potentially reduce this population by improving the helper AAV viruses used to express TVA and the glycoprotein. We included this explanation in Results and more detailed analysis in Supplementary Note 2, and discussed potential future optimizations in the Discussion. This new analysis in Supplementary Note 2 is also related to the Reviewer’s question regarding the threshold used for determining source cells (see below).

      1. It is unclear as to why the authors did not perform an analysis of the barcodes in Fig. 2. Given that the primary objective of this manuscript is to evaluate the effectiveness of multiplexing barcoded technology in RV vectors, I would strongly recommend that the authors provide a detailed description of the barcode data here, including any technical difficulties or limitations encountered, which will be of great value in the future design of RV-barcode technologies. In case the barcode data are not included in Fig. 2, I would suggest that the authors consider excluding Fig. 2 and Fig. S1-S3 in their entirety from the manuscript to enhance its readability for general readers.

      In the single-cell RNAseq-based retrograde tracing, all barcodes recovered matched to known barcodes in the corresponding library. We included a short description of these results in the revised manuscript.

      1. Regarding the trans-synaptic tracing utilizing a barcoded RV vector in conjunction with BARseq decoding (Fig. 5), which is the core of this manuscript, I have a few specific questions/comments. First, the rationale behind defining cells with only two rolonies counts of rabies glycoprotein (RG) as starter cells is unclear. Why did the authors not analyze the sample based on the colocalization of GFP (from the AAV) and mCherry (from the RV) proteins, which is a conventional method to define starter cells? If this approach is technically difficult, the authors could provide an independent histochemical assessment of the detection stringency of GFP positive cells based on two or more colonies of RG.

      In situ sequencing does not preserve fluorescent protein signals, so we used transcript counts to determine which cells expressed the glycoprotein. We have added new analyses in the Results and in Supplementary Note 2 to determine the transcript counts that were equivalent to cells that had detectable BFP expression. We found that BFP expression is equivalent to ~12 counts of the glycoprotein transcript per cell, which is much higher than the threshold we used. However, we could not solely rely on this estimate to define the source cells, because cells that had lower expression of the glycoprotein (possibly from leaky Cre-independent expression) may still pass the barcodes to presynaptic cells. This can lead to an underestimation of double-labeled and connected-source networks and an overestimation of single-source networks and can obscure synaptic connectivity at the cellular resolution. We thus used a very conservative threshold of two transcripts in the analysis. This conservative threshold will likely overestimate the number of source cells that shared barcodes and underestimate the number of single-source networks. Since this is a first study of barcoded transsynaptic tracing in vivo, we chose to err on the conservative side to make sure that the subsequent analysis has single-cell resolution. Future characterization and optimization may lead to a better threshold to fully utilize data.
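To make the effect of the threshold concrete, here is a minimal sketch, not the authors' pipeline. It assumes that a cell is called a source cell when its glycoprotein transcript count is at least the threshold, and it uses invented cell IDs, counts, and barcodes. The function names `call_source_cells` and `source_cells_per_barcode` are hypothetical.

```python
from collections import defaultdict

# Values quoted in the response: 2 transcripts (conservative call) and
# ~12 transcripts (roughly equivalent to detectable BFP expression).
CONSERVATIVE_THRESHOLD = 2
BFP_EQUIVALENT_THRESHOLD = 12


def call_source_cells(glyco_counts, threshold=CONSERVATIVE_THRESHOLD):
    """Return the cell IDs whose glycoprotein count is >= threshold (assumed rule)."""
    return {cell for cell, n in glyco_counts.items() if n >= threshold}


def source_cells_per_barcode(cell_barcodes, source_cells):
    """Count, for every barcode, how many source cells carry it."""
    counts = defaultdict(int)
    for cell, barcodes in cell_barcodes.items():
        for bc in barcodes:
            counts[bc] += cell in source_cells  # True counts as 1
    return dict(counts)


# Toy data (invented): glycoprotein transcript counts and barcodes per cell.
glyco_counts = {"c1": 15, "c2": 3, "c3": 0, "c4": 1}
cell_barcodes = {"c1": {"A"}, "c2": {"A", "B"}, "c3": {"A"}, "c4": {"C"}}

conservative = call_source_cells(glyco_counts)
strict = call_source_cells(glyco_counts, BFP_EQUIVALENT_THRESHOLD)

print(source_cells_per_barcode(cell_barcodes, conservative))
print(source_cells_per_barcode(cell_barcodes, strict))
```

In this toy data, barcode A is found in two source cells under the threshold of two but in only one under the threshold of 12. A lower threshold therefore moves barcodes out of the single-source category, which is the conservative behavior described above.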

      Second, it is difficult to interpret the proportion of the 2,914 barcoded cells that were linked to barcoded starter cells (single-source, double-labeled, or connected-source) and those that remained orphan (no-source or lost-source). A simple table or bar graph representation would be helpful. The abundance of the no-source network (resulting from Cre-independent initial infection of the RV vector) can be estimated in independent negative control experiments that omit either Cre injection or AAV-RG injection. The latter, if combined with BARseq decoding, can provide an experimental prediction of the frequency of double-labeled events since connected-source networks are not labeled in the absence of RG.

      We have added Table 2, which breaks down the 2,914 barcoded cells based on whether they are presynaptic or source cells, and which type of network they belong to. We agree with the reviewer that additional Cre- or RG- control experiments run in parallel would allow an independent estimate of the double-labeled networks and the no-source networks. We have added to the Discussion a description of possible controls that could further optimize the trans-synaptic tracing approach in future studies.

      Third, I would appreciate more quantitative data on the putative single-source network (Fig. 5I and S6) in terms of the distribution of pre- and post-synaptic TC cell types. The majority of labeling appeared to occur locally, with only two thalamic neurons observed in sample 25311842 (Fig. S6). How many instances of long-distance labeling (for example, > 500 microns away from the injection site) were observed in total? Is this low efficiency of long-distance labeling expected based on the utilized combinations of AAVs and RV vectors? A simple independent RV tracing solely detecting mCherry would be useful for evaluating the labeling efficiency of the method. I have experienced similar "less jump" RV tracing when RV particles were prepared in a single step, as this study did, rather than multiple rounds of amplification in traditional protocols, such as Osakada F et al Nat Protocol 2013.

      We imaged an animal that was injected in parallel to assess labeling (now included in Supplementary Note 2 and Supp. Fig. S5). The labeling pattern in the newly imaged animal was largely consistent with the results from the barcoded experiment: most labeled neurons were seen in the vicinity of the injection site, and sparser labeling was seen in other cortical areas and the thalamus. We further found that most neurons that were labeled in the thalamus were about 1 mm posterior to the center of the injection site, and thus would not have been sequenced in the in situ sequencing experiment (in which we sequenced about 640 µm of tissue spanning the injection site).

      In addition, we found that the bulk of the cells that expressed mCherry from the rabies virus only partially overlapped with the area that contained cells co-expressing BFP with the rabies glycoprotein. Moreover, very few cells co-expressed mCherry and BFP, which would be considered source cells in a conventional mono-synaptic tracing experiment. The small numbers of source cells likely also contributed to the sparseness of long-range labeling in the barcoded experiment.

      These interpretations and comparisons to the barcoded experiment are now included in Supplementary Note 2.

      Reviewer #3 (Public Review):

      The manuscript by Zhang and colleagues attempts to combine genetically barcoded rabies viruses with spatial transcriptomics in order to genetically identify connected pairs. The major shortcoming with the application of a barcoded rabies virus, as reported by 2 groups prior, is that with the high dropout rate inherent in single cell procedures, it is difficult to definitively identify connected pairs. By combining the two methods, they are able to establish a platform for doing that, and provide insight into connectivity, as well as pros and cons of their method, which is well thought out and balanced.

      Overall the manuscript is well-done, but I have a few minor considerations about tone and accuracy of statements, as well as some limitations in how experiments were done. First, the idea of using rabies to obtain broader tropism than AAVs isn't really accurate - each virus has its own set of tropisms, and it isn't clear that rabies is broader (or can be made to be broader).

      As the reviewer suggested, we toned down this claim and stated that rabies virus has different tropism to complement AAV.

      Second, rabies does not label all neurons that project to a target site - it labels some fraction of them.

      We meant to say that retrograde labeling is not restricted to labeling neurons from a certain brain region. We have clarified this in the text.

      Third, the high rate of rabies virus mutation should be considered - if it is, or is not a problem in detecting barcodes with high fidelity, this should be noted.

      Our analysis showed that sequencing 15 bases was sufficient to tolerate a small number of mismatches in the barcode sequences and could distinguish real barcodes from random sequences (Fig. 4A). Thus, we can tolerate mutations in the barcode sequence. We have clarified this in the text.
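The idea of tolerating mismatches can be illustrated with a Hamming-distance lookup against a known barcode library. This is only a sketch under stated assumptions: the 15-nt sequences are invented, the tolerance of one mismatch is a placeholder for the "small number" mentioned above, and `match_barcode` is a hypothetical helper rather than the authors' code.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read, library, max_mismatches=1):
    """Return the unique library barcode within max_mismatches of read, else None."""
    hits = [bc for bc in library if hamming(read, bc) <= max_mismatches]
    # Ambiguous reads (several candidates) are rejected along with non-matches.
    return hits[0] if len(hits) == 1 else None


# Invented 15-nt library; real libraries are far larger.
library = ["ACGTACGTACGTACG", "TTGCAATGCCGATAC", "GGATCCGATTACAGT"]

exact = match_barcode("ACGTACGTACGTACG", library)        # identical read
mutated = match_barcode("ACGTACGAACGTACG", library)      # one substitution
random_read = match_barcode("CCCCCCCCCCCCCCC", library)  # unrelated sequence

print(exact, mutated, random_read)
```

A read carrying a single substitution still maps back to its barcode, while an unrelated sequence maps to nothing. How many mismatches can be tolerated depends on how far apart the library barcodes are, which is why the read length matters.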

      Fourth, there are a number of implicit assumptions in this manuscript, not all of which are equally backed up by data. For example, it is not clear that all rabies virus transmission is synaptic specific; in fact, quite a few studies argue that it is not (e.g., detection of rabies transcripts in glial cells). Thus, arguments about lost-source networks and the idea that if a cell is lost from the network, that will stop synaptic transmission, is not clear. There is also the very real propensity that, the sicker a starter cell gets, the more non-specific spread of virus (e.g., via necrosis) occurs.

      We agree with the reviewer that how strictly virus transmission is restricted to synapses remains a hotly debated question in the field, and this question is relevant not only to techniques based on barcoded rabies tracing, but to all trans-synaptic tracing experiments. A barcoding-based approach can generate single-cell data that enable direct comparison to other data modalities that measure synaptic connectivity, such as multi-patch and EM. These future experiments may provide additional insights into the questions that the reviewer raised. We have included additional discussion about how non-synaptic transmission of barcodes because of the necrosis of source cells may affect the analysis in the Discussion.

      Regarding the scenario in which the source cell dies, we agree with the reviewer and have clarified this in the revised manuscript.

      Fifth, in the experiments performed in Figure 5, the authors used a FLEx-TVA expressed via a retrograde Cre, and followed this by injection of their rabies virus library. The issue here is that there will be many (potentially thousands) of local infection events near the injection site that are TVA-mediated but Cre-independent (= off-target expression of TVA in the absence of Cre). This is a major confound in interpreting the labeling of these cells. They may express very low levels of TVA, but still have infection be mediated by TVA. The authors did not clearly explore how expression of TVA related to rabies virus infection of cells near the rabies injection site. A modified version of TVA, such as 66T, should have been used to mitigate this issue. Otherwise, it is impossible to determine connectivity locally. The authors do not go to great lengths to interpret the findings of these observations, so I am not sure this is a critical issue, but it should be pointed out by the authors as a caveat to their dataset.

      We agree with the reviewer that this type of infection could potentially be a major contributor to no-source networks, which were abundant in our experiment. Because small no-source networks were excluded from our analyses, and large no-source networks were only included for barcodes with low frequency (i.e., it would be nearly impossible statistically to generate such large no-source networks from independent infections), we believe that the effect of independent infections on our analyses was minimized. We have added a control experiment in Fig S5 and Supplementary Note 2, which further supported the hypothesis that there were many independent infections. We also included additional discussion about how this can be assessed and optimized in future studies in the Discussion.

      Sixth, the authors are making estimates of rabies spread by comparison to a set of experiments that was performed quite differently. In the two studies cited (Liu et al., done the standard way, and Wertz et al., tracing from a single cell), the authors were likely infecting with a rabies virus using a high multiplicity of infection, which likely yields higher rates of viral expression in these starter cells and higher levels of input labeling. However, in these experiments, the authors need to infect with a low MOI, and explicitly exclude cells with >1 barcode. Having only a single virion trigger infection of starter cells will likely reduce the #s of inputs relative to starter neurons. Thus, the stringent criteria for excluding small networks may not be entirely warranted. If the authors wish to only explore larger networks, this caveat should be explicitly noted.

      In the trans-synaptic labeling experiment, we actually used high rabies titer (200 nL, 7.6e10 iu/mL) that was comparable to conventional rabies tracing experiments. We did not exclude cells with multiple barcodes (as opposed to barcodes in multiple source cells), because we could resolve multiple barcodes in the same cell and indeed found many cells with multiple barcodes. We have clarified this in the text.

      Overall, if the caveats above are noted and more nuance is added to some of the interpretation and discussion of results, this would greatly help the manuscript, as readers will be looking to the authors as the authority on how to use this technology.

      In addition to addressing the specific concerns of the reviewer as described above, we modified the Results and Discussion sections on the trans-synaptic tracing experiment to improve clarity to general readers and expanded the discussion on future optimizations.

      Reviewer #1 (Recommendations For The Authors):

      The scientific problem is clearly stated and well laid out, the data are clearly presented, and the experiments well justified and nicely discussed. It was overall a very enjoyable read. The figures are generally nice and clear, however, I find the legends excessively concise. A bit too often, they just sort of introduce the title of the panel rather than a proper explanation of what it is depicted. A clear case is for example visible in Fig 2, where the description of the panels is minimal, but this is a general trend of the manuscript. This makes the figures a bit hard to follow as self-contained entities, without having to continuously go back to the main text. I think this could be improved with longer and more helpful descriptions.

      We have revised all figure legends to make them more descriptive.

      Other minor things:

      In the cDNA synthesis step for in-situ sequencing, I believe the authors might have forgotten one detail: the addition of aminoallyl dUTP to the RT reaction. If I recall correctly this is done in BARseq. The fact that the authors crosslink with BS-PEG on day 2, makes me suspect they spike in these nucleotides during the RT but this is not specified in the relevant step. Perhaps this is a mistake that needs correction.

      The RT primers we used have an amine group at 5’, which directly allows crosslinking. Thus, we did not need to spike in aminoallyl dUTP in the RT reaction. We have clarified this in the Methods.

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, there are frequent references to the "Methods" section for important details. However, it can be challenging to determine which specific section of the Methods the authors are referring to, and in some cases, a thorough examination of the entire Methods section fails to locate the exact information needed to support the authors' claims. Below are a few specific examples of this issue. The authors are encouraged to be more precise in their references to the Methods section.

      In the revised manuscript, we numbered each subsection of Methods and updated pointers and associated hyperlinks in the main text to the subsection numbers.

      • On page 7, line 14, it is unclear how the authors compared the cell marker gene expression with the marker gene expression in the reference cell type.

      We have clarified this in the revised manuscript.

      • On page 7, line 33, the authors note that some barcodes may have been missed during the sequencing of the rabies virus libraries, but the Methods section lacked a convincing explanation on this issue (see my point 2 above).

      We included a separate subsection on the sequencing of rabies libraries and the analysis of the sequencing depth in the Methods. In this new subsection, we further clarified our reasoning for identifying the lack of sequencing depth as a reason for missing barcodes, especially in comparison to sequencing depth required for establishing exact molecule counts used in established MAPseq and BARseq techniques with Sindbis libraries.

      • On page 9, line 44, the authors state that they considered a barcode to be associated with a cell if they found at least six molecules of that barcode in a cell, as detailed in the Methods section. However, the rationale behind this level of stringency is not provided in the Methods.

      We initially chose this threshold based on visual inspection of the sequencing images of the barcoded cells. Because the labeled cell types were consistent with our expectations (Fig. 4E-G), we did not further optimize the threshold for detecting retrogradely labeled barcoded cells.

      • I have noticed that some important explanations of figure panels are missing in the legends, making it challenging to understand the figures. Below are typical examples of this issue.

      In addition to the examples that the reviewer mentioned below, we also revised many other figure panels to make them clear to the readers.

      • In Fig. 2, "RV into SC" in panel C does not make sense, as RV was injected into the thalamus. There is no explanation of the images in this panel C.

      We have corrected the typo in the revision.

      • In Fig. 3, information on the endogenous gene panel for cell type classification (Table S3) could be mentioned in the legend or corresponding text.

      We now cite Table S3 both in Fig 3 legend and in the main text. We also included a list of the 104 cell type marker genes we used in Table S3.

      • In panel J, it is unclear why the total number of BC cells is 2,752, and not 4,130 as mentioned in the text.

      This is a typo. We have corrected this in the revision. The correct number (3,746) refers to the number of cells that did not belong to either of the two categories at the bottom of the panel, and not the total number of neurons. To make this clear, we now also include the total number of barcoded cells at the top of the panel.

      • In Fig. 4, the definitions of "+" and "−" symbols in panels K and L are unclear. Also, it seems that the second left column of panel K should read "T −."

      We corrected the typo in K, further clarified the “Area” labels, and changed the “S” label in 4K to “−”. This change does not change the original meaning of the figure: when considering the variance explained in L4/5 IT neurons, considering the subclass compositional profile is equivalent to not using the compositional profiles of cell types, because L4/5 IT neurons all belong to the same subclass (L4/5 IT subclass). Although operationally we simply considered subclass-level compositional profiles when calculating the variance explained, we think that changing this to “−” is clearer for the readers.

      • In Fig. 5, panel E is uninterpretable.

      We revised the main text and the figure to clarify how we manually proofread cells to determine the QC thresholds for barcoded cells. These plots showed a summary of the proofreading. We also revised the figures to indicate that they showed the fraction of barcoded cells that were considered real after proofreading. In the revised version, we moved these plots to Fig. S5.

      • In Fig. S1, I do not understand the identity of the six samples on the X-axis of panel A (given that only two animals were described in the main text) and what panel B shows, including the definition of map_cluster_conf and map_cluster_corr.

      In the revised Fig. S1, we made it more explicit that the six animals include both animals used for retrograde tracing (2 animals) and those used for trans-synaptic tracing (4 animals). We updated the y axis labels to be more readable and cited the relevant Methods section for definitions.

      • In Fig. S2, please provide the definitions of blue and red dots and values in panel A, as well as the color codes and size of the circles in panel B. My overall impression from panel B is that there is no significant difference between RV-infected and non-infected cells. The authors should provide more quantitative and statistical support for the claim that "RV-infected cells had higher expression of immune response-related genes."

      We toned down the statement to “Consistent with previous studies […], some immune response related genes were up-regulated in virus-infected cells compared to non-infected cells.” Because the main point of the single-cell RNAseq analysis was that rabies did not affect the ability to distinguish transcriptomic types, the change in immune response-related genes was not essential to the main conclusions. We clarified the red and blue dots in panel A and changed panel B to show the top up-regulated immune response-related genes in the revised manuscript.

      • In Fig. S3, the definitions of the color code and circle size are missing.

      We have added the legends in Fig. S3.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We appreciate the reviewers for their insightful feedback, which has substantially improved our manuscript. Following the suggestions of the reviewers, we have undertaken the following major revisions:

      a. Concerning data transformation, we have adjusted the methodology in Figures 2 and 3. Instead of normalizing c-Fos density to the whole brain c-Fos density as initially described, we now normalize to the c-Fos density of the corresponding brain region in the control group.

      b. We have substituted the PCA approach with hierarchical clustering in Figures 2 and 3.

      c. In the discussion section, we added a subsection on study limitations, focusing on the variations in drug administration routes and anesthesia depth.

      Enclosed are our detailed responses to each of the reviewer's comments.

      Reviewer #1:

      1a. The addition of the EEG/EMG is useful, however, this information is not discussed. For instance, there are differences in EEG/EMG between the two groups (only Ket significantly increased delta/theta power, and only ISO decreased EMG power). These results should be discussed as well as the limitation of not having physiological measures of anesthesia to control for the anesthesia depth.

      1b. The possibility that the differences in fos observed may be due to the doses used should be discussed.

      1c. The possibility that the differences in fos observed may be due kinetic of anesthetic used should be discussed.

      Thank you for your suggestions. In the revised manuscript (Lines 308-331, also shown below), we now discuss the EEG/EMG results, the limitation of lacking physiological measures to control for anesthesia depth, and the possibility that the observed differences in c-Fos expression arise from the doses used or from the kinetics of the anesthetics.

      Lines 308-331: "...Our findings indicate that c-Fos expression in the KET group is significantly elevated compared to the ISO group, and the saline group exhibits notably higher c-Fos expression than the home cage group, as seen in Supplementary Figures 2 and 3. Intraperitoneal saline injections in the saline group, despite pre-experiment acclimation with handling and injections for four days, may still evoke pain and stress responses in mice. Subtle yet measurable variations in brain states between the home cage and saline groups were observed, characterized by changes in normalized EEG delta/theta power (home cage: 0.05±0.09; saline: -0.03±0.11) and EMG power (home cage: -0.37±0.34; saline: 0.04±0.13), as shown in Supplementary Figure 1. These changes suggest a relative increase in overall brain activity in the saline group compared to the home cage group, potentially contributing to the higher c-Fos expression. Although the difference in EEG power between the ISO group and the home cage control was not significant, the increase in EEG power observed in the ISO group was similar to that of KET (0.47 ± 0.07 vs 0.59 ± 0.10), suggesting that both agents may induce loss of consciousness in mice. Regarding EMG power, ISO showed a significant decrease in EMG power compared to its control group. In contrast, the KET group showed a lesser reduction in EMG power (ISO: -1.815 ± 0.10; KET: -0.96 ± 0.21), which may partly explain the higher overall c-Fos expression levels in the KET group. This is consistent with previous studies where ketamine doses up to 150 mg/kg increase delta power while eliciting a wakefulness-like pattern of c-Fos expression across the brain [1]. Furthermore, the observed differences in c-Fos expression may arise in part from the dosages, routes of administration, and their distinct pharmacokinetic profiles. This variation is compounded by the lack of detailed physiological monitoring, such as blood pressure, heart rate, and respiration, affecting our ability to precisely assess anesthesia depth. Future studies incorporating comprehensive physiological monitoring and controlled dosing regimens are essential to further elucidate these relationships and refine our understanding of the effects of anesthetics on brain activity"

      1. Lu J, Nelson LE, Franks N, Maze M, Chamberlin NL, Saper CB: Role of endogenous sleep-wake and analgesic systems in anesthesia. J Comp Neurol 2008, 508(4):648-662.

      2b. I am confused because Fig 2C seems to show significant decrease in %fos in the hypothalamus, midbrain and cerebellum after KET, while the author responded that " in our analysis, we did not detect regions with significant downregulation when comparing anesthetized mice with controls." Moreover the new figure in the rebuttal in response to reviewer 2 suggests that Ket increases Fos in almost every single region (green vs blue) which is not the conclusion of the paper.

      Your concern regarding the apparent discrepancy is well-founded. The inconsistency arose due to an inappropriate data transformation, which affected the interpretation. We have now rectified this by adjusting the data transformation in Figures 2 and 3. Specifically, we have recalculated the log relative c-Fos density values relative to the control group for each brain region. This revision has resolved the issue, confirming that our analysis did not detect any regions with significant downregulation in the anesthetized mice compared to controls. We have also updated the results, discussion, and methods sections of Figures 2 and 3 to accurately reflect these changes and ensure consistency with our findings.
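The difference between the two normalizations can be seen in a small numerical sketch. The region names and densities below are invented. The whole-brain normalization is modeled here as each region's share of the summed density, and the base-10 logarithm is an assumption, since the response does not state the base.

```python
import math


def fraction_of_total(density):
    """Each region's share of the summed density (assumed whole-brain normalization)."""
    total = sum(density.values())
    return {region: d / total for region, d in density.items()}


def log_relative_density(treated, control):
    """log10 of each region's density over the same region in the control group.

    Control densities must be non-zero; handling of zeros is left out of this sketch.
    """
    return {r: math.log10(treated[r] / control[r]) for r in treated}


# Invented mean densities: ACA triples under treatment, HY does not change.
control = {"ACA": 100.0, "HY": 100.0}
treated = {"ACA": 300.0, "HY": 100.0}

share_control = fraction_of_total(control)
share_treated = fraction_of_total(treated)
rel = log_relative_density(treated, control)

print(share_control["HY"], share_treated["HY"])  # HY's share falls from 0.5 to 0.25
print(rel["HY"])                                 # 0.0: no change relative to control
```

With the share-of-total metric, the unchanged region appears to drop simply because another region rose. The region-by-region log ratio reports no change for it, which is the behavior the revised transformation is meant to provide.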

      Author response image 1.

      Figure 2. Whole-brain distributions of c-Fos+ cells induced by ISO and KET. (A) Hierarchical clustering was performed on the log relative c-Fos density data for ISO and KET using the complete linkage method based on the Euclidean distance matrix, with clusters identified by a dendrogram cut-off ratio of 0.5. Numerical labels correspond to distinct clusters within the dendrogram. (B) Silhouette values plotted against the ratio of tree height for ISO and KET, indicating relatively higher Silhouette values at 0.5 (dashed line), which is associated with optimal clustering. (C) The number of clusters identified in each treatment condition at different ratios of the dendrogram tree height, with a cut-off level of 0.5 corresponding to 4 clusters for both ISO and KET (indicated by the dashed line). (D) The bar graph depicts Z scores for clusters in ISO and KET conditions, represented with mean values and standard errors. One-way ANOVA with Tukey's post hoc multiple comparisons. ns: no significance; ***P < 0.001. (E) Z-scored log relative density of c-Fos expression in the clustered brain regions. The order and abbreviations of the brain regions and the numerical labels correspond to those in Figure 2A. The red box denotes the cluster with the highest mean Z score in comparison to other clusters. CTX: cortex; TH: thalamus; HY: hypothalamus; MB: midbrain; HB: hindbrain.

      Author response image 2.

      Figure 3. Similarities and differences in ISO and KET activated c-Fos brain areas. (A) Hierarchical clustering was performed on the log-transformed relative c-Fos density data for ISO and KET using the complete linkage method based on the Euclidean distance matrix, with clusters identified by a dendrogram cut-off ratio of 0.5. (B) Silhouette values are plotted against the ratio of tree height from the hierarchical clustered dendrogram in Figure 3A. (C) The relationship between the number of clusters and the tree height ratio of the dendrogram for ISO and KET, with a cut-off ratio of 0.5 resulting in 3 clusters for ISO and 5 for KET (indicated by the dashed line). (D) The bar graph depicts Z scores for clusters in ISO and KET conditions, represented with mean values and standard errors. One-way ANOVA with Tukey's post hoc multiple comparisons. ns: no significance; ***P < 0.001. (E) Z-scored log relative density of c-Fos expression within the identified brain region clusters. The arrangement, abbreviations of the brain regions, and the numerical labels are in accordance with Figure 3A. The red boxes highlight brain regions that rank within the top 10 percent of Z score values. The white boxes denote brain regions with a Z score less than -2.

      1. There are still critical misinterpretations of the PCA analysis. For instance, it is mentioned that " KET is associated with the activation of cortical regions (as evidenced by positive PC1 coefficients in MOB, AON, MO, ACA, and ORB) and the inhibition of subcortical areas (indicated by negative coefficients) " as well as " KET displays cortical activation and subcortical inhibition, whereas ISO shows a contrasting preference, activating the cerebral nucleus (CNU) and the hypothalamus while inhibiting cortical areas. To reduce inter-individual variability." These interpretations are in complete contradiction with the answer 2b above that there was no region that had decreased Fos by either anesthetic.

      Thank you for bringing this to our attention. In response to your concerns, we have made significant revisions to our data analysis. We have updated our input data to incorporate log-transformed relative c-Fos density values, normalized against the control group for each brain region, as illustrated in Figures 2 and 3. Instead of PCA, we have applied this updated data to hierarchical clustering analysis. The results of these analyses are consistent with our original observation that neither anesthetic led to a decrease in Fos expression in any region.

      1. I still do not understand the rationale for the use of that metric. The use of a % of total Fos makes the data for each region dependent on the data of the other regions which wrongly leads to the conclusion that some regions are inhibited while they are not when looking at the raw data. Moreover, the interdependence of the variable (relative density) may affect the covariance structure which the PCA relies upon. Why not using the PCA on the logarithm of the raw data or on a relative density compared to the control group on a region-per-region basis instead of the whole brain?

      Thank you for your insightful suggestion. Following your advice, we have revised our approach and now utilize the logarithm of the relative density compared to the control group on a region-by-region basis. We attempted PCA analyses using the logarithm of the raw data, the logarithm of the Z-score, and the logarithm of the relative density compared to control, but none yielded distinct clusters.
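
      The difference between the two normalizations discussed here can be illustrated with a small worked example. This is only a sketch with invented densities for three hypothetical regions (A, B, C); the base-10 logarithm and the function names are assumptions made for illustration, not the analysis code used in the study. Only region C changes in the raw data, yet the percent-of-total metric makes A and B appear reduced, whereas the region-by-region log relative density leaves them at zero.

```python
import math

# Invented raw c-Fos densities for three hypothetical regions.
control = {"A": 100.0, "B": 100.0, "C": 100.0}
treated = {"A": 100.0, "B": 100.0, "C": 400.0}  # only region C changes


def percent_of_total(densities):
    """Express each region as a percentage of the summed density (couples regions)."""
    total = sum(densities.values())
    return {region: 100.0 * value / total for region, value in densities.items()}


def log_relative_density(treated_values, control_values):
    """log10(treated / control), computed independently for each region.

    Base 10 is an assumption; the response does not state the base used.
    """
    return {
        region: math.log10(treated_values[region] / control_values[region])
        for region in treated_values
    }


pct_control = percent_of_total(control)
pct_treated = percent_of_total(treated)
log_rel = log_relative_density(treated, control)

for region in control:
    print(region, round(pct_control[region], 1), round(pct_treated[region], 1),
          round(log_rel[region], 3))
```

      In this toy case, regions A and B fall from about 33% to about 17% of the total although their raw densities are unchanged, while their log relative density is exactly 0.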

      Author response image 3.

      As a result, we employed hierarchical cluster analysis. We then examined the Z-scores of the log-transformed relative c-Fos densities (Figures 2E and 3E) to assess expression levels across clusters. Our analysis revealed that neither ISO nor KET treatments led to a significant suppression of c-Fos expression in the 53 brain regions examined. In the ISO group alone, there were 10 regions that demonstrated relative suppression (Z-score < -2, indicated by white boxes) as shown in Figure 3.
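
      For readers unfamiliar with the procedure, complete-linkage clustering on Euclidean distances with a cut at a fraction of the tree height can be sketched as follows. This is a minimal illustration on invented data, not the code used for the figures: the helper names are hypothetical, each region is represented by a made-up two-value vector, and the "cut-off ratio" is assumed to mean a fraction of the height of the final merge. It also assumes at least two regions.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complete_linkage(points):
    """Agglomerative clustering; returns merges as (height, members_a, members_b)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                height = max(euclidean(points[a], points[b])
                             for a in clusters[i] for b in clusters[j])
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((height, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def cut_at_ratio(points, ratio=0.5):
    """Keep only merges at or below ratio * (height of the final merge)."""
    merges = complete_linkage(points)
    threshold = ratio * merges[-1][0]
    label = list(range(len(points)))
    for height, left, right in merges:
        # Complete-linkage heights never decrease, so every earlier merge inside
        # `left` and `right` was also at or below the threshold.
        if height <= threshold:
            for member in right:
                label[member] = label[left[0]]
    groups = {}
    for index, lab in enumerate(label):
        groups.setdefault(lab, []).append(index)
    return sorted(groups.values())


# Invented data: six hypothetical regions, two values each.
points = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.2, 0.0), (2.0, 4.0), (2.2, 4.0)]
print(cut_at_ratio(points, ratio=0.5))
```

      With these invented points, a ratio of 0.5 separates the three tight pairs, and a ratio of 1.0 returns a single cluster.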

      Fig. 2B: it's unclear to me why the regions are connected by a line. Such representation is normally used for time series/within-subject series. What is the rationale for the order of the regions and the use of the line? The line connecting randomly organized regions is meaningless and confusing.

      Thank you for your suggestion. We have discontinued the use of PCA calculations and have removed this figure.

      Fig 6A. The correlation matrices are difficult to interpret because of the low resolution and arbitrary order of brain regions. I recommend using hierarchical clustering and/or a combination of hierarchical clustering and anatomical organization (e.g. PMID: 31937658). While it is difficult to add the name of the regions on the graph I recommend providing supplementary figures with large high-resolution figures with the name of each brain region so the reader can actually identify the correlation between specific brain regions and the whole brain.

      Rationale for Metric Choice: Note that I do not dispute the choice of the log, which is appropriate; it is the choice of using the relative density that I am questioning.

      Thank you for your constructive feedback. In line with your suggestion, we have implemented hierarchical clustering combined with anatomical organization as per the referenced literature. Additionally, we have updated the vector diagrams in Figure 6A to present them with greater clarity.

      Furthermore, we have revised our network modular division method based on cited literature recommendations. We used hierarchical clustering with correlation coefficients to segment the network into modules, illustrated in Figure 6—figure supplement 1. Due to the singular module structure of the KET network and the sparsity of intermodular connections in the home cage and saline networks, the assessment of network hub nodes did not employ within-module degree Z-score and participation coefficients, as these measures predominantly underscore the importance of connections within and between modules. Instead, we used degree, betweenness centrality, and eigenvector centrality to detect the hub nodes, as detailed in Figure 6—figure supplement 2. With this new approach, the hub node for the KET condition changed from SS to TeA. Corresponding updates have been made to the results section for Figure 6, as well as to the related discussions and the abstract of our paper.
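
      The way a correlation network yields a degree ranking can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analysis pipeline: the region names and values are invented, only the r > 0.82 threshold from the Figure 6 legend is applied (the P < 0.05 test is omitted), and only degree is computed; betweenness and eigenvector centrality would require additional code or a graph library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def degree_by_region(log_density, r_threshold=0.82):
    """Count, for each region, the edges whose correlation exceeds the threshold."""
    regions = sorted(log_density)
    degree = {region: 0 for region in regions}
    for i, first in enumerate(regions):
        for second in regions[i + 1:]:
            if pearson_r(log_density[first], log_density[second]) > r_threshold:
                degree[first] += 1
                degree[second] += 1
    return degree


# Invented log c-Fos densities: one list per hypothetical region, one entry per mouse.
log_density = {
    "R1": [1.0, 2.0, 3.0, 4.0],
    "R2": [1.1, 2.1, 2.9, 4.2],
    "R3": [2.0, 4.1, 6.0, 7.9],
    "R4": [3.0, 1.0, 4.0, 2.0],
}

degree = degree_by_region(log_density)
print(degree)
```

      In this toy network R1, R2, and R3 are mutually connected (degree 2 each), while R4 is isolated.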

      Author response image 4.

      Figure 6. Generation of anesthetics-induced networks and identification of hub regions. (A) Heatmaps display the correlations of log c-Fos densities within brain regions (CTX, CNU, TH, HY, MB, and HB) for various states (home cage, ISO, saline, KET). Correlations are color-coded according to Pearson's coefficients. The brain regions within each anatomical category are organized by hierarchical clustering of their correlation coefficients. (B) Network diagrams illustrate significant positive correlations (P < 0.05) between regions, with Pearson’s r exceeding 0.82. Edge thickness indicates correlation magnitude, and node size reflects the number of connections (degree). Node color denotes betweenness centrality, with a spectrum ranging from dark blue (lowest) to dark red (highest). The networks are organized into modules consistent with the clustering depicted in Figure 6—figure supplement 1.

      Author response image 5.

      Figure 6—figure supplement 1. Hierarchical clustering of brain regions under various conditions: home cage, ISO, saline, and KET. (A) Heatmaps show the relative distances among brain regions assessed in naive mice. Modules were identified by sectioning each dendrogram at a 0.7 threshold. (B) Silhouette scores plotted against the dendrogram tree height ratio for each condition, with optimal cluster definition indicated by a dashed line at a 0.7 ratio. (C) The number of clusters formed at different cutoff levels. At a ratio of 0.7, ISO and saline treatments result in three clusters, whereas home cage and KET conditions yield two clusters. (D) The mean Pearson's correlation coefficient (r) was computed from interregional correlations displayed in Figure 6A. Data were analyzed using one-way ANOVA with Tukey’s post hoc test, ***P < 0.001.

      Author response image 6.

      Figure 6—figure supplement 2. Hub region characterization across different conditions: home cage (A), ISO (B), saline (C), and KET (D) treatments. Brain regions are sorted by degree, betweenness centrality, and eigenvector centrality, with each metric presented in separate bar graphs. Bars to the left of the dashed line indicate the top 20% of regions by rank, highlighting the most central nodes within the network. Red bars signify regions that consistently appear within the top rankings for both degree and betweenness centrality across the metrics.

      1. I am still having difficulties understanding Fig. 3.

      Panel A: The lack of identification for the dots in panel A makes it impossible to understand which regions are relevant.

      Panel B: what is the metric that the up/down arrow summarizes? Fos density? Relative density? PC1/2?

      Panel C: it's unclear to me why the regions are connected by a line. Such representation is normally used for time series/within-subject series. What is the rationale for the order of the regions?

      Thank you for your patience and for reiterating your concerns regarding Figure 3.

      a. In Panel A, we have substituted the original content with a display of hierarchical clustering results, which now clearly marks each brain region. This change aids readers in identifying regions with similar expression patterns and facilitates a more intuitive understanding of the data.

      b. Acknowledging that our analysis did not reveal any significantly inhibited brain regions, we have decided to remove the previous version of Panel B from the figure.

      c. We have discontinued the use of PCA calculations and have removed this figure to avoid any confusion it may have caused. Our revised analysis focuses on hierarchical clustering, the results of which are presented in the updated figures.

      Reviewer #2:

      1. Aside from issues with their data transformation (see below), (a) I think they have some interesting Fos counts data in Figures 4B and 5B that indicate shared and distinct activation patterns after KET vs. ISO based anesthesia. These data are far closer to the raw data than PC analyses and need to be described and analyzed in the first figures long before figures with the more abstracted PC analyses. In other words, you need to show the concrete raw data before describing the highly transformed and abstracted PC analyses. (b) This gets to the main point that when selecting brain areas for follow up analyses, these should be chosen based on the concrete Fos counts data, not the highly transformed and abstracted PC analyses.

      Thank you for your suggestions.

      a. We have added the original c-Fos cell density distribution maps for Figures 2, 3, 4, and 5 in Supplementary Figures 2 and 3 (also shown below). To maintain consistency across the document, we have updated both the y-axis label and the corresponding data in Figures 4B and 5B from 'c-Fos cell count' to 'c-Fos density'.

      b. The analyses in Figures 2 and 3 include all brain regions. Figures 4 and 5 present the brain regions with significant differences as shown in Figure 3—figure supplement 1.

      Author response image 7.

      Figure 2—figure supplement 1. The c-Fos density in 53 brain areas for different conditions. (home cage, n = 6; ISO, n = 6 mice; saline, n = 8; KET, n = 6). Each point represents the c-Fos density in a specific brain region, denoted on the y-axis with both abbreviations and full names. Data are shown as mean ± SEM. Brain regions are categorized into 12 brain structures, as indicated on the right side of the graph.

      Author response image 8.

      Figure 3—figure supplement 1. c-Fos density visualization across 201 distinct brain regions under various conditions. The graph depicts the c-Fos density levels for each condition, with data presented as mean and standard error. Brain regions with statistically significant differences are featured in Figures 4 and 5. Brain regions are organized into major anatomical subdivisions, as indicated on the left side of the graph.

      1. Now, the choice of data transformation for Fos counts is the most significant problem. First, the authors show in the response letter that not using this transformation (region density/brain density) leads to no clustering. However, they also showed the region-densities without transformation (which we appreciate) and it looks like overall Fos levels in the control group Home (ISO) are a magnitude (~10-fold) higher than those in the control group Saline (KET) across all regions shown. This large difference seems unlikely to be due to a biologically driven effect and seems more likely to be due to a technical issue, such as differences in staining or imaging between experiments. Was the Homecage-ISO experiment or at least the Fos labeling and imaging performed at the same time as for the Saline-Ketamine experiment? Please state the answer to this question in the Results section one way or the other.

      a. “Home (ISO) are a magnitude (~10-fold) higher than those in the control group saline (KET) across all regions shown.” We believe you might be indicating that compared to the home cage group (gray), the saline group (blue) shows a 10-fold higher expression (Supplementary Figure 2/3). Indeed, we observed that the total number of c-Fos cells in the home cage group is significantly lower than in the saline group. This difference may be due to reduced sleep during the light-on period (ZT 6 to ZT 7.5) in the saline mice or the pain and stress response caused by intraperitoneal injection of saline. We have explained this discrepancy in the discussion section (lines 308-317; also see below).

      “…Our findings indicate that c-Fos expression in the KET group is significantly elevated compared to the ISO group, and the saline group exhibits notably higher c-Fos expression than the home cage group, as seen in Supplementary Figures 2 and 3. Intraperitoneal saline injections in the saline group, despite pre-experiment acclimation with handling and injections for four days, may still evoke pain and stress responses in mice. Subtle yet measurable variations in brain states between the home cage and saline groups were observed, characterized by changes in normalized EEG delta/theta power (home cage: 0.05±0.09; saline: -0.03±0.11) and EMG power (home cage: -0.37±0.34; saline: 0.04±0.13), as shown in Figure 1—figure supplement 1. These changes suggest a relative increase in overall brain activity in the saline group compared to the home cage group, potentially contributing to the higher c-Fos expression…”

      b. Drug administration and tissue collection for both Homecage-ISO and Saline-Ketamine groups were consistently scheduled at 13:00 and 14:30, respectively. Four mice were administered drugs and had tissues collected each day, with two from the experimental group and two from the control group, to ensure consistent sampling. The 4% PFA fixation time, sucrose dehydration time, primary and secondary antibody concentrations and incubation times, staining, and imaging parameters and equipment (exposure time for VS120 imaging was fixed at 100ms) were all conducted according to a unified protocol.

      We have included the following statement in the results section (lines 81-83): “Sample collection for all mice was uniformly conducted at 14:30 (ZT7.5), and the c-Fos labeling and imaging were performed using consistent parameters throughout all experiments.”

      1. Second, they need to deal with this large difference in overall staining or imaging for these two (Home/ISO and Saline/KET) experiments more directly; their current normalization choice does not really account for the large overall differences in mean values and variability in Fos counts (e.g. due to labeling and imaging differences).

      3a. I think one option (not perfect but I think better than the current normalization choice) could be z-scoring each treatment to its respective control. They can analyze these z-scored data first, and then in later figures show PC analyses of these data and assess whether the two treatments separate on PC1/2. And if they don't separate, then they don't separate, and you have to go with these results.

      3b. Alternatively, they need to figure out the overall intensity distributions from the different runs (if that is the main reason for the markedly different counts) and adjust their thresholds for Fos-positive cell detection based on this. I would expect that the saline and HC groups should have similar levels of activation, so they could use these as the 'control' group to determine a Fos-positive intensity threshold that gets applied to the corresponding 'treatment' group.

      3c. If neither 3a nor 3b is an option then they need to show the outcomes of their analysis when using the untransformed data in the main figures (the untransformed data plots in their responses to reviewer are currently not in the main or supplementary figs) and discuss these as well.

      a. Thank you very much for your valuable suggestion. We conducted PCA analysis on the ISO and KET data after Z-scoring them with their respective control groups and did not find any significant separation.

      Author response image 9.

      As mentioned in our response to reviewer #1, we have reprocessed the raw data. Firstly, we divided the ISO and KET data by their respective control brain regions and then performed a logarithmic transformation to obtain the log relative c-Fos density. The purpose of this is to eliminate the impact of baseline differences and reduce variability. We then performed hierarchical clustering, and finally, we Z-scored the log relative c-Fos density data. The aim is to facilitate comparison of ISO and KET on the same data dimension (Figure 2 and 3).

      b. We appreciate your concerns regarding the detection thresholds for Fos-positive cells. The enclosed images, extracted from supplementary figures for Figures 4 and 5, demonstrate notable differences in c-Fos expression between saline and home cage groups in specific brain regions. These regions exhibit a discernible difference in staining intensity, with the saline group showing enhanced c-Fos expression in the PVH and PVT regions compared to the home cage group. An examination of supplementary figures for Figures 4 and 5 shows that c-Fos expression in the home cage group is consistently lower than in the saline group. This comparative analysis confirms that the discrepancies in c-Fos levels are not due to varying detection thresholds.

      Author response image 10.

      c. We have added the corresponding original data graphs to Supplementary Figures 2 and 3, and discussed the potential reasons for the significant differences between these groups in the discussion section (also shown below).

      Lines 308-317: "...Our findings indicate that c-Fos expression in the KET group is significantly elevated compared to the ISO group, and the saline group exhibits notably higher c-Fos expression than the home cage group, as seen in Supplementary Figures 2 and 3. Intraperitoneal saline injections in the saline group, despite pre-experiment acclimation with handling and injections for four days, may still evoke pain and stress responses in mice. Subtle yet measurable variations in brain states between the home cage and saline groups were observed, characterized by changes in normalized EEG delta/theta power (home cage: 0.05±0.09; saline: -0.03±0.11) and EMG power (home cage: -0.37±0.34; saline: 0.04±0.13), as shown in Figure 3—figure supplement 1. These changes suggest a relative increase in overall brain activity in the saline group compared to the home cage group, potentially contributing to the higher c-Fos expression.…”

    1. Author Response

      We thank the reviewers for their detailed and constructive criticisms of our work. They raise many important questions (such as the issue of defining context) that we have also been thinking about extensively, and they provide new and insightful avenues that have the potential to meaningfully improve the manuscript. We also appreciate that they commented on the novelty and importance of this work. Going forward, we will address the methodological concerns raised as best we can and thereby hope to make the evidence for our conclusion more compelling.

    1. Author Response

      eLife assessment

      This study provides direct evidence showing that Kv1.8 channels underlie several potassium currents in the two types of sensory hair cells found in the mouse vestibular system. This is an important finding because the nature of the channels underpinning the unusual potassium conductance gK,L in type I hair cells has been under scrutiny for many years. Although most of the experimental evidence is compelling and the analysis is rigorous, the evidence supporting some of the claims related to Kv1.4 channels is incomplete. The study will be of interest to cell and molecular biologists and auditory neuroscientists.

      We are thankful to the editor and reviewers for their thorough assessment of our work and insightful feedback. Our responses to the comments and suggestions are below.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a thorough demonstration of the role that one particular type of voltage-gated potassium channel, Kv1.8, plays in a low voltage-activated conductance found in type I vestibular hair cells. Along the way, they find that this same channel protein appears to function in type II vestibular hair cells as well, contributing to other macroscopic conductances. Overall, Kv1.8 may provide especially low input resistance and short time constants to facilitate encoding of more rapid head movements in animals that have necks. Combination with other channel proteins, in different ratios, may contribute to the diversified excitability of vestibular hair cells.

      Strengths:

      The experiments are comprehensive and clearly described, both in the text and in the figures. Statistical analyses are provided throughout.

      Weaknesses:

      None.

      Reviewer #2 (Public Review):

      The focus of this manuscript was to investigate whether Kv1.8 channels, which have previously been suggested to be expressed in type I hair cells of the mammalian vestibular system, are responsible for the potassium conductance gK,L. This is an important study because gK,L is known to be crucial for the function of type I hair cells, but the channel identity has been a matter of debate for the past 20 years. The authors have addressed this research topic by primarily investigating the electrophysiological properties of the vestibular hair cells from Kv1.8 knockout mice. Interestingly, gK,L was completely abolished in Kv1.8-deficient mice, in agreement with the hypothesis put forward by the authors based on the literature. The surprising observation was that in the absence of Kv1.8 potassium channels, the outward potassium current in type II hair cells was also largely reduced. Type II hair cells express the largely inactivating potassium conductance gK,A, but not gK,L. The authors concluded that heteromultimerization of non-inactivating Kv1.8 and the inactivating Kv1.4 subunits could be responsible for the inactivating gK,A. Overall, the manuscript is very well written and most of the conclusions are supported by the experimental work. The figures are well described, and the statistical analysis is robust.

      My only comment relates to the statement regarding the results providing "evidence" that Kv1.4 form heteromultimers with Kv1.8 channels (see Discussion). The only data I can see from the results is that Kv1.4 channels are expressed in the membrane of type II hair cells, which is not sufficient evidence for the above claim. Is the distribution of Kv1.8 and Kv1.4 overlapping in type II hair cells? Have the authors attempted to perform some pharmacological studies on Kv1.4? For example, would gK,A be completely blocked by a Kv1.4 antagonist? Addressing at least some of these questions would strengthen your argument.

      Author response: With respect to the “evidence” for heteromultimerization of Kv1.4 and Kv1.8: We agree that there is not conclusive evidence but have pulled together reasons to suggest that the fast inactivation of Kv1.8-dependent gA in type II hair cells reflects a contribution from Kv1.4 subunits. The reasons we note are mostly from other sources: 1) Kv1.4 subunits are the only Kv1 alpha subunits known to make channels with intrinsic rapid inactivation (Bertoli et al., 1994); 2) Kv1.4 is highly expressed in type II hair cells, but not type I hair cells, in mouse utricle (McInturff et al., Biol. Open., 2018; Jan et al., Cell Reports, 2021; Orvis et al., Nat. Methods, 2021); 3) previous work from M. Correia and colleagues suggested Kv1.4 as the likely source of A-current in pigeon vestibular hair cells; 4) some rat type II hair cells show comparatively strong Kv1.4-like immunoreactivity (our Fig. 5). While we consider heteromultimerization of Kv1.4 and Kv1.8 alpha subunits a plausible explanation consistent with available data from different sources, we agree that the question is not at all settled, and indeed raise the possibility that KV beta subunits, which are also differentially expressed by type I and II hair cells, play a role. Experiments to definitively advance or refute this hypothesis are beyond the scope of this paper.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Martin et al. describes the contribution of a Kv channel subunit (Kv1.8, KCNA10) to voltage-dependent K+ conductances and membrane properties of type I and type II hair cells of the mouse utricle. Previous work has documented striking differences in K+ conductances between vestibular hair cell types. In particular, amniote type I hair cells are known to express a non-typical low-voltage-activated K+ conductance (GK,L) whose molecular identity has been elusive. K+ conductances in hair cells from 3 different mouse genotypes (wildtype, Kv1.8 homozygous knockouts, and heterozygotes) are examined here and whole-cell patch-clamp recordings indicate a prominent role for Kv1.8 subunits in generating GK,L. Results also interestingly support a role for Kv1.8 subunits in type II hair cell K+ conductances; inactivating conductances in null mice are reduced in type II hair cells from striola and extrastriola regions of the utricle. Kv1.8 is therefore proposed to contribute as a pore-forming subunit for 3 different K+ conductances in vestibular hair cells. The impact of these conductances on membrane responses to current steps is studied in the current clamp. Pharmacological experiments use XE991 to block some residual Kv7-mediated current in both hair cell types, but no other pharmacological blockers are used. In addition, immunostaining data are presented and raise some questions about Kv7 and Kv1.8 channel localization. Overall, the data present compelling evidence that the removal of Kv1.8 produces profound changes in hair cell membrane conductances and sensory capabilities. These changes at hair cell level suggest vestibular function would be compromised and further assessment in terms of balance behavior in the different mice would be interesting.

      Strengths:

      This study provides strong evidence that Kv1.8 subunits are major contributors to the unusual K+ conductance in type I hair cells of the utricle. It also indicates that Kv1.8 subunits are important for type II hair cell K+ conductances because Kv1.8-/- mice lacked an inactivating A conductance and had reduced delayed rectifier conductance compared to controls. A comprehensive and careful analysis of biophysical profiles is presented of expressed K+ conductances in 3 different mouse genotypes. Voltage-dependent K+ currents are rigorously characterized at a range of different ages and their impact on membrane voltage responses to current input is studied. Some pharmacological experiments are performed in addition to immunostaining to bolster the conclusions from the biophysical studies. The paper has a significant impact in showing the role of Kv1.8 in determining utricular hair cell electrophysiological phenotypes.

      Weaknesses:

      1. From previous work it is known that GK,L in type I hair cells has unusual ion permeation and pharmacological properties that differ greatly from type II hair cell conductances. Notably GK,L is highly permeable to Cs+ as well as K+ ions and is slightly permeable to Na+. It is blocked by 4-aminopyridine and divalent cations (Ba2+, Ca2+, Ni2+), enhanced by external K+, and modulated by cyclic GMP. The question arises, if Kv1.8 is a major player and pore-forming subunit in type I and type II cells (and cochlear inner hair cells as shown by Dierich et al. 2020) how are subunits modified to produce channels with very different properties? A role for Kv1.4 channels (gA) is proposed in type II hair cells based on previous findings in bird hair cells and immunostaining for Kv1.4 channels in rat utricle presented here in Fig. 6. However, hair cell-specific partner interactions with Kv1.8 that result in GK,L in type I hair cells and Cs+ impermeable, inactivating currents in type II hair cells remain for the most part unexplored.

      Author response: Our results raise the question of how Kv1.8/Kcna10 is regulated to produce gK,L in type I hair cells, which has different properties from the Kv1.8 conductance expressed heterologously (Lang et al., Am. J. Physiol. Renal Physiol., 2000; Ranjan et al., Front. Cell. Neurosci., 2019; Dierich et al., Cell Reports, 2020) and the Kv1.8 conductance inferred in inner hair cells (Dierich et al., 2020). We lay out several possibilities in the Discussion, but testing these suggestions is beyond the scope of the present paper.

      The relatively high Cs+ permeability of gK,L (0.31 pCs/pK, Rüsch & Eatock, J. Neurophysiol., 1996; Rennie & Correia, J. Membr. Biol., 2000) suggests there is something different about the selectivity filter and pore region of gK,L relative to most Kv1 family members, although the intrinsic Cs+ permeability of heterologously expressed Kv1.8 has not been reported. While we note that the pore region in Kv1.8 differs from other Kv1 subunits by a single amino acid (a glycine instead of alanine at position 411 – placed by AlphaFold in the pore helix of hKCNA10, Jumper et al., Nature, 2021), the effect of this difference is not known. A separate study is needed to determine why gK,L has a high Cs+ permeability relative to other Kv channels.

      For type II hair cells, the Cs+ permeability of Kv currents has not been fully characterized. Internal Cs+ does appear to reduce outward current more effectively in type II hair cells (Lang & Correia, J. Neurophysiol., 1989; Sokolowski et al., Dev. Biol., 1993) than in type I hair cells (Rüsch & Eatock, J. Neurophysiol., 1996; Rennie & Correia, J. Membr. Biol., 2000).

      With respect to cochlear inner hair cells, note that the assignment of Kv1.8 by Dierich et al. (2020) to a delayed rectifier in cochlear inner hair cells (IHCs) was based on inference – that is, existing inner ear expression databases show that Kv1.8 is expressed in IHCs, and heterologous Kv1.8 channels have a current resembling that observed in IHCs after block of multiple other K channels. We agree with Dierich et al. that Kv1.8 is an attractive candidate for the residual conductance in cochlear IHCs based on comparison with its properties in heterologous expression data. Together, their study and ours suggest that Kv1.8 takes on quite different voltage dependence depending on the hair cell environment, and it will be an interesting challenge to sort out the reasons.

      1. Data from patch-clamp and immunocytochemistry experiments are not in close alignment. XE991 (Kv7 channel blocker) decreases remaining K+ conductance in type I and type II hair cells from null mice supporting the presence of Kv7 channels in hair cells (Fig. 7). Also, Holt et al. (2007) previously showed inhibition of GK,L in type I hair cells (but not delayed rectifier conductance in type II hair cells) using a dominant negative construct of Kv7.4 channels. However, immunolabelling indicates Kv7.4 channels on the inner face of calyx terminals adjacent to hair cells (Fig. 5). Some reconciliation of these findings is needed.

      Author response: Our pharmacology with XE991 suggests a small but significant population of Kv7 channels in type I and II hair cells (Fig 7). With the immunogold technique, Kharkovets et al. (PNAS, 2000) and Hurley et al. (J. Neurosci., 2006) counted significant Kv7.4 particles in type I hair cells, although the particles occurred at much greater density in the postsynaptic calyx membrane facing the hair cell. These results lead us to propose that the Kv7 channel we identified pharmacologically includes the Kv7.4 subunit, possibly in combination with other Kv7 subunits (Lysakowski et al., J. Neurosci., 2011). By this argument, the absence of clear hair cell staining in the confocal images of Fig. 5A is likely to reflect differences in methods, which include the use of different mouse strains, different sensitivities of immunogold vs. confocal imaging, and different antibodies.

      Holt et al. (J. Neurosci., 2007) indeed saw inhibition of gK,L in hair cells grown in organotypic cultures of the neonatal mouse utricle after viral expression of a dominant negative Kv7.4 construct. However, other studies show that Kv7 antagonists do not block gK,L (Hurley et al., J. Neurosci., 2006), and the Jentsch group, which first proposed Kv7.4 as a likely candidate for gK,L (Kharkovets et al., PNAS, 2000), ultimately showed that knocking out Kv7.4 and Kv7.5 expression failed to eliminate gK,L (Spitzmaul et al., J. Biol. Chem., 2013). Together, these results suggest that in Holt et al. (2007), the inhibition of gK,L by transfection with the dominant negative KCNQ4 construct may have occurred through unintended interactions with native gK,L channels. The young age of the neonatal cultured and transfected utricles raises the possibility of a developmental effect – that functional Kv7 channels are needed for the developmental transition to a Kv1.8 conductance. Consistent with this idea is the observation that Kv7 current is present in neonatal hair cells, where it is a relatively large proportion of Kv current in type I HCs before they acquire gK,L (Hurley et al., J. Neurosci., 2006). Alternatively, the overexpression of nonfunctional Kv7.4 channels in virally-transfected hair cells may have inhibited or delayed gK,L acquisition through a more general effect on membrane proteins.

      1. Strong immunosignal appears in the cuticle plates of hair cells in addition to signal in basal regions of hair cells and supporting cells. Please provide a possible explanation for this.

      Author response: There is significant non-specific staining of apical cell surfaces and supporting cell membranes in addition to specific staining of hair cell basolateral membranes. We infer non-specific staining when immunolabeling is present in the knockout tissue, as it is for the apical surfaces and supporting cell membranes—compare Fig. 5B.3 (control tissue) with Fig. 5B.4 (Kv1.8 null mutant). Non-specific immunostaining can occur with polyclonal antibodies (specific to several epitopes) if the antibodies are not affinity-purified, but we used an affinity-purified antibody. The apical surfaces are reputed to be “sticky” (susceptible to non-specific staining) but the non-specific labeling in the basal parts of supporting cells is more puzzling. One possibility is that the Kv1.8 antibody weakly recognized closely related Kv1.1 channels, which are more strongly expressed in supporting cells than hair cells (Scheffer et al., J. Neurosci., 2015).

      1. A previous paper reported that a vestibular evoked potential was abnormal in Kv1.8-/- mice (Lee et al. 2013) as briefly mentioned (lines 94-95). It would be very interesting to know if any vestibular-associated behaviors and/or hearing loss were observed in the mice populations. If responses are compromised at the sensory hair cell level across different zones, degradation of balance function would be anticipated and should be elucidated.

      Author response: We agree; some of these questions are the subject of another paper in preparation.

    1. Author Response

      Reviewer 1:

      Comment 1.1: The distinction of PIGS from nearby OPA, which has also been implied in navigation and ego-motion, is not as clear as it could be.

      Response 1.1: The main functional distinction between TOS/OPA and PIGS is that TOS/OPA responds preferentially to moving vs. stationary stimuli (even concentric rings), likely due to its overlap with the retinotopic motion-selective visual area V3A, for which this is a defining functional property (e.g. Tootell et al., 1997, J Neurosci). In comparison, PIGS does not show such motion selectivity. Instead, PIGS responds preferentially to more complex forms of motion within scenes. In this revision, we tried to better highlight this point in the Discussion (see also the response to the first comment from Reviewer #2).

      Reviewer 2:

      Comment 2.1: First, the scene-selective region identified appears to overlap with regions that have previously been identified in terms of their retinotopic properties. In particular, it is unclear whether this region overlaps with V7/IPS0 and/or IPS1. This is particularly important since prior work has shown that OPA often overlaps with v7/IPS0 (Silson et al, 2016, Journal of Vision). The findings would be much stronger if the authors could show how the location of PIGS relates to retinotopic areas (other than V6, which they do currently consider). I wonder if the authors have retinotopic mapping data for any of the participants included in this study. If not, the authors could always show atlas-based definitions of these areas (e.g. Wang et al, 2015, Cerebral Cortex).

      Response 2.1: We thank the reviewers for reminding us to more clearly delineate this issue of possible overlap, including the information provided by Silson et al, 2016. The issue of possible overlap between area TOS/OPA and the retinotopic visual areas, both in humans and non-human primates, was also clarified by our team in 2011 (Nasr et al., 2011). As you can see in the enclosed figure, and consistent with those previous studies, TOS/OPA overlaps with visual areas V3A/B and V7, whereas PIGS is located more dorsally, close to IPS2-4. As shown here, there is no overlap between PIGS and TOS/OPA, and there is no overlap between PIGS and areas V3A/B and V7. To more directly address the reviewer’s concern, in the next revision, we will show the relative position of PIGS and the retinotopic areas in (at least) one individual subject.

      Author response image 1.

      The relative location of PIGS, TOS/OPA and the retinotopic visual areas. The left panel shows the result of high-resolution (7T; voxel size = 1 mm; no spatial smoothing) polar angle mapping in one individual. The right panel shows the location of scene-selective areas PIGS and TOS/OPA in the same subject (7T; voxel size = 1 mm; no spatial smoothing). While area TOS/OPA shows some overlap with the retinotopic visual areas V3A/B and V7, PIGS shows partial overlap with area IPS2-4. In both panels, the activity maps are overlaid on the subject’s own reconstructed brain surface.

      Comment 2.2: Second, recent studies have reported a region anterior to OPA that seems to be involved in scene memory (Steel et al, 2021, Nature Communications; Steel et al, 2023, The Journal of Neuroscience; Steel et al, 2023, bioRxiv). Is this region distinct from PIGS? Based on the figures in those papers, the scene memory-related region is inferior to V7/IPS0, so characterizing the location of PIGS to V7/IPS0 as suggested above would be very helpful here as well. If PIGS overlaps with either of V7/IPS0 or the scene memory-related area described by Steel and colleagues, then arguably it is not a newly defined region (although the characterization provided here still provides new information).

      Response 2.2: The lateral-place memory area (LPMA) is located on the lateral brain surface, anterior relative to the IPS (see Figure 1 from Steel et al., 2021 and Figure 3 from Steel et al., 2023). In contrast, PIGS is located on the posterior brain surface, also posterior relative to the IPS. In other words, they are located on two different sides of a major brain sulcus. In this revision we have clarified this point, including the citations by Steel and colleagues.

      Comment 2.3: Another reason that it would be helpful to relate PIGS to this scene memory area is that this scene memory area has been shown to have activity related to the amount of visuospatial context (Steel et al, 2023, The Journal of Neuroscience). The conditions used to show the sensitivity of PIGS to ego-motion also differ in the visuospatial context that can be accessed from the stimuli. Even if PIGS appears distinct from the scene memory area, the degree of visuospatial context is an alternative account of what might be represented in PIGS.

      Response 2.3: The reviewer raises an interesting point. One minor confusion is that we may be inadvertently referring to two slightly different types of “visuospatial context”. Specifically, the stimuli used in the ego-motion experiment here (i.e. coherently vs. incoherently changing scenes) represent the same scenes, and the only difference between the two conditions is the sequence of images across the experimental blocks. In that sense, the two experimental conditions may be considered to have the same visuospatial context. However, it could also be argued that the coherently changing scenes provide more information about the environmental layout. In that case, considering the previous reports that PPA/TPA and RSC/MPA may also be involved in layout encoding (Epstein and Kanwisher 1998; Wolbers et al. 2011), we expected to see more activity within those regions in response to coherently compared to incoherently changing scenes. These issues are now more explicitly discussed in the revised article.

      Reviewer 3:

      Comment 3.1: There are few weaknesses in this work. If pressed, I might say that the stimuli depicting ego-motion do not, strictly speaking, depict motion, but only apparent motion between 2s apart photographs. However, this choice was made to equate frame rates and motion contrast between the 'ego-motion' and a control condition, which is a useful and valid approach to the problem. Some choices for visualization of the results might be made differently; for example, outlines of the regions might be shown in more plots for easier comparison of activation locations, but this is a minor issue.

      Response 3.1: We thank the reviewer for these constructive suggestions, and we agree with their comment that the ego-motion stimuli are not smooth, even though they were refreshed every 100 ms. However, the stimuli were nevertheless coherent enough to activate areas V6 and MT, two major areas known to respond preferentially to coherent compared to incoherent motion.

      Epstein, R., and N. Kanwisher. 1998. 'A cortical representation of the local visual environment', Nature, 392: 598-601.

      Wolbers, T., R. L. Klatzky, J. M. Loomis, M. G. Wutte, and N. A. Giudice. 2011. 'Modality-independent coding of spatial layout in the human brain', Curr Biol, 21: 984-9.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents valuable findings about synaptic connectivity among subsets of unipolar brush cells (UBCs), a specialized interneuron primarily located in the vestibular lobules of the cerebellar cortex. The evidence supporting the claims is interesting although incomplete in some areas. The work will be of interest to cerebellar neuroscientists as well as those focussed on synaptic properties and mechanisms. Although several compelling pieces of data were presented, substantial work remains to be conducted in order for the hypothesis and predictions of the manuscript to confirm how these factors play out in the actual brain circuit and how it would impact the processing of feedback or feedforward activity that would be required to promote behavior.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:

      1. GRP-driven TDtomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2. mCitrine expression in Brainbow mouse line P079 labels a similar minority subset of OFF-UBCs, defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of Calretinin. However, such mCitrine expression was also detected in some mGluR1 positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      3. Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      4. A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), or cause a delayed pause (MF-OFF-ON) relative to solely MF to UBC synapses, which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer lasting sensorimotor responses.

      The cerebellum's long known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobes of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, despite some additional analysis in response to the initial review, the overall experimental design still has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:

      1. All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. I.e. even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which is determined entirely by their synaptic responses to glutamate (Borges-Merjane and Trussell, 2015). Thus, blocking inhibition was essential to produce clear results in the characterization of GRP and P079 UBCs. While GABAergic/glycinergic feedforward and feedback inhibition is certainly important in the intact circuit, it was not our intention, nor was it possible, to study its contribution in the present study. Leaving inhibition unblocked does not lead to a physiologically realistic stimulation pattern in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition by directly exciting Golgi cells, rather than their synaptic inputs. The main inhibition that UBCs receive that is crucial to determining burst or pause durations is not via GABA/glycine, but instead through mGluR2, which lasts for 100-1000s of milliseconds. The main excitation that drives UBC firing is mGluR1 and AMPA, which both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition. Recent studies that examined the duration of bursting or pausing in UBCs had inhibition blocked in their experiments, presumably for the reasons outlined above (Guo et al., 2021; Huson et al., 2023).

      Below is an example showing the synaptic currents and firing patterns in an ON UBC before and after blocking inhibition. The GABA/glycinergic inhibition is fast, occurs soon after the stimuli and has little to no effect on the slow inward current that develops after the end of stimulation, which is what drives firing for 100s of milliseconds.

      Author response image 1.

      Example showing small effect of GABAergic and glycinergic inhibition on excitatory currents and burst duration. A) Excitatory postsynaptic currents in response to train of 10 presynaptic stimuli at 50 Hz before (black) and after (grey) blocking GABA and glycine receptors. The slow inward current that occurs at the end of stimulation is little affected. B) Expanded view of the synaptic currents evoked during the train of stimuli. GABA/glycine receptors mediate the fast outward currents that occur immediately after the first couple stimuli. C) Three examples of the bursts caused by the 50 Hz stimulation in the same cell without blocking GABA and glycine receptors. D) Three examples in the same cell after blocking GABA and glycine receptors.

      The authors' response to the initial concern is (to paraphrase), "it's not possible to do and it's not important", neither of which is soundly justified.

      As stated in the original review, it is fully understandable and appropriate to use GABAAR/GlycineR antagonists to isolate glutamatergic currents, to characterize their conductance kinetics. That was not the issue raised. The issue raised was that then using only such information to generate a model of in situ behavior becomes problematic, given that feedback and lateral inhibition will sculpt action potential output, which of course will then fundamentally shape their synaptic drive of secondary UBCs, which will be further sculpted by their own inhibitory inputs. This issue undermines interpretation of the NEURON model.

      The argument that taking inhibition into account is not possible because of assumed or possible direct electrical excitation of Golgi cells is confusing for two interacting reasons. First, one can certainly stimulate the mossy fiber bundle to get afferent excitation of UBCs (and polysynaptic feedback/lateral inhibitory inputs) without directly stimulating the Golgi cells that innervate any recorded UBC. Yes, one might be stimulating some Golgi cells near the stimulating electrode, but one can position the stimulating electrode far enough down the white matter track (away from the recorded UBC), such that mossy fiber inputs to the recorded UBC can be stimulated without affecting Golgi cells near or synaptically connected to the recorded UBC. Moreover, if the argument were true, then presumably the stimulation protocol would be just as likely to directly stimulate neighboring UBCs, which then drove the recorded UBC's responses. Thus, it is both doable and should be ensured that stimulation of the white matter is distant enough to not be directly activating relevant, connected neurons within the granule cell layer.

      Finally, the authors present three examples of UBC recordings with and without inhibitory inputs blocked, and state "Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition" and "GABA/glycinergic inhibition...has little to no effect on the slow inward current that develops after the end of stimulation". This response reflects the original concerns about lack of quantification or consideration of important parameters. In particular, while the traces with and without inhibition are qualitatively similar, quantitative considerations indicate otherwise. First, unquantified examples are not adequate to drive conclusions. Regardless, the main issue (how inhibition affects actual responses in situ) is actually highlighted by the authors' current clamp recordings of UBC responses, before and after blocking inhibition. The output response is dramatically different, both at early and late time points, when inhibition is blocked. Again, a lack of quantification (of adequate n's) makes it hard to know exactly how important, but quick "eye ball" estimates of impact include: 1) a switch from only low frequency APs initially (without inhibition blocked) to an immediate burst of high frequency APs (high enough to not discern individual APs with given figure resolution) when inhibition is blocked, 2) slow rising to a peak EPSP, followed by symmetrical return to baseline (without inhibition blocked) versus immediate rise to peak, followed by prolonged decay to baseline (with inhibition blocked), 3) substantially shorter duration (~34% shorter) secondary high frequency burst (individual APs not discernible) of APs (with inhibition blocked versus without inhibition blocked), and 4) substantial reduction in number of long delayed APs (with inhibition blocked versus without inhibition blocked). Thus, clearly, feedback/lateral inhibition is actually sculpting AP output at all phases of the UBC response to trains of afferent stimulations. Importantly, the single voltage clamp trace showing little impact of transient IPSCs on the slow EPSC does not take into account likely IPSC influences on voltage-activated conductances that would not occur in voltage-clamp recordings but would be free to manifest in current clamp, and thereby influence AP output, as observed.

      So again, our ability to understand how interconnected UBCs behave in the intact system is undermined by the lack of consideration and quantification of the impact of inhibition, and it not being incorporated into the model. At the very least a strong proviso about lack of inclusion of such information, given the authors' data showing its importance in the few examples shown, should be added to the discussion.

      Thank you for this substantive explanation. Your points are well described and we agree that the single experiment shown is not strong evidence for a lack of importance of Golgi cell inhibition, especially on the temporal dynamics of spiking. Previous work has clearly shown that Golgi cells have several important roles in shaping the activity of the granular layer, including affecting the temporal dynamics of granule cell spikes. However, the work presented here focuses on the feedforward circuitry of UBCs and the large inward and large outward glutamatergic currents that drive spiking or pausing for 100s of milliseconds. Our model does not focus on the aspects that are most sensitive to Golgi cell inhibition, including timing of the first spikes in the UBC’s response. Nor does our model focus on short term plasticity, which we thought was reasonable because the slow currents in UBCs are quite insensitive to the temporal characteristics of glutamate release (see the example in the previous rebuttal). Our model does not include long term plasticity, which is also affected by Golgi cells. For these reasons we agree that the model presented does not explain how feedforward UBC circuits might “play out in the actual brain circuit and how it would impact the processing of feedback or feedforward activity that would be required to promote behavior.” We have included a new paragraph in the discussion clarifying the limitations of this study and the model, reproduced below.

      "Limitations of the model

      Here we addressed how feedforward glutamatergic excitation and inhibition is transformed from one UBC to the next depending on their subtype. The model focuses on AMPA receptor mediated excitation and mGluR2 mediated inhibition. One limitation of the model is that it does not consider feedforward and lateral inhibition from Golgi cells, which shape the spiking of UBCs in response to afferent stimulation. Golgi cells receive mossy fiber input and inhibit UBCs through their corelease of GABA and glycine (Dugue et al., 2005; Rousseau et al., 2012). Golgi cells control the temporal dynamics of the firing of granule cells as well as their gain (Rossi et al., 2003; Kanichay and Silver, 2008) and are critical to larger scale dynamics of the cerebellar cortical network (D’Angelo, 2008). Purkinje cells provide additional inhibition to ON UBCs that could influence how UBC circuits transform signals (Guo et al., 2016). A more complex model that implements Golgi cells and other critical circuit elements will be needed to investigate the role of feedforward UBC circuits in cerebellar network dynamics and motor behaviors in vivo."

      2. No consideration for involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF evoked UBC responses established in this manuscript, especially responses to trains of stimuli, could be mediated by direct MF inputs, or by polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      We are confident that the synaptic responses shown are monosynaptic for several reasons. UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input, because the main input is accounted for by the monosynaptic response. In all cells included in our data set, the fast AMPA receptor-mediated currents always occurred with short latency (1.24 ± 0.29 ms; mean ± SD; n = 13), high reliability (no failures to produce an EPSC in any of the 13 GRP UBCs in this data set), and low jitter (SD of latency; 0.074 ± 0.046 ms; mean ± SD; n = 13). These measurements have been added to the results section.
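
      The three measurements quoted above (latency, reliability, and jitter) can be sketched in code. This is a minimal illustration only, not the analysis pipeline used in the study: the function names, the input layout (per-trial latencies in ms, with `None` marking a stimulus that failed to evoke an EPSC), and the choice of the sample standard deviation for jitter are all assumptions made here for clarity.

```python
import statistics


def summarize_cell(trial_latencies_ms):
    """Summarize one cell's first-EPSC latencies.

    trial_latencies_ms: list of per-trial latencies in ms; None marks a
    failure (hypothetical input layout, not taken from the study).
    """
    evoked = [t for t in trial_latencies_ms if t is not None]
    if len(evoked) < 2:
        raise ValueError("need at least two evoked EPSCs to estimate jitter")
    return {
        # Mean delay from stimulus to EPSC onset for this cell.
        "mean_latency_ms": statistics.mean(evoked),
        # Jitter is taken here as the sample SD of latency within the cell.
        "jitter_ms": statistics.stdev(evoked),
        # Reliability expressed as the fraction of stimuli with no EPSC.
        "failure_rate": (len(trial_latencies_ms) - len(evoked))
        / len(trial_latencies_ms),
    }


def across_cells(per_cell_summaries, key):
    """Return (mean, SD) of one per-cell metric across cells."""
    values = [summary[key] for summary in per_cell_summaries]
    return statistics.mean(values), statistics.stdev(values)
```

      Under these assumptions, a short and consistent latency with low jitter and no failures is the pattern expected of a monosynaptic input, whereas disynaptic input would show up as variable latencies and a larger within-cell SD.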

      In some rare cases, we did observe disynaptic currents, which were easily distinguishable because a single electrical stimulation produced a burst of EPSCs at variable latencies. Please see example below. These cases of disynaptic input, which have been reported by others (Diño et al., 2000; Nunzi and Mugnaini, 2000; van Dorp and De Zeeuw, 2015) support the conclusion that UBCs receive input from other UBCs.

      Author response image 2.

      Example of GRP UBC with disynaptic input. Three examples of the effect of a single presynaptic stimulus (triangle) in a GRP UBC with presumed disynaptic input. Note the variable latency of the first evoked EPSC, bursts of EPSCs, and spontaneous EPSCs.

      Author response: "UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input."

      This statement is not congruent with the literature, with early work by Mugnaini and colleagues (Mugnaini et al. 1994 Synapse; Mugnaini and Flores 1994 J. Comp. Neurol.) indicating that UBCs are innervated by 1-2 mossy fibers, which are as likely other UBC terminals as MFs. This leaves open the possibility that so called monosynaptic responses do, as originally suggested, already include polysynaptic feedforward amplification of duration. While the authors also indicate that isolated disynaptic currents can be observed when they occur in isolation, a careful examination and objective documentation of "monosynaptic" responses would address this issue. Presumably, if potential disynaptic UBC inputs occur during a monosynaptic MF response, it would be detected as an abrupt biphasic inward/outward current, due to additional AMPA receptor activation but further desensitization of those already active (as observed by Kinney et al. 1997 J. Neurophysiol: "The delivery of a second MF stimulus at the peak of the slow EPSC evoked a fast EPSC of reduced amplitude followed by an undershoot of the subsequent slow current"). If such polysynaptic inputs are truly absent and are "rare" in isolation, some estimation of how common or not such synaptic amplification is, would improve our understanding of the overall significance of these inputs.

      We are confident that these currents are monosynaptic, because, as suggested, we carefully analyzed the latency, jitter and reliability, which were added to the previous revision. The latency and jitter are strong (quantitative) evidence that the first EPSC evoked was monosynaptic. While some UBCs have been reported to have multiple brushes, or brushes that branch and may contact multiple mossy fibers, or receive synaptic input onto their somas, these cases are rare in our experience in this age of mouse and there is no evidence for them in this dataset. For every trace we made a careful examination and documented that no delayed EPSCs were present. The presence of delayed EPSCs (or ‘abrupt biphasic inward/outward currents’ as described in Kinney et al 1997) would indeed suggest the presence of disynaptic activity or multiple inputs to the UBC, but these would be easily identified, even during a stimulation train. For these reasons we feel that we have established that polysynaptic feedforward amplification of duration is not present.

      We agree that the monosynaptic responses could be due to the stimulation of UBC axons. However, the absence of delayed EPSCs again suggests that if stimulation of a presynaptic UBC axon was producing the currents in the recorded UBC, then the axon was severed from the soma and AIS, because this region is necessary for the cell to produce more than a single spike per stimulation. We added a sentence describing the potential for the monosynaptic EPSCs to be due to the stimulation of presynaptic UBC axons.

      Your point is well taken that a discussion of how common or rare these UBC to UBC connections are is necessary to explain more clearly how we interpret their significance, and we have expanded the paragraph in the discussion that does so. Thank you for this suggestion.

      3. Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      As noted by the reviewer, the connections between UBCs were rare to observe. We decided against attempting to quantify the absolute or relative density of connections for several reasons. A major reason for the rare observations of anatomical connections between UBCs is likely the sparse labeling. First, the GRP mouse line only labels 20% of ON UBCs and we are unable to test whether postsynaptic connectivity of GRP ON UBCs is the same as that of the rest of the population of ON UBCs that are not labeled in the GRP mouse line. Second, the Brainbow reporter mouse only labels a small population of Cre expressing cells for unknown reasons. Third, the Brainbow reporter expression was so low that antibody amplification was necessary, which then limited the labeled cells to those close to the surface of the brain slices, because of known antibody penetration difficulties. Therefore, we refrained from estimating the density of these connections, because each of these variables reduced the labeling to unknown degrees and we reasoned that extrapolating our rare observations to the total population would be inaccurate.
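
      The compounding effect of these labeling losses can be illustrated with a short sketch. It assumes, purely for illustration, that each loss acts independently so that the visible fraction of connections is the product of the individual labeling fractions. Only the 20% figure for GRP-labeled ON UBCs comes from the text above; the function name and any other fraction passed to it are hypothetical placeholders, since those values are stated to be unknown.

```python
def visible_connection_fraction(label_fractions):
    """Fraction of true connections expected to be visible when every
    listed labeling step must succeed.

    Assumes the steps are independent (an illustrative simplification).
    """
    fraction = 1.0
    for f in label_fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each labeling fraction must lie in [0, 1]")
        fraction *= f
    return fraction


# 0.2 is the reported GRP labeling of ON UBCs; the other two values are
# hypothetical stand-ins for the unknown Brainbow and antibody-penetration losses.
example = visible_connection_fraction([0.2, 0.5, 0.5])
```

      Because the product is only as well known as its least certain factor, unknown fractions leave the visible fraction, and hence any extrapolated density, effectively unconstrained.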

      A paper that investigated UBC connectivity using organotypic slice cultures from P8 mice suggests that 2/3 of the UBC population receives UBC input, based on the observation that 2/3 of the mossy fibers did not degenerate as would be expected after 2 days in vitro if they were severed from a distant cell body (Nunzi and Mugnaini, 2000). It remains to be seen if this high proportion is due to the young age of these mice or is also the case in adult mice. Even if these connections are indeed rare, they are expected to have profound effects on the circuit, as each UBC has multiple mossy fiber terminals (Berthie and Axelrad, 1994), and mossy fiber terminals are estimated to contact 40 granule cells each (Jakab and Hamori, 1988). We have added a comment regarding this point to the discussion.

      To address this issue, the authors added the following text to the discussion section: "We did not estimate the density of these UBC to UBC connections, because the sparseness of labeling using these approaches made an accurate calculation impossible. Previous work using organotypic slice cultures from P8 mice estimated that 2/3 of the UBC population receives input from other UBCs (Nunzi & Mugnaini, 2000), although it is unclear whether this is the case in older mice."

      While accurate, the addition doesn't really address the situation, which is that the reported connections are apparently rare. Adding the information about 2/3 of UBCs having UBC inputs in culture implies the opposite might be true (i.e. that they might be quite common), which contrasts with the authors' data. The text should therefore be reworded for clarity, and the rewording should also incorporate the considerations covered in point #2 above. I.e. if the authors do establish that none of their recordings have polysynaptic inputs, and if they determine that the number of cells that showed isolated di-synaptic inputs is indeed small, then it suggests that these specific polysynaptic connections are in fact rare.

      Thank you for pointing this out. We agree that adding this information is somewhat contradictory to our results and we have added more to this section in the discussion, provided below.

      Anatomically identifiable connections between UBCs were not present in all brain slices and finding them required a careful search. UBC labeling was sparse due to the highly specific genetic labeling techniques and further sparsification by the Brainbow reporter, which made it impossible to estimate the density of these UBC to UBC connections. Electrophysiological evidence suggests that UBC to UBC connections are not common, because spontaneous EPSCs that would indicate a spontaneously firing presynaptic UBC are only rarely observed in UBCs recorded in acute brain slices. In an analysis of feedforward excitation of granule layer neurons, only 4 out of 140 UBCs had this indirect evidence of a firing presynaptic UBC (van Dorp and De Zeeuw, 2015), which suggests that UBC to UBC connections may be rare. On the other hand, previous work using organotypic slice cultures from P8 mice estimated that 2/3 of the UBC population receives input from other UBCs (Nunzi & Mugnaini, 2000). This suggests a much higher density of UBC to UBC connections, but could be due to the young age of the brains used, which is before UBCs have matured (Morin et al., 2001), and also due to increased collateral sprouting that can occur in culture (Jaeger et al., 1988). Another study imaged 2-4 week old rat cerebellar slices at an electron microscopic level and found that 4 out of 14 UBC axon terminals contacted UBC brushes (Diño et al., 2000). Future work is necessary to accurately estimate the density and impact of these feedforward UBC circuits.

      1. Lack of critical parameters in NEURON model.

      A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant.

      However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alter release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) The model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Linney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLife), as would occur with trains of stimuli at MF to ON-UBCs.

      A) The model produces synaptic AMPA and mGluR2 currents that reproduce those we recorded in vitro. We did not find it necessary to implement changes in glutamate release during a train as the model was fit to UBC data with the assumption that the glutamate transient did not change during the train. If there is a change in neurotransmitter release during a train, it is therefore built into the model, which has the advantage of reducing its complexity. UBCs are a special case where the postsynaptic currents are mediated mostly by the total amount of transmitter released. Most of the evoked current occurs tens to hundreds of milliseconds after neurotransmitter release and is therefore much more sensitive to total release and less sensitive to how it is released during the train. The figure below shows the effect of reducing the amount of glutamate released by 10% on each stimulus in the model. Despite a significant change in the pattern of neurotransmitter release, as well as a reduction in the total amount of glutamate, the slow EPSC still decays over the course of hundreds of milliseconds.

      B) The detailed kinetic AMPA receptor model used here accurately reproduces desensitization, which in fact mediates the slow ON UBC current. This AMPA receptor is a 13-state model, including 4 open states with 1-4 glutamates bound, 4 closed states with 1-4 glutamates bound, 4 desensitized states with 1-4 glutamates bound, and 5 closed states with 0-4 glutamates bound. The forward and reverse rates between different states in the model were fit to AMPA receptor currents recorded from dissociated UBCs and they accurately reproduced the ON UBC currents evoked by synaptic stimulation in our previous work (Balmer et al., 2021).

      Author response image 3.

      Effect of short-term depression of neurotransmitter release. A) The top trace shows the glutamate transient that drives the AMPA receptor model used in our study. No change in release is implemented, although the slow tail of the transient summates during the train. The bottom trace shows the modeled AMPA receptor mediated current. B) In this model the amount of glutamate released on each stimulus is reduced by 10%. The duration of the slow AMPA current is similar, despite a profound change in the pattern of neurotransmitter exposure.
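
      The manipulation in panel B can be sketched as a simple release schedule. This is a minimal illustration, not the NEURON model: it assumes that "reduced by 10%" means each release is 90% of the previous one, and the function name, the stimulus count of 10, and the unit-less release amounts are all hypothetical.

```python
def release_train(n_stimuli, first_release=1.0, reduction=0.10):
    """Relative glutamate released on each stimulus.

    Assumption: each release is (1 - reduction) times the previous one.
    """
    return [first_release * (1.0 - reduction) ** i for i in range(n_stimuli)]


# Hypothetical 10-stimulus train, with and without depression of release.
constant = release_train(10, reduction=0.0)
depressing = release_train(10, reduction=0.10)

# Fraction of the total glutamate that survives the depression.
total_ratio = sum(depressing) / sum(constant)

print([round(r, 3) for r in depressing])
print(f"total release relative to constant train: {total_ratio:.2f}")
```

      Under these assumptions the depressing train still delivers roughly two thirds of the total glutamate, which is one way to see why a current that depends mostly on total release would change little.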

      While the authors have not added the suggested additional parameters, their clarifications regarding the implications of existing parameters, and demonstration of reasonable fits to experimental data, and lack of substantial effect of simulating reduced vesicle release probability,

      1. Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter lack is particularly problematic because in agreement with previous publications, the raw data in Fig. 1 shows ON UBCs as quiescent until MF stimulation and OFF UBCs firing spontaneously until MF stimulation, but, for example, at least one ON UBC in the NEURON model is firing spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected/explainable theoretically, but then such cells should be observed in the raw data.

      To address this reasonable concern of a general lack of quantification of electrophysiological responses we have added data characterizing the slow inward and outward currents evoked by synaptic stimulation in GRP and P079 UBCs in the results section and in new panels in Figure 1. We report the action potential pause lengths in P079 UBCs and burst lengths in ON UBCs in the results section. However, we favor the duration of the currents over the length of burst and pause, because the currents do not depend on a stable resting membrane potential, which is itself difficult to determine in intracellular recordings of these small cells. In a series of recent publications that focused on UBC firing, the authors argue that cell-attached recordings are necessary to determine accurately the burst and pause lengths, as well as spontaneous firing rates (Guo et al., 2021; Huson et al., 2023). (The trade-off of these extracellular recordings is that the monosynaptic nature of the input is nearly impossible to confirm.) Spontaneous firing rates were variable within both GRP and P079 UBCs from silent to firing regularly or in bursts, as previously reported (Kim et al., 2012; van Dorp and De Zeeuw, 2015). For clarity, we chose to model the GRP UBCs as silent unless receiving synaptic input and P079 UBCs as active unless receiving synaptic input. As the reviewer suggests, we have observed UBCs firing in patterns similar to those shown in the model UBCs having input from spontaneous presynaptic UBCs. Below are some examples of spontaneous EPSCs and IPSCs in UBCs that suggest the presence of a presynaptic UBC.

      Author response image 4.

      Examples of UBCs that receive spontaneous input. A) Three ON UBCs that had spontaneous EPSCs, suggesting the presence of an active presynaptic UBC. B) Two OFF UBCs that had spontaneous outward currents.

      The authors have added additional analysis and discussion, which adequately addresses this concern.

      Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two types of UBCs known as ON and OFF UBC subtypes, they have highlighted the existing gaps in understanding UBC connectivity and the need to investigate whether UBCs target UBCs of the same subtype, different subtypes, or both. The importance of this knowledge is for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBC connectivity by utilizing transgenic mice expressing GFP and RFP in UBCs, Brainbow approach, immunohistochemical and electrophysiological analysis, and computational models to understand how the feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written in a sound detailed format, very interesting with excellent discovery and suggested model.

      I believe the authors have provided appropriate responses and have consequently revised the manuscript in a convincing manner. Although I am not an expert in physiology, I find the explanations and clarifications to be acceptable.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      1. The name of the new method "inter-haplotype distance" is more confusing than helpful, as the haplotype information is not critical for implementing this method. First, the mutation spectrum is aggregated genome-wide regardless of the haplotypes where the mutations are found. Second, the only critical haplotype information is that at the focal site (i.e., the locus that is tested for association): individuals are aggregated together when they belong to the same "haplotype group" at the focal site. However, for the classification step, haplotype information is not really necessary: individuals can be grouped based on their genotypes at the given locus (e.g., AA vs AB). As the authors mentioned, this method can be potentially applied to other mutation datasets, where haplotype information may well be unavailable. I hope the authors can reconsider the name and remove the term "haplotype" (perhaps something like "inter-genotype distance"?) to avoid giving the wrong impression that haplotype information is critical for applying this method.

      We appreciate the reviewer's concern about the name of our method. The reviewer is correct that haplotype information is not critical for our method to work, and as a result we've decided to simply rename the approach to "aggregate mutation spectrum distance" (abbreviated AMSD). For simplicity, we refer to the method as IHD throughout our responses to reviewers, but the revised manuscript now refers to AMSD.

      1. The biggest advantage of the IHD method over QTL mapping is alleviation of the multiple testing burden, as one comparison tests for any changes in the mutation spectrum, including simultaneous, small changes in the relative abundance of multiple mutation types. Based on this, the authors claim that IHD is more powerful to detect a mutator allele that affects multiple mutation types. Although logically plausible, it is unclear under what quantitative conditions IHD can actually have greater power over QTL. It will be helpful to support this claim by providing some simulation results.

      This comment prompted us to do a more detailed comparison of IHD vs. QTL power under conditions that are more similar to those observed in the BXD cohort. While preparing the original manuscript, we assumed that IHD might have greater power than QTL mapping in a population like the BXDs because some recombinant inbred lines have accumulated many more germline mutations than others (see Figure 1 in Sasani et al. 2022, Nature). In a quantitative trait locus scan (say, for the fraction of C>A mutations in each line) each BXD's mutation data would be weighted equally, even if a variable number of mutations was used to generate the phenotype point estimate in each line.

      To address this, we performed a new series of simulations in which the average number of mutations per haplotype was allowed to vary. At the low end, some BXDs have accumulated as few as 100 total germline mutations, while others have accumulated as many as 2,000. Thus, instead of simulating the same mean number of mutations on each simulated haplotype, we allowed the mean number of mutations per haplotype to vary from N to 20N. By simulating a variable count of mutations on each haplotype, we could more easily test the benefits of comparing aggregate, rather than individual, mutation spectra between BXDs.

      In these updated simulations, we find that IHD routinely outperforms QTL mapping under a range of parameter choices (see Author Response image 1). Since IHD aggregates the mutation spectra of all haplotypes with either B or D alleles at each locus in the genome, the method is much less sensitive to individual haplotypes with low mutation counts. We include a mention of these updated simulations on lines 135-138 and describe the updated simulations in greater detail in the Materials and Methods (lines 705-715).

      Author response image 1.

      Power of IHD and QTL mapping on simulated haplotypes with variable counts of mutations. We simulated germline mutations on the specified number of haplotypes (as described in the manuscript) but allowed the total number of mutations per haplotype to vary by a factor of 20.
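
      The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ordering of the six 1-mer mutation types, the function names, and all counts are hypothetical. It sums per-haplotype counts within each genotype group and then compares the two aggregate spectra by cosine distance, so a haplotype with very few mutations contributes very little to the result.

```python
import math

# Assumed ordering of the six 1-mer mutation types (illustrative only).
MUTATION_TYPES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def aggregate_spectrum(spectra):
    """Sum per-haplotype mutation counts, one total per mutation type."""
    return [sum(s[i] for s in spectra) for i in range(len(MUTATION_TYPES))]


def cosine_distance(a, b):
    """1 minus the cosine similarity of two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical counts: each group holds one low-count and one
# high-count haplotype (a roughly tenfold difference in totals).
b_haplotypes = [[20, 10, 40, 10, 15, 5], [200, 100, 400, 100, 150, 50]]
d_haplotypes = [[4, 0, 1, 0, 1, 0], [300, 100, 400, 100, 150, 50]]

agg_b = aggregate_spectrum(b_haplotypes)
agg_d = aggregate_spectrum(d_haplotypes)
aggregate_distance = cosine_distance(agg_b, agg_d)

print(agg_b, agg_d, round(aggregate_distance, 4))
```

      In a per-haplotype scan, the noisy six-mutation haplotype would carry the same weight as its neighbour with over a thousand mutations; in the aggregate it barely moves the spectrum.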

      1. The flip side of this advantage of IHD is that, when a significant association is detected, it is not immediately clear which mutation type is driving the signal. Related to this, it is unclear how the authors reached the point that "...the C>A mutator phenotype associated with the locus on chromosome 6", when they only detected significant IHD signal at rs46276051 (on Chr6), when conditioning on D genotypes at the rs27509845 (on Chr4) and no significant signal for any 1-mer mutation type by traditional mapping. The authors need to explain how they deduced that C>A mutation is the major source of the signal. In addition, beyond C>A mutations, can mutation types other than C>A contribute to the IHD signal at rs46276051? More generally, I hope the authors can provide some guidelines on how to narrow a significant IHD signal to specific candidate mutation type(s) affected, which will make the method more useful to other researchers.

      We thank the reviewer for pointing out this gap in our logic. We omitted specific instructions for narrowing down an IHD signal to specific mutation type(s) for a few reasons. First, this can be addressed using mutational signature analysis methods that are in widespread use. For example, upon identifying one or more candidate mutator loci, we can enter the mutation spectra of samples with each possible mutator genotype into a program (e.g., SigProfilerExtractor) to determine which combinations of mutation types occur proportionally more often in the genomes that harbor mutators (see Figure 3c in our manuscript). A second approach for narrowing down an IHD signal, highlighted in Figure 3a (and now described in the text of the Results section at lines 256-261), is to simply test which mutation type proportion(s) differ significantly between groups of samples with and without a candidate mutator (for example, with a Chi-square test of independence for each mutation type).

      Although this second approach incurs a multiple testing burden, the burden is offset somewhat by using IHD to identify mutator loci, rather than performing association tests for every possible mutation type to begin with. Although Figure 3a only shows the significant difference in C>A fraction among BXDs with different mutator locus genotypes, Figure 3-figure supplement 1 shows the complete set of 1-mer spectrum comparisons. It is possible that this second approach would not prove very useful in the case of a mutator with a “flat” signature (i.e., a mutator that slightly perturbs the rates of many different mutation types), but in our case it clearly shows which mutation type is affected.
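
      The second approach can be sketched as one 2x2 chi-square test per mutation type. This is a hedged illustration only: the counts are invented, the helper name is hypothetical, the statistic is the uncorrected Pearson chi-square, and the Bonferroni threshold is an assumption about how one might handle the six tests rather than a description of the analysis in the manuscript.

```python
import math

# Assumed ordering of the six 1-mer mutation types (illustrative only).
MUTATION_TYPES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical aggregate counts in samples with and without the mutator.
with_mutator = [300, 100, 400, 60, 100, 40]
without_mutator = [200, 110, 440, 70, 130, 50]

alpha = 0.05 / len(MUTATION_TYPES)  # Bonferroni threshold (an assumption)
results = {}
for i, mutation_type in enumerate(MUTATION_TYPES):
    a = with_mutator[i]
    b = sum(with_mutator) - a        # all other types, with mutator
    c = without_mutator[i]
    d = sum(without_mutator) - c     # all other types, without mutator
    results[mutation_type] = chi_square_2x2(a, b, c, d)

significant = [t for t, (_, p) in results.items() if p < alpha]
print(significant)
```

      With these invented counts only the C>A fraction differs after correction, mirroring the kind of single-type signal described above; a mutator with a flat signature would leave every individual test unremarkable.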

      1. To account for differential relatedness between the inbred lines, the authors regressed the cosine distance between the two aggregate mutation spectra on the genome-wide genetic similarity and took the residual as the adjusted test metric. What is the value of the slope from this regression? If significantly non-zero, this would support a polygenic architecture of the mutation spectrum phenotype, which could be interesting. If not, is this adjustment really necessary? In addition, is the intercept assumed to be zero for this regression, and does such an assumption matter? I would appreciate seeing a supplemental figure on this regression.

      The reviewer raises a good question. We find that the slope of the "distance vs. genetic similarity" regression is significantly non-zero, though the slope estimate itself is small. A plot of cosine distance vs. genome-wide genetic similarity (using all BXDs) is shown below in Author response image 2:

      Author response image 2.

      Relationship between cosine distance and genetic similarity in the BXDs. As described in the Materials and Methods, we computed two values at each marker in the BXDs: 1) the cosine distance between the aggregate mutation spectra of BXDs with either B or D genotypes at the marker, and 2) the correlation between genome-wide D allele frequencies in BXDs with either B or D genotypes at the marker. We then regressed these two values across all genome-wide markers.

      This result indicates that if two groups of BXDs (one with D genotypes and one with B genotypes at a given locus) are more genetically similar, their mutation spectra are also more similar. Since the regression slope estimate is significantly non-zero (p < 2.2e-16), we believe that it's still worth using residuals as opposed to raw cosine distance values. This result also suggests that there may be a polygenic effect on the mutation spectrum in the BXDs.

      We have also generated a plot showing the cosine distance between the mutation spectra of every possible pair of BXDs, regressed against the genetic similarity between each of those pairs (Author Response image 3). Here, the potential polygenic effects on mutation spectra similarity are perhaps more obvious.

      Author response image 3.

      Pairwise cosine distance between BXD mutation spectra as a function of genetic similarity. We computed two values for every possible pair of n = 117 BXDs: 1) the cosine distance between the samples' individual 1-mer mutation spectra and 2) the correlation coefficient between the samples' genome-wide counts of D alleles.
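
      The residual adjustment discussed above can be sketched with an ordinary least-squares fit. This is a minimal sketch, not the authors' code: it assumes a fitted (non-zero) intercept, which the response does not confirm, and the similarity and distance values are invented.

```python
def ols_residuals(x, y):
    """Fit y = intercept + slope * x by least squares; return fit and residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals


# Hypothetical per-marker values: genetic similarity between the two
# genotype groups, and cosine distance between their aggregate spectra.
similarity = [0.10, 0.20, 0.30, 0.40, 0.50]
distance = [0.012, 0.011, 0.011, 0.009, 0.008]

slope, intercept, adjusted = ols_residuals(similarity, distance)
print(round(slope, 4), [round(r, 5) for r in adjusted])
```

      The residuals, rather than the raw distances, would then serve as the test statistic at each marker, so that a marker is not favoured merely because its two genotype groups happen to be less related.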

      Private Comments

      1. It will also be useful to see how the power of IHD and QTL mapping depend on the allele frequency of the mutator allele and the sample size, as mutator alleles are likely rare or semi-rare in natural populations (such as the human de novo mutation dataset that the authors mentioned).

      This is another good suggestion. In general, we'd expect the power of both IHD and QTL mapping to decrease as a function of mutator allele frequency. At the same time, we note that the power of these scans should mostly depend on the absolute number of carriers of the mutator allele and less on its frequency. In the BXD mouse study design, we observe high frequency mutators but also a relatively small sample size of just over 100 individuals. In natural human populations, mutator frequencies might be orders of magnitude smaller, but sample sizes may be orders of magnitude larger, especially as new cohorts of human genomes are routinely being sequenced. So, we expect to have similar power to detect a mutator segregating at, say, 0.5% frequency in a cohort of 20,000 individuals, as we would to detect a mutator segregating at 50% frequency in a dataset of 200 individuals.
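
      The arithmetic behind this comparison is short enough to write out. The sketch below assumes, for simplicity, that allele frequency can be read as the fraction of individuals carrying the mutator; the function name is hypothetical.

```python
def expected_carriers(mutator_frequency, n_individuals):
    """Expected number of individuals carrying the mutator allele."""
    return mutator_frequency * n_individuals


large_cohort = expected_carriers(0.005, 20_000)  # 0.5% of 20,000 individuals
small_cohort = expected_carriers(0.5, 200)       # 50% of 200 individuals
print(round(large_cohort), round(small_cohort))
```

      Both designs yield about 100 expected carriers, which is the sense in which their power is expected to be similar.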

      To more formally address the reviewer's concern, we performed a series of simulations in which we simulated a population of 100 haplotypes. We assigned the same average number of mutations to each haplotype but allowed the allele frequency of the mutator allele to vary between 0.1, 0.25, and 0.5. The results of these simulations are shown in Author response image 4 and reveal that AMSD tends to have greater power than QTL mapping at lower mutator allele frequencies. We now mention these simulations in the text at lines 135-138 and include the simulation results in Figure 1-figure supplement 4.

      Author response image 4.

      Power of AMSD and QTL mapping on simulated haplotypes with variable marker allele frequencies. We simulated germline mutations on the specified number of haplotypes (as described in the manuscript), but simulated genotypes at the mutator allele such that "A" alleles were at the specified allele frequency.

      1. In the Methods section of "testing for epistasis between the two mutator loci", it will be helpful to explicitly lay out the model and assumptions in mathematical formulae, in addition to the R scripts. For example, are the two loci considered independent when their effects on mutation rate are multiplicative or additive? Given the R scripts provided, it seems that the two loci are assumed to have multiplicative effects on the mutation rate, and that the mutation count follows a Poisson distribution with mean being the mutation rate times ADJ_AGE (i.e., the mutation opportunity times the number of generations of an inbred line). However, this is not easily understandable for readers who are not familiar with the R language. In addition, I hope the authors can be more specific when discussing the epistatic interaction between the two loci by explicitly saying "synergistic effects beyond multiplicative effects on the C>A mutation rate".

      The reviewer raises a good point about the clarity of our descriptions of tests for epistasis. We have now added a more detailed description of these tests in the section of the Materials and Methods beginning at line 875. We have also added a statement to the text at lines 289-291: “the combined effects of D genotypes at both loci exceed the sum of marginal effects of D genotypes at either locus alone.” We hope that this will help clarify the results of our tests for statistical epistasis.
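
      The model as the reviewer reads it from the R scripts can be sketched as follows. This is an illustration of that reading only, not a statement of the authors' model: it assumes a Poisson mean with a log link, log(mean) = log(ADJ_AGE) + b0 + b4*g4 + b6*g6 + b46*g4*g6, where g4 and g6 are 0/1 indicators for D genotypes at the chromosome 4 and chromosome 6 loci. All coefficient values and the ADJ_AGE value are hypothetical.

```python
import math


def expected_count(g4, g6, adj_age, b0, b4, b6, b46=0.0):
    """Expected C>A count under a log-link model with an optional interaction."""
    return adj_age * math.exp(b0 + b4 * g4 + b6 * g6 + b46 * g4 * g6)


# Hypothetical coefficients on the log scale and a hypothetical ADJ_AGE.
params = dict(adj_age=50.0, b0=0.0, b4=0.3, b6=0.1)


def fold_change(g4, g6, b46=0.0):
    """Expected count relative to a line with B genotypes at both loci."""
    baseline = expected_count(0, 0, **params)
    return expected_count(g4, g6, b46=b46, **params) / baseline


# With no interaction, the double-D fold change is the product of the
# two single-locus fold changes (purely multiplicative effects).
multiplicative = fold_change(1, 0) * fold_change(0, 1)
no_interaction = fold_change(1, 1)
with_interaction = fold_change(1, 1, b46=0.2)
print(round(multiplicative, 4), round(no_interaction, 4), round(with_interaction, 4))
```

      In this framing, a positive interaction coefficient corresponds to what the reviewer calls synergistic effects beyond multiplicative effects on the mutation rate.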

      Reviewer 2 (Public Review):

      1. The main limitation of the approach is that it is difficult to see how it might be applied beyond the context of mutation accumulation experiments using recombinant inbred lines. This is because the signal it detects, and hence its power, is based on the number of extra accumulated mutations linked to (i.e. on the same chromosome as) the mutator allele. In germline mutation studies of wild populations the number of generations involved (and hence the total number of mutations) is typically small, or else the mutator allele becomes unlinked from the mutations it has caused (due to recombination), or is lost from the population altogether (due to chance or perhaps selection against its deleterious consequences).

      The reviewer is correct that as it currently exists, IHD is mostly limited to applications in recombinant inbred lines (RILs) like the BXDs. This is due to the fact that IHD assumes that each diploid sample harbors one of two possible genotypes at a particular locus and ignores the possibility of heterozygous genotypes for simplicity. In natural, outbreeding populations, this assumption will obviously not hold. However, as we plan to further iterate on and improve the IHD method, we hope that it will be applicable to a wider variety of experimental systems in the future. We have added additional caveats about the applicability of our method to other systems in the text at lines 545-550.

      Private Comments

      1. On p. 8, perhaps I've misunderstood but it's not clear in what way the SVs identified were relevant to the samples used in this dataset - were the founder strains assembled? Is there any chance that additional SVs were present, e.g. de novo early in the accumulation line?

      Our description of this structural variation resource could have been clearer. The referenced SVs were identified in Ferraj et al. (2023) by generating high-quality long read assemblies of inbred laboratory mice. Both DBA/2J and C57BL/6J (the founder strains for the BXD resource) were included in the Ferraj et al. SV callset. We have clarified our description of the callset at lines 247-248.

      It is certainly possible that individual BXD lines have accumulated de novo structural variants during inbreeding. However, these "private" SVs are unlikely to produce a strong IHD association signal (via linkage to one of the ~7,000 markers) at either the chromosome 4 or chromosome 6 locus, since we only tested markers that were at approximately 50% D allele frequency among the BXDs.

      1. On p. 13, comparing the IHD and QTL approaches, regarding the advantage of the former in that it detects the combined effect of multiple k-mer mutation types, would it not be straightforward to aggregate counts for different types in a QTL setting as well?

      The mutation spectrum is a multi-dimensional phenotype (6-dimensional if using the 1-mer spectrum, 96-dimensional if using the 3-mer spectrum, etc.). Most QTL mapping methods use linear models to test for associations between genotypes and a 1-dimensional phenotype (e.g., body weight, litter size). In the past, we used QTL mapping to test for associations between genotypes and a single element of the mutation spectrum (e.g., the rate of C>A mutations), but there isn't a straightforward way to aggregate or collapse the mutation spectrum into a 1-dimensional phenotype that retains the information contained within the full 1-mer or 3-mer spectrum. For that reason, we developed the "aggregate mutation spectrum" approach, as it preserves information about the complete mutation spectrum in each group of strains.

      The reviewer is correct that we could also aggregate counts of different mutation types to, say, perform a QTL scan for the load of a specific mutational signature. For example, we could first perform standard mutational signature analysis on our dataset and then test for QTLs associated with each signature that is discovered. However, this approach would not solve the second problem that our method is designed to solve: the appropriate weighting of samples based on how many mutations they contain.
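
      The dimensionalities quoted in this response follow from simple counting, sketched below. The labels use a common convention (six substitution classes named by the pyrimidine of the mutated base pair, each flanked by one of four bases on either side); the exact labeling used in the manuscript is an assumption here.

```python
from itertools import product

BASES = "ACGT"

# Six 1-mer substitution classes (a common convention; labels assumed).
ONE_MER = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 3-mer spectrum: each class in every 5' and 3' flanking-base context.
THREE_MER = [
    f"{five_prime}[{substitution}]{three_prime}"
    for substitution in ONE_MER
    for five_prime, three_prime in product(BASES, repeat=2)
]

print(len(ONE_MER), len(THREE_MER))  # 6 and 96
print(THREE_MER[:3])
```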

      1. pp. 15-16: In the discussion of how you account for relatedness between strains, I found the second explanation (on p. 16) much clearer. It would be interesting to know how much variance was typically accounted for by this regression?

      As shown in the response to Reviewer 1, genotype similarity between genotype groups (i.e., those with either D or B genotypes at a marker) generally explains a small amount of variance in the cosine distance between those groups (R² ≈ 0.007). However, since the slope term in that regression is significantly non-zero, correcting for this relationship should still improve our power relative to using raw cosine distance values that are slightly confounded by this relationship.

      1. Similarly, in the section on Applying the IHD method to the BXDs (pp. 18-19), I think this description was very useful, and some or all of this description of the experiment (and how the DNMs in it arise) could profitably be moved to the introduction.

      We appreciate the reviewer’s feedback about the details of the BXD cohort. Overall, we feel the description of the BXDs in the Introduction (at lines 65-73) is sufficient to introduce the cohort, though we now add some additional detail about variability in BXD inbreeding duration (at lines 89-93) to the Introduction as well, since it is quite relevant to some of the new simulation results presented in the manuscript.

      1. A really minor one, not sure if this is for the journal or the authors, but it would be much better to include both page and line numbers in any version of an article for review. My pdf had neither!

      We apologize for the lack of page/line numbers in the submitted PDF. We have now added line numbers to the revised version of the manuscript.

      Reviewer 3 (Public Review):

      1. Under simulated scenarios, the authors' new IHD method is not appreciably more powerful than conventional QTL mapping methods. While this does not diminish the rigor or novelty of the authors' findings, it does temper enthusiasm for the IHD method's potential to uncover new mutators in other populations or datasets. Further, adaptation of this methodology to other datasets, including human trios or multigenerational families, will require some modification, which could present a barrier to broader community uptake. Notably, BXD mice are (mostly) inbred, justifying the authors' consideration of just two genotype states at each locus, but this decision prevents out-of-the-box application to outbred populations and human genomic datasets. Lastly, some details of the IHD method are not clearly spelled out in the paper. In particular, it is unclear whether differences in BXD strain relatedness due to the breeding epoch structure are fully accounted for in permutations. The method's name - inter-haplotype distance - is also somewhat misleading, as it seems to imply that de novo mutations are aggregated at the scale of sub-chromosomal haplotype blocks, rather than across the whole genome.

      The reviewer raises very fair concerns. As mentioned in response to a question from Reviewer 1, we performed additional simulation experiments that demonstrate the improved power of IHD (as compared to QTL mapping) in situations where mutation counts are variable across haplotypes or when mutator alleles are present at allele frequencies <50% (see Author response image 2 and 3, as well as new supplements to Figure 1 in the manuscript). However, the reviewer is correct that the IHD method is not applicable to collections of outbred individuals (that is, individuals with both heterozygous and homozygous genotypes), which will limit its current applications to datasets other than recombinant inbred lines. We have added a mention of these limitations to the Results at lines 138-141 and the Discussion at lines 545-550, but plan to iterate on the IHD method and introduce new features that enable its application to other datasets. We have also explicitly stated that we account for breeding epochs in our permutation tests in the Materials and Methods at lines 670-671. Both Reviewer 1 and Reviewer 3 raised concerns about the name of our method, and we have therefore changed “inter-haplotype distance” to “aggregate mutation spectrum distance” throughout the manuscript.

      1. Nominating candidates within the chr6 mutator locus requires an approach for defining a credible interval and excluding/including specific genes within that interval as candidates. Sasani et al. delimit their focal window to 5Mb on either side of the SNP with the most extreme P-value in their IHD scan. This strategy suffers from several weaknesses. First, no justification for using a 10 Mb window, as opposed to, e.g., a 5 Mb window or a window size delimited by a specific threshold of P-value drop, is given, rendering the approach rather ad hoc. Second, within their focal 10Mb window, the authors prioritize genes with annotated functions in DNA repair that harbor protein coding variants between the B6 and D2 founder strains. While the logic for focusing on known DNA repair genes is sensible, this locus also houses an appreciable number of genes that are not functionally annotated, but could, conceivably, perform relevant biological roles. These genes should not be excluded outright, especially if they are expressed in the germline. Further, the vast majority of functional SNPs are non-coding (including the likely causal variant at the chr4 mutator previously identified in the BXD population). Thus, the authors' decision to focus most heavily on coding variants is not well-justified. Sasani et al. dedicate considerable speculation in the manuscript to the likely identity of the causal variant, ultimately favoring the conclusion that the causal variant is a predicted deleterious missense variant in Mbd4. However, using a 5Mb window centered on the peak IHD scan SNP, rather than a 10Mb window, Mbd4 would be excluded. Further, SNP functional prediction accuracy is modest [e.g., PMID 28511696], and exclusion of the missense variant in Ogg1 due to its benign prediction is potentially premature, especially given the wealth of functional data implicating Ogg1 in C>A mutations in house mice. Finally, the DNA repair gene closest to the peak IHD SNP is Rad18, which the authors largely exclude as a candidate.

      We agree that the use of a 10 Mb window, rather than an empirically derived confidence interval, is a bit arbitrary and ad hoc. To address this concern, we have implemented a bootstrap resampling approach (Visscher et al. 1996, Genetics) to define confidence intervals surrounding IHD peaks. We have added a description of the approach to the Materials and Methods at lines 609-622, but a brief description follows. In each of N trials (here, N = 10,000), we take a bootstrap sample of the BXD phenotype and genotype data with replacement. We then perform an IHD scan on the chromosome of interest using the bootstrap sample and record the position of the marker with the largest cosine distance value (i.e., the "peak" marker). After N trials, we calculate the 90% confidence interval of bootstrapped peak marker locations; in other words, we identify the locations of two genotyped markers, between which 90% of all bootstrap trials produced an IHD peak. We note that bootstrap confidence intervals can exhibit poor "coverage" (a measure of how often the confidence intervals include the "true" QTL location) in QTL mapping studies (see Manichaikul et al. 2006, Genetics), but feel that the bootstrap is more reasonable than simply defining an ad hoc interval around an IHD peak.
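
      The bootstrap procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: genotypes are coded 0/1 (for example B and D), each strain carries a vector of mutation counts, and the names `cosine_distance`, `peak_marker`, and `bootstrap_peak_interval` are hypothetical. The interval is taken as a simple percentile interval of the bootstrapped peak positions.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def peak_marker(genotypes, spectra):
    """Index of the marker with the largest distance between genotype groups.

    genotypes[i][m] is 0 or 1 for strain i at marker m (hypothetical coding);
    spectra[i] is the vector of mutation counts for strain i.
    """
    n_types = len(spectra[0])
    best_index, best_distance = 0, -1.0
    for m in range(len(genotypes[0])):
        totals = [[0] * n_types, [0] * n_types]
        for strain_genotypes, spectrum in zip(genotypes, spectra):
            group = totals[strain_genotypes[m]]
            for k, count in enumerate(spectrum):
                group[k] += count
        distance = cosine_distance(totals[0], totals[1])
        if distance > best_distance:
            best_index, best_distance = m, distance
    return best_index


def bootstrap_peak_interval(genotypes, spectra, positions,
                            n_trials=1000, level=0.90, seed=0):
    """Percentile interval of the peak position across bootstrap samples."""
    rng = random.Random(seed)
    n_strains = len(genotypes)
    peaks = []
    for _ in range(n_trials):
        # Resample strains with replacement, keeping genotype and spectrum paired.
        sample = [rng.randrange(n_strains) for _ in range(n_strains)]
        boot_genotypes = [genotypes[i] for i in sample]
        boot_spectra = [spectra[i] for i in sample]
        peaks.append(positions[peak_marker(boot_genotypes, boot_spectra)])
    peaks.sort()
    lower = peaks[math.floor((1.0 - level) / 2.0 * (n_trials - 1))]
    upper = peaks[math.ceil((1.0 + level) / 2.0 * (n_trials - 1))]
    return lower, upper
```

      Because the returned bounds are themselves marker positions, the interval is limited by marker density, which is one reason the coverage caveat mentioned above applies.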

      The new 90% confidence interval surrounding the IHD peak on chromosome 6 is larger than the original (ad hoc) 10 Mbp window, now extending from around 95 Mbp to 114 Mbp. Notably, the new empirical confidence interval excludes Mbd4. We have accordingly updated our Results and Discussion sections to acknowledge the fact that Mbd4 no longer resides within the confidence interval surrounding the IHD peak on chromosome 6 and have added additional descriptions of genes that are now implicated by the 90% confidence interval. Given the uncertainties associated with using bootstrap confidence intervals, we have retained a brief discussion of the evidence supporting Mbd4 in the Discussion but focus primarily on Ogg1 as the most plausible candidate.

      The reviewer raises a valid concern about our treatment of non-DNA repair genes within the interval surrounding the peak on chromosome 6. We have added more careful language to the text at lines 219-223 to acknowledge the fact that non-annotated genes in the confidence interval surrounding the chromosome 6 peak may play a role in the epistatic interaction we observed.

      The reviewer also raises a reasonable concern about our discussions of both Mbd4 and Ogg1 as candidate genes in the Discussion. Since Mbd4 does not reside within the new empirical bootstrap confidence interval on chromosome 6 and given the strong prior evidence that Ogg1 is involved in C>A mutator phenotypes (and is in the same gene network as Mutyh), we have reframed the Discussion to focus on Ogg1 as the most plausible candidate gene (see lines 357-360).

      Using the GeneNetwork resource, we also more carefully explored the potential effects of noncoding variants on the C>A mutator phenotype we observed on chromosome 6. We have updated the Results at lines 240-246 and the Discussion at lines 439-447 to provide more evidence for regulatory variants that may contribute to the C>A mutator phenotype. Specifically, we discovered several strong-effect cis-eQTLs for Ogg1 in multiple tissues, at which D genotypes are associated with decreased Ogg1 expression. Given new evidence that the original mutator locus we discovered on chromosome 4 harbors an intronic mobile element insertion that significantly affects Mutyh expression (see Ferraj et al. 2023, Cell Genomics), it is certainly possible that the mutator phenotype associated with genotypes on chromosome 6 may also be mediated by regulatory, rather than coding, variation.

      1. Additionally, some claims in the paper are not well-supported by the authors' data. For example, in the Discussion, the authors assert that "multiple mutator alleles have spontaneously arisen during the evolutionary history of inbred laboratory mice" and that "... mutational pressure can cause mutation rates to rise in just a few generations of relaxed selection in captivity". However, these statements are undercut by data in this paper and the authors' prior publication demonstrating that a number of candidate variants are segregating in natural mouse populations. These variants almost certainly did not emerge de novo in laboratory colonies, but were inherited from their wild mouse ancestors. Further, the wild mouse population genomic dataset used by the authors falls far short of comprehensively sampling wild mouse diversity; variants in laboratory populations could derive from unsampled wild populations.

      The reviewer raises a good point. In our previous publication (Sasani et al. 2022, Nature), we hypothesized that Mutyh mutator alleles had arisen in wild, outbreeding populations of Mus musculus, and later became fixed in inbred strains like DBA/2J and C57BL/6J. However, in the current manuscript, we included a statement about mutator alleles "spontaneously arising during the evolutionary history of inbred laboratory mice" to reflect new evidence (from Ferraj et al. 2023, Cell Genomics) that the mutator allele we originally identified in Mutyh may not be wild derived after all. Instead, Ferraj et al. suggest that the C>A mutator phenotype we originally identified is caused by an intronic mobile element insertion (MEI) that is present in DBA/2J and a handful of other inbred laboratory strains. Although this MEI may have originally occurred in a wild population of mice, we wanted to acknowledge the possibility that both the original Mutyh mutator allele, as well as the new mutator allele(s) we discovered in this manuscript, could have arisen during the production and inbreeding of inbred laboratory lines. We have also added language to the Discussion at lines 325-327 to acknowledge that the 67 wild mice we analyzed do not comprise a comprehensive picture of the genetic diversity present in wild-derived samples.

      We have added additional language to the Discussion at lines 349-357 in which we acknowledge that the chromosome 6 mutator allele might have originated in either laboratory or wild mice and elaborate on the possibility that mutator alleles with deleterious fitness consequences may be more likely to persist in inbred laboratory colonies.

      1. Finally, the implications of discovering a mutator whose expression is potentially conditional on the genotype at a second locus are not raised in the Discussion. While not a weakness per se, this omission is perceived to be a missed opportunity to emphasize what, to this reviewer, is one of the most exciting impacts of this work. The potential background dependence of mutator expression could partially shelter it from the action of selection, allowing the allele to persist in populations. This finding bears on theoretical models of mutation rate evolution and may have important implications for efforts to map additional mutator loci. It seems unfortunate to not elevate these points.

      We agree and have added additional discussion of the possibility that the C>A mutator phenotypes in the BXDs are a result of interactions between the expression of two DNA repair genes in the same base-excision network to the Discussion section at lines 447-449.

      Private comments

      1. The criteria used to determine or specify haplotype size are not specified in the manuscript. I mention this above but reiterate here as this was a big point of confusion for me when reading the paper. Haplotype length is important consideration for overall power and for proper extension of this method to other systems/populations.

      We may not have been clear enough in our description of our method, and as suggested by Reviewer 1, the name "inter-haplotype distance" may also have been a source of confusion. At a given marker, we compute the aggregate mutation spectrum in BXDs with either B or D genotypes using all genome-wide de novo mutations observed in those BXDs. Since the BXDs were inbred for many generations, we expect that almost all de novo germline mutations observed in an RIL are in near-perfect linkage with the informative genotypes used for distance scans. Thus, the "haplotypes" used in the inter-haplotype distance scans are essentially the lengths of entire genomes.

      1. Results, first paragraph, final sentence. I found the language here confusing. I don't understand how one can compute the cosine distance at single markers, as stated. I'm assuming cosine distance is computed from variants residing on haplotypes delimited by some defined window surrounding the focal marker?

      As discussed above, we aggregate all genome-wide de novo mutations in each group of BXDs at a given marker, rather than only considering DNMs within a particular window surrounding the marker. The approach is discussed in greater detail in the caption of Figure 1.
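
      To make the single-marker computation concrete, here is a minimal sketch, assuming each strain is represented by a vector of genome-wide de novo mutation counts (one entry per mutation type) and a "B" or "D" genotype at the focal marker. The function names and data layout are hypothetical and chosen only for illustration.

```python
import math


def aggregate_spectrum(spectra):
    """Sum per-strain mutation counts element-wise into one spectrum."""
    return [sum(column) for column in zip(*spectra)]


def distance_at_marker(genotypes, spectra):
    """Cosine distance between the aggregate B and D spectra at one marker.

    genotypes[i] is "B" or "D" for strain i at the focal marker;
    spectra[i] holds that strain's genome-wide mutation counts, so no
    window around the marker is involved.
    """
    b_spectra = [s for g, s in zip(genotypes, spectra) if g == "B"]
    d_spectra = [s for g, s in zip(genotypes, spectra) if g == "D"]
    if not b_spectra or not d_spectra:
        raise ValueError("both genotypes must be present at the marker")
    agg_b = aggregate_spectrum(b_spectra)
    agg_d = aggregate_spectrum(d_spectra)
    dot = sum(x * y for x, y in zip(agg_b, agg_d))
    norm_b = math.sqrt(sum(x * x for x in agg_b))
    norm_d = math.sqrt(sum(y * y for y in agg_d))
    if norm_b == 0.0 or norm_d == 0.0:
        raise ValueError("a genotype group has no mutations")
    return 1.0 - dot / (norm_b * norm_d)
```

      Only the grouping of strains changes from marker to marker; the mutation counts contributed by each strain stay the same.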

      1. Nominating candidates for the chr6 locus, Table 1. It would be worth confirming that the three prioritized candidates (Setmar, Ogg1, and Mbd4) all show germline expression.

      Using the Mouse Genome Informatics online resource, we confirmed that all prioritized candidate genes (now including Setmar and Ogg1, but not Mbd4) are expressed in the male and female gonads, and mention this in the Results at lines 228 and 233-234.

      1. Does the chr6 peak on the C>A LOD plot (Figure 2- figure supplement 1) overlap the same peak identified in the IHD scan? And, does this peak rise to significance when using alpha = 0.05? Given that the goal of these QTL scans is to identify loci that interact with the C>A mutator on chr4, it is reasonable to hypothesize that the mutation impact of epistatic loci will also be restricted to C>A mutations. Therefore, I am not fully convinced that the conservative alpha = 0.05/7 threshold is necessary.

      The chromosome 6 peak in Figure 2-figure supplement 1 does, in fact, overlap the peak marker we identified on chromosome 6 using IHD. One reason we decided to use a more conservative alpha of (0.05 / 7) is that we wanted these results to be analogous to the ones we performed in a previous paper (Sasani et al. 2022, Nature), in which we first identified the mutator locus on chromosome 4. However, the C>A peak does not rise to genome-wide significance if we use a less conservative alpha value of 0.05 (see Author response image 5). As discussed in our response to Reviewer 1, we find that QTL mapping is not as powerful as IHD when haplotypes have accumulated variable numbers of germline mutations (as in the BXDs), which likely explains the fact that the peak on chromosome 6 is not genome-wide significant using QTL mapping.

      Author response image 5.

      QTL scan for the fraction of C>A mutations in BXDs harboring D alleles at the locus near Mutyh. The QTL scan was performed at a genome-wide significance alpha of 0.05, rather than 0.05/7.

      1. Is there significant LD between the IHD peaks on chr6 and chr4 across the BXD? If so, it could suggest that the signal is driven by cryptic population structure that is not fully accounted for in the authors' regression-based approach. If not, this point may merit an explicit mention in the text as an additional validation for the authenticity of the chr6 mutator finding.

      This is a good question. We used the scikit-allel Python package to calculate linkage disequilibrium (LD) between all pairs of genotyped markers in the BXD cohort, and found that the two peak loci (on chromosomes 4 and 6) exhibit weak LD (r² = 4e-5). We have added a mention of this to the main text of the Results at lines 212-213. That being said, we do not think the chromosome 6 mutator association (or the apparent epistasis between the alleles on chromosomes 4 and 6) could be driven by cryptic population structure. Unlike in human GWAS and other association studies in natural populations, there is no heterogeneity in the environmental exposures experienced by different BXD subpopulations. In humans, population structure can create spurious associations (e.g., between height and variants that are in LD and are most common in Northern Europe), but this requires the existence of a phenotypic gradient caused by genetic or environmental heterogeneity that is not likely to exist in the context of inbred laboratory mice that are all the progeny of the same two founder strains.
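
      For readers unfamiliar with the LD statistic, the following sketch computes r² between two markers as the squared Pearson correlation of their genotype codes. It is a plain-Python stand-in written for illustration and does not reproduce the scikit-allel call used for the reported value; the 0/1 genotype coding and the function name are assumptions.

```python
def r_squared(marker_a, marker_b):
    """Squared Pearson correlation between two genotype vectors.

    Each vector holds one numeric genotype code per strain
    (for example 0 for B and 1 for D in inbred lines).
    """
    n = len(marker_a)
    if n != len(marker_b) or n == 0:
        raise ValueError("markers must be non-empty and of equal length")
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    covariance = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(marker_a, marker_b))
    variance_a = sum((a - mean_a) ** 2 for a in marker_a)
    variance_b = sum((b - mean_b) ** 2 for b in marker_b)
    if variance_a == 0 or variance_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return covariance * covariance / (variance_a * variance_b)
```

      A value near zero, like the one reported for the chromosome 4 and chromosome 6 peaks, means that genotypes at one marker carry almost no information about genotypes at the other.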

      1. Discussion, last sentence of the "Possible causal alleles..." section: I don't understand how the absence of the Mariner-family domain leads the authors to this conclusion. Setmar is involved in NHEJ, which to my knowledge is not a repair process that is expected to have a specific C>A mutation bias. I think this is grounds enough for ruling out its potential contributions, in favor of focusing on other candidates, (e.g., Mbd4 and Ogg1).

      The reviewer raises a good point. Our main reason for mentioning the absence of the Mariner-family domain is that even if NHEJ were responsible for the C>A mutator phenotype, it likely wouldn't be possible for Setmar to participate in NHEJ without the domain. However, the reviewer is correct that NHEJ is not expected to cause a C>A mutation bias, and we have added a mention of this to the text as well at lines 379-382.

      1. Discussion, second to last paragraph of section "Mbd4 may buffer...": The authors speculate that reduced activity of Mbd4 could modulate rates of apoptosis in response to DNA damage. This leads to the prediction that mice with mutator alleles at both Mutyh and Mbd4 should exhibit higher overall mutation rates compared to mice with other genotypes. This possibility could be tested with the authors' data.

      The reviewer raises a good question. As mentioned above, however, we implemented a new approach to calculate confidence intervals surrounding distance peaks and found that this empirical approach (rather than the ad hoc 10-Mbp window approach we used previously) excluded Mbd4 from the credible interval. Although we still mention Mbd4 as a possible candidate (since it still resides within the 10 Mbp window), we have refactored the Discussion section to focus primarily on the evidence for Ogg1 as a candidate gene on chromosome 6.

      In any case, we do not observe that mice with mutator alleles at both the chromosome 4 and chromosome 6 loci have higher overall mutation rates compared to mice with other genotype combinations. This may not be terribly surprising, however, since C>A mutations only comprise about 10% of all possible mutations. Thus, given the variance in other 1-mer mutation counts, even a substantial increase in the C>A mutation rate might not have a detectable effect on the overall mutation rate. Indeed, in our original paper describing the Mutyh mutator allele (Sasani et al. 2022, Nature), we did not identify any QTL for the overall mutation rate in the BXDs and found that mice with the chromosome 4 mutator allele only exhibited a 1.11X increase in their overall mutation rates relative to mice without the mutator allele.
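
      The dilution argument can be illustrated with simple arithmetic. The sketch below assumes that a single mutation class makes up a fixed baseline fraction of all mutations and that only that class changes rate; the function name and the example numbers are hypothetical and are not estimates from the data.

```python
def overall_fold_change(class_fraction, class_fold_change):
    """Fold change in the total mutation rate when one class changes rate.

    class_fraction: baseline share of all mutations in that class (0 to 1).
    class_fold_change: multiplicative change in that class's rate.
    All other classes are assumed to be unchanged.
    """
    if not 0.0 <= class_fraction <= 1.0:
        raise ValueError("class_fraction must lie between 0 and 1")
    return 1.0 + class_fraction * (class_fold_change - 1.0)
```

      Under these assumptions, doubling the rate of a class that accounts for 10% of mutations gives `overall_fold_change(0.10, 2.0)`, an increase of only about 10% in the total rate.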

      1. Methods, "Accounting for BXD population structure": An "epoch-aware" permutation strategy is described here, but it is not clear when (and whether) this strategy is used to determine significance of IHD P-values.

      We have added a more explicit mention of this to the Methods section at lines 670-671, as we do, in fact, use the epoch-aware permutation strategy when calculating empirical distance thresholds.
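
      One way such an epoch-aware permutation could work is sketched below. This is an assumption about the general idea, not a description of the authors' code: per-strain values (for example mutation spectra) are shuffled only among strains from the same breeding epoch, and a hypothetical `statistic` callable (for example the maximum distance across markers) is evaluated on each permuted dataset to build a null distribution.

```python
import math
import random
from collections import defaultdict


def permute_within_epochs(values, epochs, rng):
    """Return a copy of values shuffled only among entries sharing an epoch."""
    members = defaultdict(list)
    for index, epoch in enumerate(epochs):
        members[epoch].append(index)
    permuted = list(values)
    for indices in members.values():
        targets = indices[:]
        rng.shuffle(targets)
        for source, target in zip(indices, targets):
            permuted[target] = values[source]
    return permuted


def empirical_threshold(statistic, values, epochs,
                        n_permutations=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the statistic under within-epoch permutation."""
    rng = random.Random(seed)
    null = sorted(
        statistic(permute_within_epochs(values, epochs, rng))
        for _ in range(n_permutations)
    )
    rank = math.ceil((1.0 - alpha) * n_permutations)
    rank = max(1, min(n_permutations, rank))
    return null[rank - 1]
```

      Restricting the shuffle to within-epoch swaps keeps any epoch-level differences in the permuted data, so they contribute to the null distribution rather than to the apparent signal.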

      1. The simulation scheme employed for power calculations is highly specific to the BXD population. This is not a weakness, and perfectly appropriate to the study population used here. However, it does limit the transferability of the power analyses presented in this manuscript to other populations. This limitation may merit an explicit cautionary mention to readers who may aspire to port the IHD method over to their study system.

      This is true. Our simulation strategy is relatively simple and makes a number of assumptions about the simulated population of haplotypes (allele frequencies normally distributed around 0.5, expected rates of each mutation type, etc.). In response to concerns from Reviewer 1, we performed an updated series of simulations in which we varied some of these parameters (mutator allele frequencies, mean numbers of mutations on haplotypes, etc.). However, we have added a mention of the simulation approach's limitations and specificity to the BXDs to the text at lines 545-550.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Continuous attractor networks endowed with some sort of adaptation in the dynamics, whether that be through synaptic depression or firing rate adaptation, are fast becoming the leading candidate models to explain many aspects of hippocampal place cell dynamics, from hippocampal replay during immobility to theta sequences during run. Here, the authors show that a continuous attractor network endowed with spike frequency adaptation and subject to feedforward external inputs is able to account for several previously unaccounted aspects of theta sequences, including (1) sequences that move both forwards and backwards, (2) sequences that alternate between two arms of a T-maze, (3) speed modulation of place cell firing frequency, and (4) the persistence of phase information across hippocampal inactivations. I think the main results of the paper (findings (1) and (2)) are likely to be of interest to the hippocampal community, as well as to the wider community interested in mechanisms of neural sequences. In addition, the manuscript is generally well written, and the analytics are impressive. However, several issues should be addressed, which I outline below.

      Major comments:

      1. In real data, population firing rate is strongly modulated by theta (i.e., cells collectively prefer a certain phase of theta - see review paper Buzsaki, 2002) and largely oscillates at theta frequency during run. With respect to this cyclical firing rate, theta sweeps resemble "Nike" check marks, with the sweep backwards preceding the sweep forwards within each cycle before the activity is quenched at the end of the cycle. I am concerned that (1) the summed population firing rate of the model does not oscillate at theta frequency, and (2) as the authors state, the oscillatory tracking state must begin with a forward sweep. With regards to (1), can the authors show theta phase spike preference plots for the population to see if they match data? With regards to (2), can the authors show what happens if the bump is made to sweep backwards first, as it appears to do within each cycle?

      Thank you for raising these two important points. As the reviewer mentioned, experimental data do show that the population activity (e.g., calculated from the multiunit activity of tetrode recordings) is strongly modulated by theta. While we mainly focused on sweeps of bump position, the population activity in our model also shows cyclical firing at the theta frequency (we added Fig. S7 to reflect this). This is also reflected in Fig. 4d, where the bump height (representing the overall activity) oscillates within individual theta cycles. The underlying mechanism of the cyclical population activity is as follows: the bump height is determined by the amount of input received by the neuron located at the center of the bump. As the activity bump sweeps away from the external input, the center neuron receives less external input, and hence the bump height is smaller. Therefore, not only does the bump position sweep around the external input, but the population activity also oscillates at the same frequency.

      For the “Nike” check marks: we first clarify that the reason we observed a forward sweep preceding a backward sweep is that we always force the artificial animal to run from left to right on the track, treating “right” as “forward”. At the beginning of the simulation, the external input to the network moves to the right, and therefore the activity bump starts from a position behind the animal and sweeps to the right (forward). In general, this means that the bump will never perform a backward sweep first in our model. However, this does not mean that forward sweeps precede backward sweeps in each theta cycle. Experimentally, to determine the “0” phase of theta cycles, the LFP signal in CA1 was first bandpass filtered and then Hilbert transformed to get the phase at each time point. Then, a phase histogram of multiunit activity in CA1 was calculated across locomotor periods; the phase of maximal CA1 firing on the histogram was then defined to be the “0” phase. Since we didn’t model LFP oscillations in the attractor model, we cannot obtain a “0” phase reference as in the experimental procedure. Instead, we define the “0” phase using the “population activity quenched time”: phase “0” is the time of minimum population activity within each oscillation cycle, which occurs when the activity bump is farthest from the animal position. In this way, we observed a “Nike” pattern in which the activity bump begins with a backward sweep towards the external input, followed by a forward sweep. This is shown in Fig. 3b in the main text.
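
      The phase convention described above (phase zero at the minimum of the population activity) can be sketched as follows. This is an illustrative assumption about one way to assign phases, not the authors' code: local minima of a sampled population activity trace mark cycle boundaries, and phase is taken to advance linearly from 0 to 2π between consecutive minima. The function names are hypothetical, and a noisy trace would need smoothing first.

```python
import math


def local_minima(trace):
    """Indices where the trace dips below its left neighbour
    and is no higher than its right neighbour."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] < trace[i - 1] and trace[i] <= trace[i + 1]
    ]


def phase_from_minima(trace):
    """Assign a phase in [0, 2*pi) to each sample between consecutive minima.

    Samples before the first minimum or from the last minimum onward
    are left as None because their cycle is incomplete.
    """
    minima = local_minima(trace)
    phase = [None] * len(trace)
    for start, end in zip(minima, minima[1:]):
        for i in range(start, end):
            phase[i] = 2.0 * math.pi * (i - start) / (end - start)
    return phase
```

      With this convention, the position of the bump relative to the animal can be plotted against phase to check whether a cycle opens with a backward or a forward sweep.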

      1. I could not find the width of the external input mentioned anywhere in the text or in the table of parameters. The implication is that it is unclear to me whether, during the oscillatory tracking state, the external input is large compared to the size of the bump, so that the bump lives within a window circumscribed by the external input and so bounces off the interior walls of the input during the oscillatory tracking phase, or whether the bump is continuously pulled back and forth by the external input, in which case it could be comparable to the size of the bump. My guess based on Fig 2c is that it is the latter. Please clarify and comment.

      Thank you for your comment. We added the width of the external input to the text and table (see Table 1). The bump is continuously pulled back and forth by the external input, as the reviewer guessed. Experimentally, theta sweeps live roughly within a window the size of a place field. This is also true in our model, where the theta sweep length depends on the strength of the recurrent connections, which determines the place field size. However, it also depends on the adaptation strength: larger adaptation (more intrinsic mobility) leads to longer sweeps. We presume the reviewer suspected that the bump may live within a window bounded by the external input because we set the width of the external input to be comparable to the place field size. (In fact, we don’t know how wide the external location input to the hippocampal circuit is in the biological brain, but it seems reasonable to set the external input width comparable to the place field size; otherwise the location information conveyed to the hippocampus might be too dispersed.) We added a plot in the SI (see Fig. S1) to show that, with a smaller external input width and a larger adaptation strength, the activity bump lives in a window exceeding the external input.

      We clarified this point by adding the following text at line 159:

      “... It is noteworthy that the activity bump does not live within a window circumscribed by the external input bump (bouncing off the interior walls of the input during the oscillatory tracking state), but instead is continuously pulled back and forth by the external input (see Fig. S1)...”

      1. I would argue that the "constant cycling" of theta sweeps down the arms of a T-maze was roughly predicted by Romani & Tsodyks, 2015, Figure 7. While their cycling spans several theta cycles, it nonetheless alternates by a similar mechanism, in that adaptation (in this case synaptic depression) prevents the subsequent sweep of activity from taking the same arm as the previous sweep. I believe the authors should cite this model in this context and consider the fact that both synaptic depression and spike frequency adaptation are both possible mechanisms for this phenomenon. But I certainly give the authors credit for showing how this constant cycling can occur across individual theta cycles.

      Thank you for raising this point. We added a citation of the Romani & Tsodyks model in this context (line 304). As the reviewer pointed out, short-term synaptic depression (STD) can also act as a potential mechanism for this phenomenon. We also gave the Romani & Tsodyks model credit for showing how this “cycling spanning several theta cycles” can account for slow (~1Hz), deliberative behaviors, namely head scanning (Johnson and Redish, 2007). We commented on this at line 302:

      “... As the external input approaches the choice point, the network bump starts to sweep onto left and right arms alternatively in successive theta cycles (Fig. 5b and video 4; see also Romani and Tsodyks (2015) for a similar model of cyclical sweeps spanning several theta cycles) ...”

      1. The authors make an unsubstantiated claim in the paragraph beginning with line 413 that the Tsodyks and Romani (2015) model could not account for forwards and backwards sweeps. Both the firing rate adaptation and synaptic depression are symmetry breaking models that should in theory be able to push sweeps of activity in both directions, so it is far from obvious to me that both forward and backward sweeps are not possible in the Tsodyks and Romani model. The authors should either prove that this is the case (with theory or simulation) or excise this statement from the manuscript.

      Thank you for your comment. Our claim about the Tsodyks and Romani (2015) model's inability to account for both forward and backward sweeps was inappropriate. We made this claim based on our own implementation of the Tsodyks and Romani (2015) model, in which we did not find a parameter region where the bump oscillation shows both forward and backward sweeps. This might be due to the limited parameter range we searched. Additionally, we note some differences between the two models: the Romani & Tsodyks model has an external theta input to the attractor network, which prevents the bump from moving further. This termination may also prevent the activity bump from moving backward. We didn’t consider external theta input in our model, and the bump oscillation is based on internal dynamics. We have deleted that claim from line 424 in the revised paper, and revised that portion of the manuscript by adding the following text at line 424:

      “…Different from these two models, our model considers firing rate adaptation to implement symmetry breaking and hence generates activity propagation. To prevent the activity bump from spreading away, their model considers an external theta input to reset the bump location at the end of each theta cycle, whereas our model generates an internal oscillatory state, where the activity bump travels back due to the attraction of external location input once it spreads too far away. Moreover, theoretical analysis of our model reveals how the adaptation strength affect the direction of theta sweeps, as well as offers a more detailed understanding of theta cycling in complex environments…”

      1. The section on the speed dependence of theta (starting with line 327) was very hard to understand. Can the authors show a more graphical explanation of the phenomenon? Perhaps a version of Fig 2f for slow and fast speeds, and point out that cells in the latter case fire with higher frequency than in the former?

      Thank you for raising this valuable point. There are two different frequencies shown in Fig. 6a, c & d. One is the bump oscillation frequency; the other is the firing frequency of a single cell. To aid understanding, we included experimental results (from Geisler et al., 2007) in Fig. 6a. They show that when the animal increases its running speed, the LFP theta frequency increases only slightly (compare the blue curve and the green curve), while the oscillation frequency of single-cell firing increases more. In our model, we first demonstrated this result using unimodal cells, which show only significant phase precession (Fig. 6c). As the animal runs through the firing field of a place cell, the firing phase always precesses by half a cycle in total. Therefore, a faster running speed means that the half cycle is completed faster, and hence the single-cell oscillation frequency is higher. We also predicted the results for bimodal cells (Fig. 6d). To make this point clearer, we modified Fig. 6 by including experimental results, and rewrote the paragraph as follows (line 337):

      “…As we see from Fig. 3d and Fig. 4a&b, when the animal runs through the firing field of a place cell, its firing rate oscillates, since the activity bump sweeps around the firing field center of the cell. Therefore, the firing frequency of a place cell has a baseline theta frequency, which is the same as the bump oscillation frequency. Furthermore, due to phase precession, there will be a half cycle more than the baseline theta cycles as the animal runs over the firing field, and hence single cell oscillatory frequency will be higher than the baseline theta frequency (Fig. 6c). The faster the animal runs, the faster the extra half cycle is accomplished. Consequently, the firing frequency of single cells will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency.…”
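      The half-cycle argument above can be illustrated with a short numerical sketch. It assumes a place field of fixed length crossed at constant speed and a baseline theta frequency that does not change with speed; the function name, parameter names, and all numerical values are illustrative assumptions, not taken from the manuscript or from Geisler et al.

```python
def single_cell_frequency(f_theta_hz, speed_cm_s, field_length_cm, extra_cycles=0.5):
    """Oscillation frequency of one place cell during a single field traversal.

    Assumption: the cell completes the baseline theta cycles plus
    `extra_cycles` (half a cycle for pure phase precession) within the
    time the animal needs to cross the field.
    """
    traversal_time_s = field_length_cm / speed_cm_s
    baseline_cycles = f_theta_hz * traversal_time_s
    return (baseline_cycles + extra_cycles) / traversal_time_s


# Illustrative values only: 8 Hz baseline theta, 40 cm place field.
for speed in (10.0, 20.0, 40.0):
    f_cell = single_cell_frequency(f_theta_hz=8.0, speed_cm_s=speed, field_length_cm=40.0)
    print(f"speed={speed:5.1f} cm/s  single-cell frequency={f_cell:.3f} Hz")
```

      Under these assumptions the single-cell frequency exceeds the baseline by half a cycle divided by the traversal time, so the excess grows in proportion to running speed. The sketch holds the baseline fixed and therefore captures only the extra term, not the slight speed dependence of the baseline itself.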

      1. I had a hard time understanding how the Zugaro et al., (2005) hippocampal inactivation experiment was accounted for by the model. My intuition is that while the bump position is determined partially by the location of the external input, it is also determined by the immediate history of the bump dynamics as computed via the local dynamics within the hippocampus (recurrent dynamics and spike rate adaptation). So that if the hippocampus is inactivated for an arbitrary length of time, there is nothing to keep track of where the bump should be when the activity comes back online. Can the authors please explain more how the model accounts for this?

      Thank you for the comments. The easiest way to understand how the model accounts for the experimental result from Zugaro et al., (2005) is from Eq. 8:

      This equation says that the firing phase of a place cell is determined by the time the animal has traveled through the place field, i.e., the location of the animal in the place field (with d0, c0 and vext all constant, and tf the only variable). No matter how long the hippocampus is inactivated, once the external input is back on, the new phase will continue from the new location of the animal in the place field. In other words, the peak firing phase keeps tracking the location of the animal. To make this point clearer, we modified Fig. 6 by including experimental results from Zugaro et al., (2005), and updated the description from line 356:

      “…Based on the theoretical analysis (Eq. 8), we see that the firing phase is determined by the location of the animal in the place field, i.e., vext tf. This means that the firing phase keeps tracking the animal's physical location. No matter how long the network is inactivated, the new firing phase will only be determined by the new location of the animal in the place field. Therefore, the firing phase in the first bump oscillation cycle after the network perturbation is more advanced than the firing phase in the last bump oscillation cycle right before the perturbation, and the amount of precession is similar to that in the case without perturbation (Fig. 6e) …”

      1. Can the authors comment on why the sweep lengths oscillate in the bottom panel of Fig 5b, starting 0.5 seconds before the animal crosses the choice point of the T-maze? Is this oscillation in sweep length another prediction of the model? If so, it should definitely be remarked upon and included in the discussion section.

      We appreciate the reviewer’s attention to this phenomenon. We initially thought it was a simulation artifact due to the parameter setting. However, we found that it is quite robust to different parameter settings. While we have not found a theoretical explanation, we offer a qualitative one: the frequency of this length oscillation may be coupled to the time constant of the firing rate adaptation. Specifically, after a longer sweep, the neurons at the end of the sweep are adapted (inhibited), and hence the activity bump cannot travel as far in the next round. The next sweep is therefore shorter than the previous one. In the round after that, the bump sweeps farther again because those neurons have recovered from the previous adaptation effect. We think this length oscillation is quite interesting and will look for it in experimental data in future work. We added this point to the main text as a prediction at line 321:

      “…We also note that there is a cyclical effect in the sweep lengths across oscillation cycles before the animal enters the left or right arm (see Fig. 5b lower panel), which may be interesting to check in the experimental data in future work (see Discussion for more details) …”

      And line 466:

      “…Our model of the T-maze environment showed an expected phenomenon that as the animal runs towards the decision point, the theta sweep length also shows cyclical patterns (Fig. 5b lower panel). An intuitive explanation is that, due to the slow dynamics in firing rate adaptation (with a large time constant compared to neural firing), a long sweep leads to an adaptation effect on the neurons at the end of the sweep path. Consequently, the activity bump cannot travel as far due to the adaptation effect on those neurons, resulting in a shorter sweep length compared to the previous one. In the next round, the activity bump exhibits a longer sweep again because those neurons have recovered from the previous adaptation effect. We plan to test this phenomenon in future experiments...”

      1. Perhaps I missed this, but I'm curious whether the authors have considered what factors might modulate the adaptation strength. In particular, might rat speed modulate adaptation strength? If so, would have interesting predictions for theta sequences at low vs high speeds.

      Thank you for raising this important point. As we pointed out in line 279: “…the experimental data (Fernandez et al, 2017) has indicated that there is a laminar difference between unimodal cells and bimodal cells, with bimodal cells correlating more with the firing patterns of deep CA1 neurons and unimodal cells with the firing patterns of superficial CA1 neurons. Our model suggests that this difference may come from the different adaptation strengths in the two layers…”. Our guess is that the adaptation strength might reflect physiological differences between place cells in different pyramidal layers of the hippocampus. For example, place cells in the superficial and deep layers receive different amounts of input from MEC and sensory cortex, and this difference may contribute to different adaptation effects in the two populations of place cells.

      Our intuition is that the animal’s running speed may not directly modulate the adaptation strength. Note that the effect of adaptation and the adaptation strength are different things. When the animal runs rapidly across the firing field, the place cell fires densely in time, so the adaptation effect is large; when the animal runs slowly across the field, the place cell fires sparsely in time, and hence the adaptation effect is small. In both situations the adaptation strength is fixed, and the difference arises from the spike intervals.

      From Eq. 45-47, our theoretical analysis yields several predictions about how theta sequences depend on the network parameters, for example how the sweep length varies with running speed. We simulated the network at both low and high running speeds (while keeping all other parameters fixed), and found that the sweep length at low speed is larger than that at high speed. This differs from previous data showing that the sweep length increases as the animal runs faster (Maurer et al, 2012). However, we are not sure how other parameters change in the biological brain as the animal runs faster; e.g., the external input strength and the place field width might also vary, acting as confounds. We will explore this further in the future and investigate how the adaptation strength is modulated in the brain.

      1. I think the paper has a number of predictions that would be especially interesting to experimentalists but are sort of scattered throughout the manuscript. It would be beneficial to have them listed more prominently in a separate section in the discussion. This should include (1) a prediction that the bump height in the forward direction should be higher than in the backward direction, (2) predictions about bimodal and unimodal cells starting with line 366, (3) prediction of another possible kind of theta cycling, this time in the form of sweep length (see comment above), etc.

      Thank you for pointing this out. We updated the manuscript by including a paragraph in the Discussion summarizing the predictions we made throughout the manuscript (from line 459):

      “…Our model has several predictions which can be tested in future experiments. For instance, the height of the activity bump in the forward sweep window is higher than that in the backward sweep window (Fig. 4c) due to the asymmetric suppression effect from the adaptation. For bimodal cells, they will have two peaks in their firing frequency as the animal runs across the firing fields, with one corresponding to phase precession and the other corresponding to phase procession. Similar to unimodal cells, both the phase precession and procession of a bimodal cell after transient intrahippocampal perturbation will continue from the new location of the animal (Fig. S5). Interestingly, our model of the T-maze environment showed an expected phenomenon that as the animal runs towards the decision point, the theta sweep length also shows cyclical patterns (Fig. 5b lower panel). An intuitive explanation is that, due to the slow dynamics in firing rate adaptation (with a large time constant compared to neural firing), a long sweep leads to an adaptation effect on the neurons at the end of the sweep path. Consequently, the activity bump cannot travel as far due to the adaptation effect on those neurons, resulting in a shorter sweep length compared to the previous one. In the next round, the activity bump exhibits a longer sweep again because those neurons have recovered from the previous adaptation effect. We plan to test this phenomenon in future experiments…”

      Reviewer #2:

      In this work, the authors elaborate on an analytically tractable, continuous-attractor model to study an idealized neural network with realistic spiking phase precession/procession. The key ingredient of this analysis is the inclusion of a mechanism for slow firing-rate adaptation in addition to the otherwise fast continuous-attractor dynamics. The latter classically arises from a combination of translation invariance and nonlinear rate normalization. For strong adaptation/weak external input, the network naturally exhibits an internally generated, travelling-wave dynamics along the attractor with some characteristic speed. For small adaptation/strong external stimulus, the network recovers the classical externally driven continuous-attractor dynamics. Crucially, when both adaptation and external input are moderate, there is a competition between the internally generated and externally generated mechanisms, leading to an oscillatory tracking regime. In this tracking regime, the population firing profile oscillates around the neural field tracking the position of the stimulus. The authors demonstrate by a combination of analytical and computational arguments that oscillatory tracking corresponds to realistic phase precession/procession. In particular, the authors can account for the emergence of unimodal and bimodal cells, as well as some other experimental observations with respect to the dependence of phase precession/procession on the animal's locomotion. The strengths of this work are at least three-fold: 1) Given its simplicity, the proposed model has a surprisingly large explanatory power of the various experimental observations. 2) The mechanism responsible for the emergence of precession/procession can be understood as a simple yet rather illuminating competition between internally driven and externally driven dynamical trends. 3) Amazingly, and under some adequate simplifying assumptions, a great deal of analysis can be treated exactly, which allows for a detailed understanding of all parametric dependencies. This exact treatment culminates with a full characterization of the phase space of the network dynamics, as well as the computation of various quantities of interest, including characteristic speeds and oscillating frequencies.

      1. As mentioned by the authors themselves, the main limitation of this work is that it deals with a very idealized model and it remains to be seen how the proposed dynamical behaviors would persist in more realistic models. For example, the model is based on a continuous attractor model that assumes perfect translation-invariance of the network connectivity pattern. Would the oscillating tracking behavior persist in the presence of connection heterogeneities?

      Thank you for raising this important point. Continuous attractor models have been widely used in modeling hippocampal neural circuits (see McNaughton et al, 2006 for a review), and researchers have often assumed a translation-invariant structure in these network models. The theta sweep state we present in the current work is based on the properties of the continuous attractor state. We agree with the reviewer that the place cell circuit might not be a perfect continuous attractor network. For a simple case where the connection weights are sampled from a Gaussian distribution around J_0, the theta sweep state still emerges in the network (see Fig. S8 for an example). We also believe that the model can be extended to more complex cases where the “home” location and decision points are over-represented in the real environment, i.e., where the heterogeneity is not random but has stronger connections near those locations; the theta sweeps will then be more biased towards those locations. However, if the heterogeneity breaks the continuous attractor state, the theta sweep state may not be present in the network.

      1. Can the oscillating tracking behavior be observed in purely spiking models as opposed to rate models as considered in this work?

      Thank you for pointing this out. The short answer is yes. If the translation-invariance of the network connectivity pattern holds, i.e., the spiking network is still a continuous attractor network (see the work from Tsodyks et al, 1996; and from Yu et al. "Spiking continuous attractor neural networks with spike frequency adaptation for anticipative tracking"), then the adaptation, which takes the mathematical form of spike frequency adaptation (instead of firing rate adaptation), will still generate the sweep state of the activity bump. We chose the rate-based model here because it is analytically tractable, which gives us a better understanding of the underlying dynamics. Many of the continuous attractor models of spatially tuned cell populations are rate-based (see, for example, Zhang 1996; Burak & Fiete 2009). However, extending the model to a spike-based version would be straightforward.

      1. Another important limitation is that the system needs to be tuned to exhibit oscillation within the theta range and that this tuning involves a priori variable parameters such as the external input strength. Is the oscillating-tracking behavior overtly sensitive to input strength variations?

      Thank you for pointing this out. In rodent studies, theta sequences are thought to result from the integration of both external inputs conveying sensory-motor information and intrinsic network dynamics possibly related to memory processes (see Drieu and Zugaro 2019; Drieu et al, 2018). We clarify here that, in our modeling work, the generation of theta sweeps also depends on both the external input and the intrinsic dynamics (induced by the firing rate adaptation). Therefore, we do not think that the dependence of theta sweeps on an a priori parameter (the external input strength) is a limitation here. We agree with the reviewer that the system needs to be tuned to exhibit oscillation within the theta range. However, the parameter range that induces the oscillatory state is relatively large (see Fig. 2g in the main text). It will be interesting to investigate (and find experimental evidence for) how the biological system adjusts the network configuration to implement the sweep state in network dynamics.

      1. The authors mentioned that an external pacemaker can serve to drive oscillation within the desired theta band but there is no evidence presented supporting this.

      Thank you for pointing this out. We made this argument based on our initial simulations but did not examine it in detail. We have deleted that argument from the discussion and rewritten that part. We will carry out more simulations in the future to verify whether it holds. See our changes from line 418 to line 431:

      “... A representative model relying on neuronal recurrent interactions is the activation spreading model. This model produces phase precession via the propagation of neural activity along the movement direction, which relies on asymmetric synaptic connections. A later version of this model considers short-term synaptic plasticity (short-term depression) to implicitly implement asymmetric connections between place cells, and reproduces many other interesting phenomena, such as phase precession in different environments. Different from these two models, our model considers firing rate adaptation to implement symmetry breaking and hence generates activity propagation. To prevent the activity bump from spreading away, their model considers an external theta input to reset the bump location at the end of each theta cycle, whereas our model generates an internal oscillatory state, where the activity bump travels back due to the attraction of external location input once it spreads too far away. Moreover, theoretical analysis of our model reveals how the adaptation strength affects the direction of theta sweeps, as well as offers a more detailed understanding of theta cycling in complex environments...”

      1. A final and perhaps secondary limitation has to do with the choice of parameter, namely the time constant of neural firing which is chosen around 3ms. This seems rather short given that the fast time scale of rate models (excluding synaptic processes) is usually given by the membrane time constant, which is typically about 15ms. I suspect this latter point can easily be addressed.

      Thank you for pointing this out. The time constant we chose is relatively short, in line with values used in other studies. We conducted additional simulations with the time constant set to 10 ms, and the results reported in this paper remain consistent. Please refer to Fig. S9 for the results obtained with a time constant of 10 ms.

      Reviewer #3:

      With a soft-spoken, matter-of-fact attitude and almost unwittingly, this brilliant study chisels away one of the pillars of hippocampal neuroscience: the special role(s) ascribed to theta oscillations. These oscillations are salient during specific behaviors in rodents but are often taken to be part of the intimate endowment of the hippocampus across all mammalian species, and to be a fundamental ingredient of its computations. The gradual anticipation or precession of the spikes of a cell as it traverses its place field, relative to the theta phase, is seen as enabling the prediction of the future - the short-term future position of the animal at least, possibly the future in a wider cognitive sense as well, in particular with humans. The present study shows that, under suitable conditions, place cell population activity "sweeps" to encode future positions, and sometimes past ones as well, even in the absence of theta, as a result of the interplay between firing rate adaptation and precise place coding in the afferent inputs, which tracks the real position of the animal. The core strength of the paper is the clarity afforded by the simple, elegant model. It allows the derivation (in a certain limit) of an analytical formula for the frequency of the sweeps, as a function of the various model parameters, such as the time constants for neuronal integration and for firing rate adaptation. The sweep frequency turns out to be inversely proportional to their geometric average. The authors note that, if theta oscillations are added to the model, they can entrain the sweeps, which thus may superficially appear to have been generated by the oscillations.
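      The scaling described above can be written compactly. In this sketch, the symbols τ (time constant of neuronal integration) and τ_v (time constant of firing rate adaptation) are labels assumed here for illustration; the prefactor, which depends on the other model parameters, is not given in this text and is left out.

```latex
f_{\mathrm{sweep}} \;\propto\; \frac{1}{\sqrt{\tau\,\tau_v}}
```

      Read this way, doubling both time constants would halve the sweep frequency, all else being equal.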

      1. The main weakness of the study is the other side of the simplicity coin. In its simple and neat formulation, the model envisages stereotyped single unit behavior regulated by a few parameters, like the two time constants above, or the "adaptation strength", the "width of the field" or the "input strength", which are all assumed to be constant across cells. In reality, not only assigning homogeneous values to those parameters seems implausible, but also describing e.g. adaptation with the simple equation included in the model may be an oversimplification. Therefore, it remains important to understand to what extent the mechanism envisaged in the model is robust to variability in the parameters or to eg less carefully tuned afferent inputs.

      Thank you for raising this important question. As the reviewer pointed out, our model is an oversimplification of real hippocampal circuits (see also Q1 and Q3 from Reviewer #2). We also noted this in the main text (line 504):

      “…Nevertheless, it is important to note that the CANN we adopt in the current study is an idealized model for the place cell population, where many biological details are missed. For instance, we have assumed that neuronal synaptic connections are translation-invariant in the space...”

      To investigate the robustness of the model to parameter settings, we divided the parameters into two groups. The first group determines the bump state: the width of the field a, the neuronal density ρ, the global inhibition strength k, and the connection strength J_0. The second group determines the bump sweep state (which is based on the existence of the bump state): the input strength α and the adaptation strength m. For the first group, we refer the reviewer to the Methods section on the stability analysis of the bump state. This analysis gives the condition under which the continuous attractor state holds in the network (see Eq. 20, which guides our parameter selection). For the second group, we refer the reviewer to Fig. 2g, which shows when the bump sweep state occurs as a function of input strength and adaptation strength. When the input strength is small, the range of adaptation strengths that yields the bump sweep state is also small. However, as the input strength increases, Fig. 2g shows that this range increases linearly. Although two other states exist in the network when the two parameters are set outside the colored area in Fig. 2g, the parameter range that yields the sweep state is large, especially when the input strength is large, which is usually the case when the animal actively runs in the environment.

      To demonstrate how variability affects the results, we added variability to the connection weights by sampling them from a Gaussian distribution around J_0 (this introduces heterogeneity in the connection structure). We found that the bump sweep state still holds in this condition (see Fig. S8 as well as Q1 from Reviewer #2). For variability in other parameter values, the results will be similar. Although adding variability to these parameters poses no difficulty for numerical simulation, it makes the theoretical analysis much more difficult.
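      One way such heterogeneity could be introduced is sketched below. This is a minimal illustration and not the authors' simulation code: the ring geometry, the Gaussian distance kernel, the per-connection multiplicative jitter, and every name and numerical value (`heterogeneous_weights`, `rel_sd`, the network size) are assumptions made for the example.

```python
import math
import random


def heterogeneous_weights(n, j0, a, rel_sd, seed=0):
    """Ring-network weight matrix whose peak strength is jittered per connection.

    Assumption: each entry is a Gaussian distance kernel of width `a`
    (in radians) scaled by a strength drawn from Normal(j0, rel_sd * j0).
    With rel_sd = 0 the matrix is translation-invariant.
    """
    rng = random.Random(seed)
    positions = [-math.pi + 2 * math.pi * i / n for i in range(n)]
    weights = []
    for xi in positions:
        row = []
        for xj in positions:
            d = abs(xi - xj)
            d = min(d, 2 * math.pi - d)  # shortest distance on the ring
            strength = rng.gauss(j0, rel_sd * j0)
            kernel = math.exp(-d * d / (2 * a * a)) / (math.sqrt(2 * math.pi) * a)
            row.append(strength * kernel)
        weights.append(row)
    return weights


# Hypothetical settings: 64 neurons, 10% relative jitter around J_0 = 1.
w_uniform = heterogeneous_weights(n=64, j0=1.0, a=0.4, rel_sd=0.0)
w_noisy = heterogeneous_weights(n=64, j0=1.0, a=0.4, rel_sd=0.1)
```

      Comparing network dynamics under `w_uniform` and `w_noisy` would then indicate whether the sweep state survives the loss of exact translation invariance; the sketch builds only the weight matrices and does not simulate the network.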

      1. The weak adaptation regime, when firing rate adaptation effectively moves the position encoded by population activity slightly ahead of the animal, is not novel - I discussed it, among others, in trying to understand the significance of the CA3-CA1 differentiation (2004). What is novel here, as far as I know, is the strong adaptation regime, when the adaptation strength m is at least larger than the ratio of time constants. Then population activity literally runs away, ahead of the animal, and oscillations set in, independent of any oscillatory inputs. Can this really occur in physiological conditions? A careful comparison with available experimental measures would greatly strengthen the significance of this study.

      Thank you for raising this interesting question.

      Re: “…firing rate adaptation effectively moves the position encoded by population activity slightly ahead of the animal, is not novel…”: we added Treves, A (2004) as a citation where we introduce firing rate adaptation (line 116).

      Testing whether the case of “…the adaptation strength m is at least larger than the ratio of time constants…” could occur in physiological conditions requires a measure of the adaptation strength as well as of the time constants of both neuronal firing and the adaptation effect. The most straightforward way would be in vivo patch clamp recording of hippocampal pyramidal neurons while the animal is navigating an environment. This would give us a direct measure of all these values. However, we do not yet have such data to verify this hypothesis. Another possible way to measure these values is through a state-space model. Specifically, we can build a state-space model (incorporating the adaptation effect in spike generation) that takes the animal’s position as the latent dynamics and the recorded spikes as observations, and then infer parameters such as the adaptation strength and the time constant of the slow dynamics. State-space models (without firing rate adaptation) have previously been used to analyze theta sweeps and replay dynamics by Denovellis et al. (2021), as well as Krause and Drugowitsch (2022). We think it may be feasible to infer the adaptation strength and adaptation time constant with a similar paradigm in future work. We thank the reviewer for raising this point and hope our replies have addressed the reviewer’s concerns.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors focused on genetic variability in relation to insulin resistance. They used genetically different lines of mice and exposed them to the same diet. They found that genetic predisposition impacts the overall outcome of metabolic disturbances. This work provides a fundamental novel view on the role of genetics and insulin resistance.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, van Gerwen et al. perform deep phosphoproteomics on muscle from saline or insulin-injected mice from 5 distinct strains fed a chow or HF/HS diet. The authors follow these data by defining a variety of intriguing genetic, dietary, or gene-by-diet phosphosites that respond to insulin, accomplished through the application of correlation analyses, linear mixed models, and a module-based approach (WGCNA). These findings are supported by validation experiments by intersecting results with a previous profile of insulin-responsive sites (Humphrey et al, 2013) and importantly, mechanistic validation of Pfkfb3 where overexpression in L6 myotubes was sufficient to alter fatty acid-induced impairments in insulin-stimulated glucose uptake. To my knowledge, this resource provides the most comprehensive quantification of muscle phospho-proteins which occur as a result of diet in strains of mice where genetic and dietary effects can be quantifiably attributed in an accurate manner. Utilization of this resource is strongly supported by the analyses provided highlighting the complexity of insulin signaling in muscle, exemplified by contrasts to the "classically-used" C57BL6/J strain. As it stands, I view this exceptional resource as comprehensive with compelling strength of evidence behind the mechanism explored. Therefore, most of my comments stem from curiosity about pathways within this resource, many of which are likely well beyond the scope of incorporation in the current manuscript. These include the integration of previous studies investigating these strains for changes in transcriptional or proteomic profiles and intersections with available human phospho-protein data, many of which have been generated by this group.

      Strengths:

      Generation of a novel resource to explore genetic and dietary interactions influencing the phospho-proteome in muscle. This is accompanied by the elegant application of in silico tools to highlight the utility.

      Weaknesses:

      Some specific aspects of integration with other data among the same fixed strains could be strengthened and/or discussed.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how genetic and environmental factors influence the muscle insulin signaling network and its impact on metabolism. They utilized mass spectrometry-based phosphoproteomics to quantify phosphosites in the skeletal muscle of genetically distinct mouse strains in different dietary environments, with and without insulin stimulation. The results showed that genetic background and diet both affected insulin signaling, with almost half of the insulin-regulated phosphoproteome being modified by genetic background on an ordinary diet, and high-fat high-sugar feeding affecting insulin signaling in a strain-dependent manner.

      Strengths:

      The study uses state-of-the-art phosphoproteomics workflow allowing quantification of a large number of phosphosites in skeletal muscle, providing a comprehensive view of the muscle insulin signaling network. The study examined five genetically distinct mouse strains in two dietary environments, allowing for the investigation of the impact of genetic and environmental factors on insulin signaling. The identification of coregulated subnetworks within the insulin signaling pathway expanded our understanding of its organization and provided insights into potential regulatory mechanisms. The study associated diverse signaling responses with insulin-stimulated glucose uptake, uncovering regulators of muscle insulin responsiveness.

      Weaknesses:

      Different mouse strains have huge differences in body weight on normal and high-fat high-sugar diets, which makes comparison between the models challenging. The proteome of muscle across different strains is bound to differ, but the effect of changes in protein abundance on phosphosite changes was not assessed. The authors do get around this by calculating 'insulin response' because short insulin treatment should not affect protein abundance. The limitations acknowledged by the authors, such as the need for larger cohorts and the inclusion of female mice, suggest that further research is needed to validate and expand upon the findings.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest further discussion of the potential differences between males and females of the various strains.

      In the revised manuscript we have included a more detailed discussion of the potential differences between male and female mice in the "Limitations of this study" section on lines 455-459. In particular, a landmark study of HFD-fed inbred mouse strains found that insulin sensitivity, as inferred from the proxy HOMA-IR, was affected by interactions between sex and strain despite generally being greater in female mice (10.1016/j.cmet.2015.01.002). Furthermore, a recent phosphoproteomics study of human induced pluripotent stem-cell derived myoblasts identified groups of insulin-regulated phosphosites affected by donor sex, and by interactions between sex and donor insulin sensitivity (10.1172/JCI151818). Based on these results, we anticipate that both soleus insulin sensitivity and phosphoproteomic insulin responses would differ between male and female mice through interactions with strain and diet, adding yet another layer of complexity to what we observed in this study. This will be an important avenue for future research to explore.

      Reviewer #2 (Recommendations For The Authors):

      The following are comments to authors - many, if not all are suggestions for extended discussion and beyond the scope of the current elegant study.

      In the discussion section (line 428) the authors make a key point in that the genetic, dietary, and interacting patterns of variation of phosphosites could be due to changes in total protein and/or transcript levels across strains. For example, given that increased expression of Pfkfb3 was sufficient to impact glucose uptake, the transcript levels of the gene might also show a similar correlation with insulin responsiveness as in Fig 6b. Undoubtedly, phosphoproteomics analyses will provide unique information on top of more classical omics layers and uncover what would be an important future direction. Therefore, I would suggest adding to the discussion some guidance on performing similar applications to datasets from at least some of the strains used where RNA-seq and proteomics are available.

      We thank the reviewer for this suggestion. To address this, we mined recently published total proteomics data collected from soleus muscles of seven CHOW or HFD-fed inbred mouse strains, three of which were in common with our study (C57Bl6J, BXH9, BXD34; 10.1016/j.cmet.2021.12.013). In this study ex vivo soleus glucose uptake was measured and correlation analysis was performed, so we directly extracted the resulting glucose uptake-protein associations and compared them to the glucose uptake-phosphoprotein associations identified in our study. Indeed, we found that only a minority of proteins correlated at both the phosphosite and total protein levels, highlighting the utility of phosphoproteomics to provide orthogonal information to more classical omics layers. We have included this analysis in lines 303-311.

      Relevant to this, the authors might want to consider depositing scripts to analyze some aspects of the data (ex. WGCNA on P-protein data or insulin-regulated anova) in a repository such as github so that these can be applied easily to other datasets.

      We refer the reviewer to the section "Code availability" on lines 511-513, where we deposited all code used to analyse the data on github.

      In contrast to the points above, I feel that the short time-course of insulin stimulation was one important aspect of the experimental design that was not emphasized enough as a strength. It was mentioned as a limitation in that other time points could provide more info, yes. But given that the total abundance of proteins and transcripts likely doesn't shift tremendously in this time frame, this provides an important appeal to the analysis of phosphoproteomic data. I would suggest highlighting the insulin-stimulated response analysis here as something that leverages the unique nature of phosphoproteomics.

      We are grateful for the reviewer's positivity regarding this aspect of our experimental design. We have reiterated the value of the 10min insulin stimulation - that it temporally segregates phosphoproteomic and total proteomic changes - in the "Limitations of this study" section on lines 477-481.

      While I recognize the WGCNA analysis as an instrumental way to highlight global patterns of phosphopeptide abundance co-regulation, the analysis currently seems somewhat underdeveloped. For example, Fig 5f-h shows a lot of overlap between kinase substrates and pathways among modules. Clearly, there are informative differences based on the intersection with Humphries 2013 and the correlation with Pfkfb3. To highlight the specific membership of these modules, most people rank-order module members by correlation with eigen-gene (or P-peptide) and then perform pathway enrichments on these. Alternatively, it looks like all data was used to generate modules across conditions. One consideration would be to perform WGCNA on relevant comparison data separately (ex. chow mice only and HFHS only) and then compare modules whose membership is retained or shifts between the two. Or even look at module representation for genes that show large correlations with insulin-responsiveness. This might also be a good opportunity to suggest readers intersect module members with muscle eQTLs which colocalize to glucose or insulin to prioritize some potential key drivers.

      We thank the reviewer for their helpful suggestions, which we feel have substantially improved the WGCNA analysis. To probe specific functional differences between subnetworks, we performed rank-based enrichment using phosphopeptide module membership scores. Interestingly, this did reveal pathways that were enriched only in certain modules. However, we found that after p-value adjustment, virtually all enriched pathways lost statistical significance, hence we interpret these results as suggestive only. We have made this analysis available to readers in Fig S4b-d and lines 263-265: "To further probe functional differences we analysed phosphopeptide subnetwork membership scores, which revealed additional pathways enriched in individual subnetworks. However, these results were not significant after p-value adjustment and hence are suggestive only (Fig. S4b-d)". We also visualised module representation for glucose-uptake correlated phosphopeptides. This agreed with our existing analysis, where the eigenpeptides of modules V and I were correlated with glucose uptake (Fig. 6f). We have incorporated this new analysis in Fig. S6b-c and lines 324-325: "Examining the subnetwork membership scores for glucose-uptake correlated phosphopeptides also revealed a preference for clusters V and I, supporting this analysis (Fig. S6b-c)." Finally, in the discussion we have presented the integration of genetic data, such as muscle-specific eQTLs, as a future direction (lines 398-401): "Alternatively, one could overlap subnetworks with genetic information, such as genes associated with glucose homeostasis and other metabolic traits in human GWAS studies, or muscle-specific eQTLs or pQTLs genetically colocalised with similar traits, to further prioritise subnetwork-associated phenotypes and identify potential drivers within subnetworks."

      Have the authors considered using their heritability and GxE estimates for module eigenpeptides? To my knowledge, this has never been performed and might be informative, as the co-regulated P-protein structure occurs as a result of relevant contexts.

      In the revised manuscript we have now analysed eigenpeptides with the same statistical tests used to identify Strain and Diet effects in insulin-regulated phosphopeptides. We have displayed the statistical results in Fig. S4a, and have explicitly mentioned examples of StrainxDiet effects on lines 245-247: "For example, HFD-feeding attenuated the insulin response of subnetwork I in CAST and C57Bl6J strains (t-test adjusted p = 0.0256, 0.0365), while subnetwork II was affected by HFD-feeding only in CAST and NOD (Fig. 5e, Fig. S4a, t-test adjusted p = 0.00258, 0.0256)."

      The integration of modules with adipocyte phosphoproteomic data from the authors 2013 Cell metab paper seems like an important way to highlight the integration of this resource to define critical cellular signaling mechanisms. To assess the conservation of signaling mechanisms and relationships to additional key contexts (ex. exercise), the intersection of the insulin-stimulated P-peptides with human datasets generated by this group (ex. cell metab 2015, nature biotech 2022) seems like an obvious future direction to prioritize targets. Figure S3B shows a starting point for these types of integrations.

      To demonstrate the value of integrating our results with related phosphoproteomics data, we have incorporated the reviewer's advice of comparing insulin-regulated phosphosites to exercise-regulated phosphosites from Needham et al. Nature Biotech 2022 and Hoffman et al. Cell Metabolism 2015. We identified a small subset of commonly regulated phosphosites (8 across all three studies). Given insulin and exercise both promote GLUT4 translocation, these sites may represent conserved regulatory mechanisms. This analysis is presented in Fig. S3d, Table S2, and lines 129-135: "In addition to insulin, exercise also promotes GLUT4 translocation in skeletal muscle. We identified a small subset of phosphosites regulated by insulin in this study that were also regulated by exercise in two separate human phosphoproteomics studies (Fig. S3d, Table S2, phosphosites: Eef2 T57 and T59, Mff S129 and S131, Larp1 S498, Tbc1d4 S324, Svil S300, Gys1 S645), providing a starting point for exploring conserved signalling regulators of GLUT4 translocation."

      For the Pfkfb3 overexpression system, are there specific P-peptides that are increased/decreased upon insulin stimulation? This might be an interesting future direction to mention in order to link signaling mechanisms.

      We assessed whether canonical insulin signalling was affected by Pfkfb3 overexpression by immunoblotting. Insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically insignificant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, as the reviewer has suggested, an interesting future direction will be to perform phosphoproteomics to characterise more deeply the effects of palmitate and Pfkfb3 overexpression on insulin signalling. We have modified the manuscript to reflect these findings and suggested future directions on lines 362-365: "immunoblotting of canonical insulin-responsive phosphosites on Akt and its substrates GSK3α/β and PRAS40 revealed minimal effect of palmitate treatment and Pfkfb3 overexpression (Fig. S7e-f), hence more detailed phosphoproteomics studies are needed to clarify whether Pfkfb3 overexpression restored insulin action by modulating insulin signalling."

      Reviewer #3 (Recommendations For The Authors):

      This remarkable contribution by the esteemed research group has significantly enriched the field of metabolism. The extensive dataset, intertwined with a sophisticated research design, promises to serve as an invaluable resource for the scientific community. I offer a series of suggestions aimed at potentially elevating the manuscript to an even higher standard.

      Mouse Weight Variation and Correlation Analysis: The pronounced variances in mouse body weights pose a challenge to meaningful comparisons (Fig S1). Could the disparities in the phosphoproteome between basal and insulin-stimulated conditions be attributed to differences in body weight? Consider performing a correlation analysis. Furthermore, does the phosphoproteome of these mouse strains evolve comparably over time? Do these mice age similarly? Kindly incorporate this information.

      We thank the reviewer for the suggested analysis. We found there was a significant correlation between the phosphopeptide insulin response and mouse body weight, either in CHOW-fed mice (Strain effects) or across both diets (Diet effects), for ~ 25% of phosphopeptides that exhibited a Strain or Diet effect. Hence, while there is a clear effect of body weight on insulin signalling, this influences only a small proportion of the entire insulin-responsive phosphoproteome. Notably, insulin was dosed according to mouse lean mass to ensure equivalent dosage received by the soleus muscle, hence any insulin signalling differences associated with body weight are unlikely due to differences in dosing. As the reviewer also alludes to, different strains could have different lifespans. This may result in mice having different biological ages at the time of experimentation, and this in turn could influence insulin signalling. This possibility is challenging to assess in a quantitative manner because lifespan data is not available for most strains used. However, it is worth noting that female CAST mice live 77% as long as C57Bl6J mice (median age of 671 vs 866 (10.1073/pnas.1121113109); data is not available for male mice nor the other three strains), and substantial differences in insulin signalling were observed between these two strains. Ultimately, regardless of whether body weight and/or lifespan altered insulin signalling, such differences would still have arisen solely from the distinct genetic backgrounds and diets of the mice, hence we believe they are meaningful results that should not be dismissed. We have added this analysis to the revised manuscript in the "Limitations of this study" section on lines 471-477: "We were also unable to determine the extent to which signalling changes arose from muscle-intrinsic or extrinsic factors. 
      For instance, body weight varied substantially across mice and correlated significantly with 25% of Strain and Diet-affected phosphopeptides (Fig. S8c), suggesting obesity-related systemic factors likely impact a subset of the muscle insulin signalling network. Furthermore, genetic differences in lifespan could alter the “biological age” of different strains and their phosphoproteomes, though we could not assess this possibility since lifespan data are not available for most strains used."

      Soleus Muscle Data and Bias Considerations: Were measurements taken for lean mass and soleus muscle weight? If so, please present the corresponding data.

      Measurements for lean mass and the mass of soleus muscle after grinding have been included in Supplementary Figure S1 (panels c-d).

      As outlined in the methods section, the variation in protein yield from the soleus muscle across each strain is substantial. Notably, the distinct peptide input for phospho enrichment introduces biases, given that muscles with lower input may exhibit reduced identification (Fig S2). This bias might also manifest in the PCA plot (S2C). Ideally, adopting a uniform protein/peptide input would have been advantageous. Address this concern and contemplate moving the PCA plot to the main figure. It's prudent to reconsider the sentence stating, "Samples from animals of the same strain and diet were highly correlated and generally clustered together, implying the data are highly reproducible (Fig. S2b-d)," particularly if the input and total IDs were not matched.

      The reviewer highlights an important point. As the reviewer comments, it would have been our preference to use the same amount of protein material for all samples. However, as there was a wide range in the mass of the soleus muscle across mouse strains (in particular much lower in CAST mice), it was not appropriate to use the same amount of material for all strains. This is indeed evident in the PCA plot (Figure S2c), whereby samples cluster in the second component (PC2) based on the amount of protein material. However, this clustering is not observed in the hierarchical clustering (Figure S2d), and nor are the number of phosphopeptides quantified in each sample substantially impacted by these differences (Figure S2a) as implied by the reviewer. Indeed, the number of phosphopeptides quantified did not noticeably vary when comparing BXH9/BXD34 to C57Bl6J/NOD despite 32.3% less material used, and there were only 12.4% fewer phosphopeptides (average #13891.56 vs 15851.29) in CAST compared to C57Bl6J/NOD strains, despite 51.8% less material used. To further emphasise the minimal effect that input material had on phosphopeptide quantification, we have additionally plotted the number of phosphopeptides quantified in each sample following the filtering steps we employed prior to statistical analysis of the dataset (i.e. ANOVA). This plot (Author response image 1) shows that there is even less variation in the number of quantified phosphopeptides between strains, with only 9.12% fewer phosphopeptides quantified and filtered on average in CAST compared to C57Bl6J/NOD (average #9026.722 vs 9932.711). From a quantitative perspective, in both the PCA (Principal Component 1) and hierarchical clustering analyses, samples are additionally clustered by individual strains, and in the latter they also cluster generally by diet, implying that biological variation between samples remains the primary variation captured in our data. 
      We have modified the manuscript so that these observations are at the forefront (lines 103-106): "Furthermore, while different strains clustered by the amount of protein material used in the second component of the PCA (Figure S2c), samples from animals of the same strain and diet were highly correlated and generally clustered together, indicating that our data are highly reproducible". To ensure that readers are aware of our decision to alter protein starting material and its implications, we have moved the description of this from the methods to the results, and we have highlighted the impact on phosphopeptide quantification in CAST mice (lines 99-103): "Due to the range in soleus mass across strains (Fig. S1D) we altered the protein material used for EasyPhos (C57Bl6J and NOD: 755 µg, BXH9 and BXD34: 511 µg, CAST: 364 µg), though phosphopeptide quantification was minimally affected, with only 12.4% fewer phosphopeptides quantified on average in CAST compared to C57Bl6J/NOD (average 13891.56 vs 15851.29; Fig. S2a)."
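
      The percentages quoted in this response follow directly from the stated input amounts and average counts. The snippet below is an illustrative check of that arithmetic only, not the analysis code used in the study; the function and variable names are chosen here for the example.

```python
# Illustrative arithmetic check; names are hypothetical, values are taken
# from the response text above.

def percent_fewer(value, reference):
    """Return how much smaller `value` is than `reference`, in percent."""
    return (1 - value / reference) * 100

# Protein input used for EasyPhos (µg).
protein_input_ug = {"C57Bl6J/NOD": 755, "BXH9/BXD34": 511, "CAST": 364}

# Average number of phosphopeptides quantified per sample,
# before and after the pre-ANOVA filtering step.
quantified = {"CAST": 13891.56, "C57Bl6J/NOD": 15851.29}
filtered = {"CAST": 9026.722, "C57Bl6J/NOD": 9932.711}

ref = "C57Bl6J/NOD"
less_input_bx = percent_fewer(protein_input_ug["BXH9/BXD34"], protein_input_ug[ref])
less_input_cast = percent_fewer(protein_input_ug["CAST"], protein_input_ug[ref])
fewer_quantified = percent_fewer(quantified["CAST"], quantified[ref])
fewer_filtered = percent_fewer(filtered["CAST"], filtered[ref])

print(f"BXH9/BXD34 input: {less_input_bx:.1f}% less material")
print(f"CAST input: {less_input_cast:.1f}% less material")
print(f"CAST quantified: {fewer_quantified:.1f}% fewer phosphopeptides")
print(f"CAST after filtering: {fewer_filtered:.2f}% fewer phosphopeptides")
```

      The comparison makes the point of the paragraph concrete: roughly half the input material in CAST corresponds to a far smaller relative drop in quantified phosphopeptides.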

      Author response image 1.

      Phosphopeptide quantification following filtering. a) The number of phosphopeptides quantified in each sample after filtering prior to statistical analysis.

      Phosphosite Quantification Filtering: The quantified phosphosites have been dropped from 23,000 to 10,000. Could you elucidate the criteria employed for filtering and provide a concise explanation in the main text?

      We thank the reviewer for drawing this ambiguity to our attention. Before testing for insulin regulation, we performed a filtering step requiring phosphopeptides to be quantified well enough for comparisons across strains and diets. Specifically, phosphopeptides were retained if they were quantified well enough to assess the effect of insulin in more than eight strain-diet combinations (≥ 3 insulin-stimulated values and ≥ 3 unstimulated values in each combination). We have now included this explanation of the filtering in the main text on lines 108-114.
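
      The filtering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' deposited code: the data layout (a dictionary keyed by strain and diet), the function names, and the toy values are hypothetical, and "more than eight" combinations is read here as at least nine of the ten strain-diet combinations.

```python
# Hypothetical sketch of the filtering criterion; missing values are None.

def is_quantified(values, min_values=3):
    """True if at least `min_values` measurements are present."""
    return sum(v is not None for v in values) >= min_values

def passes_filter(peptide, min_values=3, min_combinations=9):
    """Keep a phosphopeptide if enough strain-diet combinations have
    >= min_values insulin-stimulated AND >= min_values unstimulated values.

    `peptide` maps (strain, diet) -> {"insulin": [...], "basal": [...]}.
    """
    usable = sum(
        is_quantified(c["insulin"], min_values) and is_quantified(c["basal"], min_values)
        for c in peptide.values()
    )
    return usable >= min_combinations

strains = ["C57Bl6J", "NOD", "BXH9", "BXD34", "CAST"]
diets = ["CHOW", "HFD"]

# Toy phosphopeptide quantified in every sample.
complete = {
    (s, d): {"insulin": [1.0, 1.2, 0.9, 1.1], "basal": [0.1, 0.2, 0.0, 0.1]}
    for s in strains for d in diets
}

# Same peptide, but poorly quantified in both CAST groups (8 usable combinations).
sparse = dict(complete)
sparse[("CAST", "HFD")] = {"insulin": [1.0, None, None, 1.1], "basal": [0.1, 0.2, 0.0, 0.1]}
sparse[("CAST", "CHOW")] = {"insulin": [1.0, 1.2, 0.9, 1.1], "basal": [None, None, 0.0, 0.1]}

print(passes_filter(complete))
print(passes_filter(sparse))
```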

      ANOVA Choice Clarification: In Figure 4, there's a transition from one-way ANOVA in B to two-way ANOVA in C. Could you expound on the rationale for selecting these distinct methods?

      In panel B, we first focussed on kinase regulation differences between strains in the absence of a dietary perturbation. Hence, we performed one-way ANOVAs only within the CHOW-fed mice. In panel C, we then consider the effect of perturbation with the HFD. We perform two-way ANOVAs, allowing us to identify effects of the HFD that are uniform across strains (Diet main effect) or variable across strains (Strain-by-diet interaction).
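
      To illustrate the distinction, the sketch below computes a one-way ANOVA F statistic across strains within one diet, and the sum-of-squares decomposition of a balanced two-way design into Strain, Diet, and Strain-by-diet terms. It is a simplified illustration with made-up numbers and hypothetical function names, assumes equal replicate numbers per group, and omits p-values; it is not the statistical pipeline used in the study.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA; `groups` is a list of lists."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def two_way_ss(cells):
    """Sums of squares for a balanced two-way design.

    `cells` maps (strain, diet) -> list of replicates (equal lengths).
    """
    strains = sorted({s for s, _ in cells})
    diets = sorted({d for _, d in cells})
    n = len(next(iter(cells.values())))
    grand = mean([x for v in cells.values() for x in v])
    strain_mean = {s: mean([x for d in diets for x in cells[(s, d)]]) for s in strains}
    diet_mean = {d: mean([x for s in strains for x in cells[(s, d)]]) for d in diets}
    cell_mean = {k: mean(v) for k, v in cells.items()}
    return {
        "strain": len(diets) * n * sum((m - grand) ** 2 for m in strain_mean.values()),
        "diet": len(strains) * n * sum((m - grand) ** 2 for m in diet_mean.values()),
        "interaction": n * sum(
            (cell_mean[(s, d)] - strain_mean[s] - diet_mean[d] + grand) ** 2
            for s in strains for d in diets
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }

# Toy insulin responses: HFD lowers the response equally in both strains.
uniform_diet = {
    ("A", "CHOW"): [1.0, 1.2, 0.8], ("A", "HFD"): [0.5, 0.7, 0.3],
    ("B", "CHOW"): [2.0, 2.2, 1.8], ("B", "HFD"): [1.5, 1.7, 1.3],
}
# Toy insulin responses: HFD lowers the response in strain A only.
strain_specific = dict(uniform_diet)
strain_specific[("B", "HFD")] = [2.0, 2.2, 1.8]

chow_f = one_way_f([uniform_diet[("A", "CHOW")], uniform_diet[("B", "CHOW")]])
ss_uniform = two_way_ss(uniform_diet)
ss_specific = two_way_ss(strain_specific)
print(chow_f, ss_uniform["interaction"], ss_specific["interaction"])
```

      In the first toy dataset the interaction sum of squares is essentially zero (a Diet main effect), whereas in the second it is positive (a Strain-by-diet interaction), which is the distinction the two-way ANOVA is used to draw.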

      Cell Line Selection for Functional Experiments: Could you elucidate the rationale behind opting for L6 cells of rat origin over C2C12 mouse cells for functional experiments?

      We acknowledge that C2C12 cells have the benefit of being of mouse origin, which aligns with our mouse-derived phosphoproteomics data. However, they are unsuitable for glucose uptake experiments as they lack an insulin-responsive vesicular compartment even upon GLUT4 overexpression, and undergo spontaneous contraction when differentiated resulting in confounding non-insulin dependent glucose uptake (10.1152/ajpendo.00092.2002, 10.1007/s11626-999-0030-8). In contrast, L6 cells readily express insulin-responsive GLUT4, and cannot contract (doi.org/10.1113/JP281352, 10.1007/s11626-999-0030-8). Therefore they are a superior model for studying insulin-dependent glucose transport. We have added a justification of L6 cells over C2C12 cells in the revised manuscript, on lines 352-354: "While L6 cells are of rat origin, they are preferable to the popular C2C12 mouse cell line since the latter lack an insulin-responsive vesicular compartment and undergo spontaneous contraction, resulting in confounding non-insulin dependent glucose uptake."

      It's intriguing that while a phosphosite was modulated on Pfkfb2, functional assays were conducted on a different isoform (Pfkfb3) wherein the phosphosite was not detected.

      The correlation between Pfkfb2 S469 phosphorylation and insulin-stimulated glucose uptake suggests that F2,6BP production, and subsequent glycolytic activation, positively regulate insulin responsiveness. There are several ways of testing this: 1) Knock down endogenous Pfkfb2, and re-express either wild-type protein or a S469A phosphomutant. If S469 phosphorylation positively regulates insulin responsiveness, then knockdown should decrease insulin responsiveness and re-expression of wild-type Pfkfb2, but not S469A, should restore it. 2) Induce insulin resistance (e.g. through palmitate treatment), and overexpress phosphomimetic S469D or S469E Pfkfb2 to enhance F2,6BP production. Under our hypothesis, this should reverse insulin resistance. 3) There is some evidence that dual phosphorylation of S469 and S486, another activating phosphosite on Pfkfb2, enhances F2,6BP production through 14-3-3 binding (10.1093/emboj/cdg363). Hence, we may expect that introduction of an R18 sequence into Pfkfb2, which causes constitutive 14-3-3 binding (10.1074/jbc.M603274200), would increase Pfkfb2-driven F2,6BP production, and under our hypothesis this should reverse insulin resistance. 4) The paralog Pfkfb3 lacks Akt regulatory sites and has substantially higher basal activity than Pfkfb2. Thus, overexpression of Pfkfb3 should mimic the effect of phosphorylated Pfkfb2, and hence reverse insulin resistance under our hypothesis. While approaches 1), 2), and 3) directly target Pfkfb2, they have drawbacks. For example, 1) may not work if Pfkfb2 knockdown is compensated for by other Pfkfb isoforms, 2) may not work since D/E phosphomimetics often do not recapitulate the molecular effects of S/T phosphorylation (10.1091/mbc.E12-09-0677), and 3) may not work if S469 phosphorylation does not operate through 14-3-3 binding. Hence we performed 4) as it seemed to be the most robust and cleanest experiment to test our hypothesis. 
We have revised the manuscript to further clarify the challenges of directly targeting Pfkfb2 and the benefits of targeting Pfkfb3 on lines 342-349: "Since Pfkfb2 requires phosphorylation by Akt to produce F2,6BP substantially, increasing F2,6BP production via Pfkfb2 would require enhanced activating site phosphorylation, which is difficult to achieve in a targeted fashion, or phosphomimetic mutation of activating sites to aspartate/glutamate, which often does not recapitulate the molecular effects of serine/threonine phosphorylation. By contrast, the paralog Pfkfb3 has high basal production rates and lacks an Akt motif at the corresponding phosphosites. We therefore rationalised that overexpressing Pfkfb3 would robustly increase F2,6BP production and enhance glycolysis regardless of insulin stimulation and Akt signalling."

      Insulin-Independent Action of Pfkfb3: The functionality of Pfkfb3 unfolds in an insulin-independent manner, yet it restores insulin action (Fig 6h). Could you shed light on the mechanism underpinning this phenomenon? Consider measuring F2,6BP concentrations or assessing kinase activity upon overexpression.

      Pfkfb3 overexpression increased the glycolytic capacity of L6 myotubes in the absence of insulin stimulation, as inferred by extracellular acidification rate (Fig. S7c). This is indeed consistent with Pfkfb3 enhancing glycolysis through increased F2,6BP concentration in an insulin-independent manner. To shed light on the mechanism connecting this to insulin action, we performed immunoblotting experiments to assess the kinase activity of Akt, a master regulator of the insulin response. Indeed, this experimental direction has precedent as we previously observed that Pfkfb3 overexpression enhanced insulin-stimulated Akt signalling in HEK293 cells, while small-molecule inhibition of Pfkfb kinase activity reduced Akt signalling in 3T3-L1 adipocytes (10.1074/jbc.M115.658815). However, insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically insignificant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, a more detailed phosphoproteomics study will be needed to assess whether Pfkfb3 restores insulin action by modulating insulin signalling. We have described these immunoblotting experiments in lines 361-365 and Fig. S7e-f. We also discussed potential mechanisms through which Pfkfb3-enhanced glycolysis could connect to insulin action in the discussion (lines 427-434).

      Figure 6h Statistical Analysis: For the 2DG uptake in Figure 6h, a conventional two-way ANOVA might be more appropriate than a repeated measures ANOVA.

      On reflection, we agree that a conventional ANOVA is more appropriate. Furthermore, for simplicity and conciseness we have decided to analyse and present only insulin-stimulated/unstimulated 2DG uptake fold change values in Figure 6h. We have presented all unstimulated and insulin-stimulated values in Figure S7d.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for overseeing the assessment of our manuscript, “Comprehensive mutagenesis maps the effect of all single codon mutations in the AAV2 rep gene on AAV production". We would also like to thank the reviewers for their feedback. We have carried out the suggested experiments that we feel are most central to our conclusions and summarized the revisions to the manuscript below.

      We appreciate the reviewers’ suggestion with regards to testing different rAAV genomes. We have measured the effect of Rep variants on the production of rAAV containing three additional genomes: a 4.4 kb single-stranded genome, a 3.9 kb single-stranded genome, and a 2.1 kb self-complementary genome (Figures 5C and 5D). The DNase-resistant particle titers - reported as a percent of wild-type Rep titers - are relatively consistent across these three constructs as well as the 5.0 kb single-stranded genome previously tested.

      We agree with the reviewers that measurement of the relative transduction efficiency of rAAV produced with different Rep variants is an important experiment to conduct. To address this, we transduced HEK293T cells with rAAVs, containing a luciferase genome, which were produced using two different Rep variants. When a constant volume of purified rAAV was used for transduction, we observed that the rAAV produced with the S110R Rep variant resulted in higher transduction than rAAV produced with wild-type Rep (as measured by luciferase signal). While we tested only a small number of variants, these results indicate that at least one of the Rep variants we identified can increase not only the viral genome titer but also the titer of transducing particles.

      To generate this transduction data, we produced additional rAAV preps using S110R and Q439T Rep variants. In the previous version of this manuscript, we used the Q439T variant to produce rAAV and noted a 10% increase in the ratio of viral genomes: capsids as determined by comparison of qPCR and capsid ELISA titers. However, a similar increase was not observed in the more recent experiment discussed above. We attribute this discrepancy to changes in the plasmid quantification methods used for transfection. Previously, we quantified plasmids using a fluorometric assay (Qubit); in our more recent experiments, we used qPCR to quantify plasmids for transfection. qPCR provides a more accurate measurement of plasmid concentration due to the specific nature of the primers and probes used, which may account for the subtle shift in quantification. While outside the scope of the current work, it will also be interesting to further investigate the proportion of full capsids using additional Rep variants and more direct methods, such as cryoEM or analytical ultracentrifugation.

      We agree with the reviewers’ observation that there are differences in the production fitness values for synonymous variants. However, the variation in production fitness values between synonymous variants is smaller than that between non-synonymous variants. We conducted the following analysis to clarify this point. We calculated two mean centered fitness values for each codon variant in the WT AAV2 library. The “positional mean centered fitness value” was determined using the production fitness values of all variants at a given amino acid position and describes how far a given fitness value diverges from the mean fitness value for that position. The “synonymous codon mean centered fitness value” was determined using the production fitness values of all synonymous variants at a given position and describes how far a given fitness value diverges from the mean fitness value for all its synonymous codon variants. We then plotted both mean centered fitness values versus amino acid position (Figure S8).
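
      The two mean centered values defined above can be sketched as follows. This is an illustrative example only: the record layout, function name, and toy fitness values are assumptions made here and do not come from the library data.

```python
from collections import defaultdict
from statistics import mean

def mean_centered_fitness(variants):
    """Add positional and synonymous-codon mean centered values.

    `variants` is a list of dicts with keys "position", "amino_acid",
    "codon", and "fitness" (a hypothetical layout for this sketch).
    """
    by_position = defaultdict(list)
    by_synonymous = defaultdict(list)
    for v in variants:
        by_position[v["position"]].append(v["fitness"])
        by_synonymous[(v["position"], v["amino_acid"])].append(v["fitness"])

    centered = []
    for v in variants:
        pos_mean = mean(by_position[v["position"]])
        syn_mean = mean(by_synonymous[(v["position"], v["amino_acid"])])
        centered.append({
            **v,
            "positional_centered": v["fitness"] - pos_mean,
            "synonymous_centered": v["fitness"] - syn_mean,
        })
    return centered

# Toy data for a single position: two Ser codons and two Arg codons.
toy = [
    {"position": 110, "amino_acid": "S", "codon": "AGC", "fitness": 0.0},
    {"position": 110, "amino_acid": "S", "codon": "TCT", "fitness": 0.2},
    {"position": 110, "amino_acid": "R", "codon": "CGT", "fitness": 1.0},
    {"position": 110, "amino_acid": "R", "codon": "AGA", "fitness": 1.2},
]

result = mean_centered_fitness(toy)
positional_spread = max(abs(r["positional_centered"]) for r in result)
synonymous_spread = max(abs(r["synonymous_centered"]) for r in result)
print(positional_spread, synonymous_spread)
```

      In this toy case the synonymous-codon centered values are more tightly distributed than the positional ones, mirroring the comparison described for Figure S8.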

      The distribution of mean centered selection values is narrower when calculated at the synonymous codon level as opposed to the position level. This indicates that, in general, synonymous variants have more tightly distributed production fitness values than non-synonymous variants. This observation precludes us from conducting a more thorough analysis of the effects of synonymous codons on AAV production, although there is at least one instance where clear differences between synonymous codons can be observed (Figure S9C and Figure S9D). We agree with the reviewers that synonymous variants almost certainly influence aspects of AAV production, such as genome replication, transcriptional regulation, mRNA stability, and protein expression. However, our assay measures the aggregate effect of rep variants on all steps in the AAV production process and is likely unable to detect the effects of synonymous variants on specific steps in this process if those steps are not rate-limiting. We have updated the discussion section to include an explanation of the above.

      The X-axes in Figures 5B and 5D have been updated to plot s’ instead of percent WT titer. We have also added asterisks to indicate significance in Figures 5A and 5C. Thank you for these suggestions.

      We agree with Reviewer 3 that it would be interesting to sequence barcodes from the mRNA pool. The 20 bp barcodes are located upstream of the polyA site and should be present in mRNA transcripts. Something to consider is that AAV2 transcripts expressed from all three promoters (p5, p19, and p40) are polyadenylated at the same site (Stutika et al., 2016). As such, in our WT AAV2 library, barcode representation in the mRNA pool would indicate the aggregate effect of a rep variant on the levels of all AAV2 transcripts. In the pCMV-Rep78/68 library, only two AAV2 transcripts are generated - a spliced and unspliced version of the p5 product. Sequencing of barcodes present in the mRNA pool could be informative regarding the effect of rep variants on combined Rep78/68 expression levels. However, we feel that this experiment is outside the scope of the current work.

      We were also surprised at the number of novel functional Rep variants that were identified in our library. As the reviewer pointed out, optimal rAAV production likely does not equate to optimal fitness of naturally occurring AAV in the endogenous host. Naturally occurring AAV has both a latent and a lytic cycle and the Rep proteins play a role in both these processes (Pereira et al., 1997; Surosky et al., 1997). rAAV production, however, is primarily analogous to the lytic cycle of naturally occurring AAV. In their endogenous hosts, AAV must balance the effect of any mutations on fitness in both the lytic and latent contexts, while we assay specifically for production fitness. We additionally attribute this finding to the relatively small number of AAV serotypes for which rep sequences are available. We have added a discussion of the above to the manuscript.

      Finally, in response to feedback from other researchers, we determined which amino acid substitutions resulted in production fitness values that were significantly different from that of wild-type (Figure S4). These results further emphasized the importance of the origin-binding domain; most statistically significant beneficial substitutions clustered here. Additionally, we noted that the majority of substitutions in the zinc-finger domain resulted in production fitness changes that were not significant. This lines up with previous work indicating that the zinc-finger domain is dispensable for rAAV production. We have added a discussion of these results to the main text.

      We again thank the reviewers for their suggestions; we feel that incorporation of their suggestions has strengthened support for our conclusions and enhanced the utility of this work for others in the field.

      References Pereira, D. J., McCarty, D. M., & Muzyczka, N. (1997). The adeno-associated virus (AAV) Rep protein acts as both a repressor and an activator to regulate AAV transcription during a productive infection. Journal of Virology, 71(2), 1079–1088. https://doi.org/10.1128/jvi.71.2.1079-1088.1997

      Stutika, C., Gogol-Döring, A., Botschen, L., Mietzsch, M., Weger, S., Feldkamp, M., Chen, W., & Heilbronn, R. (2016). A Comprehensive RNA Sequencing Analysis of the Adeno-Associated Virus (AAV) Type 2 Transcriptome Reveals Novel AAV Transcripts, Splice Variants, and Derived Proteins. Journal of Virology, 90(3), 1278–1289. https://doi.org/10.1128/JVI.02750-15

      Surosky, R. T., Urabe, M., Godwin, S. G., McQuiston, S. A., Kurtzman, G. J., Ozawa, K., & Natsoulis, G. (1997). Adeno-associated virus Rep proteins target DNA sequences to a unique locus in the human genome. Journal of Virology, 71(10), 7951–7959. https://doi.org/10.1128/jvi.71.10.7951-7959.1997

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of an alternative ATG, either of which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1. This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      1. The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e., the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make it clear.

      1. Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the Anti-PARG antibody (66564S) can only recognize isoform 1, this antibody could recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data below.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      1. FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      1. All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      1. Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays.
Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.  

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 1c. Why does the viability of PARG KO cells improve at higher doses of PARGi? How do the authors explain this paradox?

      This phenomenon was observed in 293A PARG KO cells in the CellTiter-Glo assay, especially with the top three PARGi concentrations (100 µM, 33.33 µM and 11.11 µM). This may be due to the low solubility of this PARGi in the medium, since we sometimes observed precipitation at high concentrations when PARGi stock was diluted in medium.

      1. Figure 2d. The authors show that PARGi reduced NAD+ level by 20%. This reduction in NAD+ probably does not explain the cell death phenotype observed by parthanatos cell death. What pathway is activated by PARGi to induce cell death?

      Since PARGi treatment of PARG KO cells led to uncontrolled pADPr accumulation, it is possible that some of these cells may die due to parthanatos. However, we did not observe a dramatic reduction in NAD+ level. A previous study showed that Parg(-/-) mouse ES cells predominantly underwent caspase-dependent apoptosis (Shirai et al., 2013). Indeed, PARP1 cleavage was detected in PARG KO cells with prolonged PARGi treatment, indicating that at least some of these cells die due to apoptosis (Fig. 2A). Cytotoxicity of PARGi in PARG KO cells may be due to several mechanisms including apoptosis, parthanatos and NAD+ reduction.

      1. The authors refer to FK866 in the text without explaining what this agent is. FK866 is a noncompetitive inhibitor of nicotinamide phosphoribosyltransferase (NAMPT), a key enzyme in the regulation of NAD+ biosynthesis from the natural precursor nicotinamide. The authors should explain experimental tools in the text as they use them for clarity to the reader.

      Thanks for the suggestion! We will include additional citations and discuss how FK866 works in our revised manuscript.

      1. In addition to these issues, there are significant formatting and textual problems, such that there are multiple gaps in the body of the text that make coherent reading of the manuscript impossible. Examples are: Page 3 line 10. Page 6 line 5 and line 15, Page 7 line 2, 3, and line 8. Page 8, line 1, and line 3 from bottom. Page 9 line 1, line 7 from bottom and line 9 from the bottom, Page 18 of the results in several places, etc. etc. etc. These formatting errors convey the impression that the submitting authors did not adequately review the manuscript for technical problems prior to submission. The authors need to correct these errors.

      Sorry, we will edit the text and remove these gaps as suggested.

      Reviewer #3 (Recommendations For The Authors):

      1. The major problem with this paper is conceptual - namely, how could PARG knockout cells be hypersensitive to a selective PARG small molecular inhibitor. The evidence in Figure 7 that there is measurable residual PARG activity in the so-called PARG KO 293A and HeLa cells provides a partial explanation for why PARG inhibitor treatment might be deleterious to the PARG KO cells, i.e., because PARGi blocks this residual PARG activity. However, although the authors characterized the PARG alleles in the 293A PARG KO cells by sequencing, the molecular origin of the significant level of residual PARG activity remains unclear (see points 7-9).

      Yes, in our study we showed that PARGi treatment inhibited the residual PARG activity in PARG KO cells, which mimics complete loss of PARG as PARG is an essential gene. These data agree with a previous study using Parg(-/-) mouse cells (Koh et al., 2004). We attempted to define the molecular origin of the residual PARG activity; unfortunately, this was challenging (please see below for additional discussions). Nevertheless, we showed that residual PARG activity could be detected in PARG KO cells and, more importantly, that cells with reduced PARG expression or activity are sensitive to PARGi. These results indicate that PARG expression and/or activity may be used as a biomarker for PARGi-based therapy.

      1. Although the most obvious explanation for the PARGi sensitivity data presented in Figures 1-4 is that the PARG KO cells have residual PARG activity, the authors wait until the discussion on page 26 to raise the possibility that the PARG KO cells might have residual PARG activity that renders them sensitive to PARGi. It would be more logical to move the PARG activity data in Figure 7 earlier in the paper as a supplementary figure, so that the reader is not left wondering how a PARG KO cell remains sensitive to a PARG inhibitor. For this reason, it is recommended that the whole paper be reorganized and rewritten to provide a more logical flow that allows the reader to understand what was done, and why it is hard to generate complete PARG KO cells because the accumulation of pADPR adducts is toxic to the cell.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e., the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make it clear.

      1. Exactly how PARG activity would be coordinated with PARP1/2 activity during normal S phase to ensure that PARylation can serve its required function, whatever that may be, and is then removed by PARG is unclear - how would this be orchestrated at the level of a replication fork?

      PARG is known to be recruited to sites of DNA damage through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Our current hypothesis is that PARP1 is one of the major PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression. Precisely how PARG regulates S phase progression warrants further investigation.

      1. Figure 2B: What gRNAs were used to generate the 293A and HeLa PARG knock clones, i.e., where are they located in the PARG gene? If they are not in the catalytic domain it might be possible to generate PARG proteins with N-terminal deletions that are still active (see points 8-10 below).

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends to show the localization of gRNAs.

      We agree with this reviewer that truncated but active forms of PARG exist in these KO cells. We attempted to identify these truncated forms of PARG by using two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform/truncated form was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. Based on these results, we stated that residual PARG activity was detected in our KO cells, but we were not able to specify the truncated variants of PARG in these cells.

      Author response image 4.

      1. Figure 3B/page 19: The authors state that "emetine, which diminishes Okazaki fragments, greatly inhibited S phase pADPr signaling in PARG KO cells", and from this deduced that Okazaki fragments on the lagging strand activate PARylation. However, emetine is not a specific lagging strand synthesis inhibitor, as implied here, but rather a protein synthesis inhibitor, which inhibits Okazaki fragment formation indirectly (see PMID: 36260751). The authors need to rewrite this section to explain how emetine works in this context.

      As suggested, we will cite this reference and discuss how emetine inhibits Okazaki fragment maturation in our revised manuscript. Additionally, we used three different POLA1 inhibitors to diminish Okazaki fragments. As shown in Fig. S3B, all three POLA1 inhibitors significantly abolished S-phase pADPr induced by PARGi in PARG KO cells. Furthermore, the POLA1 inhibitors adarotene and CD437 were able to rescue cell lethality caused by PARGi in PARG KO cells (Fig. 3E).

      1. Figure 7: It is not clear why these cells are called PARG complete/conditional KO cells (cKO). Generally, "conditional knockout" refers to a cell or animal in which a gene can be conditionally knocked out by inducible expression of Cre. Here, it appears that "conditional" refers to the fact that the PARG KO cells only grow in the presence of olaparib - is this the case?

      Yes, we used the name to distinguish these cells from our initial PARG KO cells. Moreover, we were only able to obtain and maintain these PARG cKO clones with complete loss of PARG activity in the presence of a PARP inhibitor. Therefore, we called them PARG complete/conditional KO (cKO) cells.

      1. Figure 7B and D: The level of full-length PARG protein was much lower in the 293A and HeLa cKO cells compared to WT cells consistent with cKO cells representing a more complete PARG KO. The level of PARG protein in the 293A PARG cKO cells was apparently also lower than in the original PARG KO cells, but the KO and cKO samples should be run side by side to demonstrate this conclusively, and the bands need to be quantified. In panel B, it is not clear from the legend what cKO_3 and cKO_4 are, but presumably, they are different clones, and this should be stated.

      Full-length PARG was not detected in either PARG KO or PARG cKO cells by WB. The apparent lower level of endogenous PARG in Fig. 7D was due to the fact that reconstituted cells had high exogenous PARG expression and therefore we had to reduce exposure time for WB.

      As for cKO_3 and cKO_4 in Fig. 7, they are different clones created with different sgRNAs. As suggested, we will include additional information in the figure legends to clearly state which sgRNA was used to generate the respective KO and cKO clones.

      1. Figure S8: There is not enough information here or in the text to allow the reader to interpret these PARG allele sequences obtained from the PARG KO cells. From the Methods section, it appears that the PARG KO cells were clonal, with sequence data from one clone of each of the 293A and HeLa cell PARG KO cells being shown. If this is right, then in both cell types one out of four PARG alleles is wild type, and therefore one would expect the PARG protein signal to be ~25% of that in WT cells. However, based on the 293A PARG KO cells PARG immunoblot in Figure 2B the PARG protein signal is clearly much lower than 25% (these bands need to be quantified), and this discrepancy needs to be explained. What is the level of PARG protein in the PARG KO HeLa cells? If different PARG KO cell clones are analyzed by sequencing, do they all have an apparently intact PARG allele? Four different gRNA target sites in the PARG gene are shown in panel A in Figure 7, but the description in the text regarding how the four gRNAs were used is totally inadequate - were all four used simultaneously or only the two in the catalytic domain? Were pairs of gRNAs used in an attempt to generate a large intervening deletion - some Southern blots of the PARG gene region in the PARG cKO cells are needed to figure this out. The gRNAs are given numbers in Figure 7A, but it is unclear from the sequences shown in Figures S8 and S9 which gRNA sites are shown. All of this has to be clarified, so that the reader can understand the nature of the KO/cKO cells knockout alleles, and what PARG-related products, if any, they can express.

      Yes, all KO and cKO cells used in this study are single clones. As suggested, we will revise the figure legends of Fig. 7, S8 and S9 to include detailed information. To avoid any further misunderstanding, we will relabel the allele “WT” as “WT (reference)” in Fig. S8 and S9. We did not detect intact/wild-type PARG sequence in any single KO/cKO clone by DNA sequencing. Sequencing of single KO/cKO clones was performed using the TOP TA Cloning kit. Briefly, genomic DNA was extracted from each single KO/cKO clone. Approximately 300 bp surrounding the sgRNA targeting sequence was amplified by PCR. The PCR product was cloned into the vector, and approximately 10-15 bacterial clones were extracted and sent for sequencing. If any intact/wild-type PARG sequence was detected in these 10-15 bacterial clones, the KO/cKO clone was considered a heterozygous clone and discarded.

      HEK293A and HeLa cells are not diploid and have complex karyotypes. The PARG gene is located on chromosome 10. Karyotyping by M-FISH shows that HeLa cells have 3 copies of chromosome 10 (Landry et al., 2013). HEK293 cells predominantly have 3 copies of chromosome 10, and sometimes 4 copies can be detected by G-banding (Binz et al., 2019). Therefore, it is anticipated that 1 to 4 mutant alleles would be detected in each KO/cKO clone by sequencing.
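      As a rough illustration of how informative the colony-sequencing screen described above can be, the short sketch below estimates the chance that a residual wild-type allele would go unseen. It is not part of the authors' analysis. It assumes, hypothetically, that each sequenced bacterial colony carries one PARG allele drawn independently and with equal probability from the 3 or 4 copies of chromosome 10 in the clone (no PCR or cloning bias); the function name `prob_missing_wt` is illustrative only.

```python
def prob_missing_wt(total_copies: int, wt_copies: int, n_colonies: int) -> float:
    """Chance that none of the sequenced colonies carries a wild-type allele.

    Assumption (not from the source): each colony carries one allele drawn
    independently and uniformly from the alleles present in the cell clone.
    """
    if total_copies <= 0 or not 0 <= wt_copies <= total_copies:
        raise ValueError("need total_copies > 0 and 0 <= wt_copies <= total_copies")
    if n_colonies < 0:
        raise ValueError("n_colonies must be non-negative")
    # Each colony misses the wild-type allele with probability (1 - wt/total).
    return (1.0 - wt_copies / total_copies) ** n_colonies


if __name__ == "__main__":
    # One hypothetical wild-type allele among 3 or 4 copies, 10 or 15 colonies.
    for copies in (3, 4):
        for n in (10, 15):
            p = prob_missing_wt(copies, 1, n)
            print(f"{copies} copies, {n} colonies: P(miss WT) = {p:.4f}")
```

      Under these simplifying assumptions, a single intact allele would escape detection with a probability between roughly 0.2% (3 copies, 15 colonies) and 6% (4 copies, 10 colonies). Unequal amplification or cloning efficiency between alleles would change these numbers.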

      Only one sgRNA was transfected into cells for the selection of single clones. We did not use paired or multiple sgRNAs in any of these experiments. As shown in Fig. S1D and Fig. 7A, the HEK293A-derived and HeLa-derived PARG KO single clones were generated using different sgRNAs. In addition, the two PARG cKO single clones from HEK293A and HeLa cells were also generated using two different sgRNAs, as shown in Fig. 7A-B. We will include all of the information above in the revised manuscript, i.e., in the Methods section as well as in the figure legends.

      1. Figure S9A: The sequences of the 293A PARG alleles in the cKO cells suggest that these cells also have one intact PARG allele, which again does not fit with the very low level of intact PARG protein shown in Figure 7B. How do the authors explain this?

      Sorry, this is a misunderstanding. The allele “WT” in Fig. S8 and S9 is the reference sequence. We will change it to “Reference sequence” to avoid further confusion. As mentioned above, we did not detect any intact/wild-type PARG sequence in any of our single KO/cKO clones by sequencing.

      1. Figure S9B: These critical lysate activity data show that the PARG KO cells have ~50% of the PARG activity detected in WT cells. However, this is not consistent with the PARG protein level detected in the PARG immunoblot in Figure 1B, which appears to be less than 5% of the PARG protein level in WT cells (with one intact PARG allele in these cells one would theoretically expect ~25%, although this depends on whether all four alleles are expressed equally). One possibility is that active PARG fragments are generated from one or more of the PARG KO alleles in the PARG KO cells. Targeted sequencing of PARG mRNAs might reveal whether there are shorter RNAs that could encode a protein containing the C-terminal catalytic domain (aa 570-910). In addition, the authors need to show the entire immunoblot to determine if there are smaller proteins recognized by the anti-PARG antibodies that might represent shorter PARG gene products (for this we need to know where the epitopes against which the PARG antibodies are directed are located within the PARG protein - ideally the authors need to use an antibody directed against an epitope near the C-terminus).

      As stated in the Methods section, we incubated cell lysates with substrates overnight to evaluate the maximum level of pADPr hydrolysis, i.e., PARG activity, that we were able to detect in this assay. It is very likely that the PARG activity in PARG KO cells was much lower than 50%, because the signals for lysates isolated from wild-type cells were saturated. Thus, the data presented in our manuscript probably underestimate the reduction of PARG activity in PARG KO cells. Nevertheless, these data indicate that residual PARG activity was detected in PARG KO cells; however, this activity was absent in PARG cKO cells.

      As mentioned above, we used two independent antibodies that recognize the C-terminus of PARG for WB. Unfortunately, we could not draw a clear conclusion as to which functional isoforms or truncated proteins were expressed in our PARG KO cells. The dePARylation assay used here may be the best way to test the residual PARG activity in our KO and cKO cells.

      1. Figure 7D: In this experiment, the level of re-expressed WT PARG protein was much higher than that of the endogenous PARG protein (quantification is needed) - how might this affect the interpretation of these experiments (N.B., WT and catalytically-dead PARG were also re-expressed for the experiments shown in Figure 1, but there are no PARG immunoblots to demonstrate how much the exogenous proteins were overexpressed, or activity measurements). If regulated pADPr signaling is important for a normal S phase, then one would have thought that expressing a very high level of active PARG would create problems.

      In Fig. S1E, we blotted for the endogenous PARG level in control cells and the exogenous PARG level in reconstituted cells. The reviewer is correct that exogenous PARG expression was much higher (~10-fold) than that of endogenous PARG in WT control cells. Nevertheless, we did not observe any obvious phenotypes in PARG KO/cKO cells reconstituted with high levels of exogenous PARG, which may reflect excess PARG level/activity in wild-type control cells.

      References:

      Binz, R. L., Tian, E., Sadhukhan, R., Zhou, D., Hauer-Jensen, M., and Pathak, R. (2019). Identification of novel breakpoints for locus- and region-specific translocations in 293 cells by molecular cytogenetics before and after irradiation. Sci Rep 9, 10554.

      Hanzlikova, H., Kalasova, I., Demin, A. A., Pennicott, L. E., Cihlarova, Z., and Caldecott, K. W. (2018). The Importance of Poly(ADP-Ribose) Polymerase as a Sensor of Unligated Okazaki Fragments during DNA Replication. Mol Cell 71, 319-331 e313.

      Koh, D. W., Lawler, A. M., Poitras, M. F., Sasaki, M., Wattler, S., Nehls, M. C., Stoger, T., Poirier, G. G., Dawson, V. L., and Dawson, T. M. (2004). Failure to degrade poly(ADP-ribose) causes increased sensitivity to cytotoxicity and early embryonic lethality. Proc Natl Acad Sci U S A 101, 17699-17704.

      Kumamoto, S., Nishiyama, A., Chiba, Y., Miyashita, R., Konishi, C., Azuma, Y., and Nakanishi, M. (2021). HPF1-dependent PARP activation promotes LIG3-XRCC1-mediated backup pathway of Okazaki fragment ligation. Nucleic Acids Res 49, 5003-5016.

      Landry, J. J., Pyl, P. T., Rausch, T., Zichner, T., Tekkedil, M. M., Stutz, A. M., Jauch, A., Aiyar, R. S., Pau, G., Delhomme, N., et al. (2013). The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213-1224.

      Mortusewicz, O., Fouquerel, E., Ame, J. C., Leonhardt, H., and Schreiber, V. (2011). PARG is recruited to DNA damage sites through poly(ADP-ribose)- and PCNA-dependent mechanisms. Nucleic Acids Res 39, 5045-5056.

      Shirai, H., Fujimori, H., Gunji, A., Maeda, D., Hirai, T., Poetsch, A. R., Harada, H., Yoshida, T., Sasai, K., Okayasu, R., and Masutani, M. (2013). Parg deficiency confers radio-sensitization through enhanced cell death in mouse ES cells exposed to various forms of ionizing radiation. Biochem Biophys Res Commun 435, 100-106.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors were trying to investigate whether viral IBs are involved in antagonizing IFN-I production during EBOV trVLPs infection. They found that IRF3 is hijacked and sequestered into EBOV IBs after viral infection, thereby leading to the spatial isolation of IRF3 from TBK1 and IKKε. In this process, the activity of IRF3 is suppressed and downstream IFN-I induction is inhibited. The authors designed many experiments, such as the PLA that examined the colocalization, to support their conclusions. However, necessary negative controls were missing in several assays, and additional key readouts need to be examined in several assays.

      The paper is well organized and most data in this paper could support the conclusions, while there are several issues that need to be further solved.

      1. In Figure 2-4, authors should examine the expression of downstream IFNs as well as the phosphorylation and nuclear localization of IRF3 to further prove the suppression of IRF3 activity by infecting with trVLPs.

      Response: The inhibitory effect of trVLPs infection on the phosphorylation of IRF3 S396 and SeV-induced IRF3 nuclear localization was determined by immunoprecipitation (Figure 3D) and immunofluorescence (Figure 4A and 4B), respectively. In addition, we demonstrated that IFN-β transcription was inhibited more potently by EBOV viral inclusion bodies compared with VP35 alone (Figure 7B and 7C).

      Moreover, EBOV viral inclusion bodies were demonstrated to inhibit the transcription of IFN downstream genes (e.g., CXCL10, ISG15 and ISG56) more potently than VP35 alone (new Figure 7D-F).

      1. In Figure 5, to better prove the conclusion that EBOV NP and VP35 play an important role in sequestering IRF3 in IBs, the authors should add the "NP+VP35+VP30" and "NP+VP35+VP24" groups and repeat the assay.

      Response: According to the reviewer’s suggestion, VP24 or VP30 was added to the “VP35+NP” group, and the results showed that the “NP+VP35+VP24” and “NP+VP35+VP30” groups exhibited little, if any, effect on the distribution of IRF3 compared with the “NP+VP35” group (new Figure 5 - figure supplement 2A-B).

      1. In Figure 6f, the expression of STING should be examined by immunostaining to show the knockdown efficiency in trVLPs-infected cells.

      Response: As suggested by the reviewer, immunostaining was performed to visually detect the effect of STING knockdown on the IRF3 distribution during trVLPs infection (new Figure 6F).

      Reviewer #2 (Public Review):

      The manuscript by Zhu et al. explored the molecular mechanisms by which Ebola virus (EBOV) evades the host innate immune response. EBOV has a number of means to shut down type I interferon induction (by the viral VP35 protein) and block type I interferon action (by the viral VP24 protein). This study reported a new mechanism in which the inclusion bodies (IBs) used for viral replication sequester IRF3, a key transcription factor involved in interferon signaling, resulting in blockade of downstream type I interferon gene transcription. This finding is potentially interesting and may provide new insight into EBOV's evasion of innate immunity. However, there are some flaws in the experiments and analyses that need to be addressed.

      1. Most of the experiments were performed by transfection of trVLP plasmids, which is very different from virus infection. The conclusions should be examined and verified in the context of virus infection.

      Response: As suggested by the reviewer, the effects of IRF3 depletion on live Ebola virus replication were examined as described in the revised manuscript. Consistent with the results obtained after trVLPs infection, IRF3 depletion exerted little, if any, effect on viral replication (new Figure 7H), which supports the notion that, upon EBOV infection and the formation of inclusion bodies, IRF3 has little, if any, transcription activation activity after sequestration by inclusion bodies.

      1. Fig 1 - VP35 displayed classical IB staining only in Panel A, much less so in Panel C, and not in Panel B. It seemed that the VP35 staining images were chosen in a way that favors the authors' conclusions. A statistical analysis of the co-localization of VP35 with IRF3, TBK1 or IKKe should be performed to draw the conclusion. Another concern is that IKKe is normally expressed at low levels under resting conditions and becomes induced only when interferon signaling is activated. It seemed to be expressed at a high level even when interferon signaling is blocked in Panel C. The authors should comment on this discrepancy.

      Response: Ebola virus inclusion bodies show variations in both shape and size. According to the reviewer’s suggestion, the colocalization of TBK1 or IKKε and VP35 is shown in new figures (new Figure 1C and 1E), and quantitatively analyzed by the fluorescence intensity using ImageJ software (new Figure 1B, 1D and 1F).

      1. Fig 2 - Was this experiment done by transfection or infection? The description of the results is not consistent with the figure legend. The labeling was also not consistent between panels A and B. I would suggest performing a Western blot to analyze the expression level of IRF3.

      Response: We apologize for the incorrect description of the data. Ebola virus trVLPs were initially produced by transfection, but the experiments also involved the viral infection process. The use of “transfection” in the figure and figure legends has been changed to “infection” in the revised manuscript. As suggested by the reviewer, Western blotting was performed to analyze the IRF3 expression levels at different time points after trVLPs infection (new Figure 2D).

      1. Fig 3 and 4 - As VP35 is well known for its highly efficient blockade of type I interferon activation, how would the authors differentiate the effect of VP35 alone from the sequestration of IRF3 in IBs in these experiments?

      Response: Previous studies have found that VP35, rather than NP, inhibits the expression of interferon. The “VP35+NP” treatment, which induces IRF3 sequestration, inhibited IFN-β luciferase activity much more potently than VP35 expression alone (Figure 7B).

      1. Fig 3 - Poly(I:C) can activate both the RLR and TLR signaling pathways. Can the authors comment on which pathway it activates in this experiment?

      Response: In this study, the effect of poly(I:C) was consistent with the results observed with SeV, which indicated that poly(I:C) may mainly activate the RLR signaling pathway. A discussion was added to the revised manuscript.

      1. The authors demonstrated that VP35 interacts with STING and recruits the latter to IBs. How would this affect the function of STING, given that STING plays essential roles in the cGAS/cGAMP pathway?

      Response: This study unexpectedly showed that VP35 can recruit IRF3 into viral inclusion bodies through STING, but whether it regulates the cGAS-STING pathway remains to be further investigated. Related discussion was added to the revised manuscript.

      1. It is difficult to follow the logic of Fig 7. The expression level of each viral protein should be determined. Ideally, a mutation in VP35 that disrupts its ability to antagonize interferon signaling but still allows for IB formation could be used to assess the relative contribution of IRF3 sequestration by IBs.

      Response: As suggested by the reviewer, a series of VP35 mutants were constructed, but we failed to obtain a VP35 mutant that loses the ability to antagonize interferon signaling but still allows IB formation. Instead, coexpression of “NP+VP35+VP30+L”, which induces IB formation, inhibited IFN-I more potently than the expression of VP35 alone (Figure 7B). IRF3 knockout inhibited poly(I:C)-induced IFN-I production but had little, if any, effect on poly(I:C)-induced IFN-I production in the “NP+VP35+VP30+L” group (Figure 7C). IRF3 knockout in the cells did not significantly affect viral replication, but overexpression of activated IRF3 (IRF3/5D), and not of wild-type IRF3, inhibited viral replication (new Figure 7G-H). These results collectively suggested that almost all IRF3 in cells was hijacked and sequestered into IBs in Ebola virus-infected cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      RESPONSE TO REVIEWERS:

      Reviewer #1 (Recommendations For The Authors):

      I think the manuscript of this excellent work can be improved, especially in writing (including a suggestion for the title) and presentation (Figure 6). Some additional specific experiments and analyses could also be important, as I suggest below.

      1. For the title, perhaps a shorter "The acetylase activity of Cdu1 protects Chlamydia effectors from degradation" would be better to convey the major significance of this work. Of course, Cdu1 must regulate the function of InaC, IpaM and CTL0480. But perhaps it is speculative to think that egress is the major function of these effectors as their activity on other host cell processes during the cycle could eventually impact the extrusion process indirectly.

      Although we concur with the insights provided by reviewer 1, we wish to underscore that a significant breakthrough presented in our study revolves around the regulation of Chlamydia exit by Cdu1. Consequently, we believe that this noteworthy discovery should be incorporated into the title.

      1. For the writing:

      a. The description of ubiquitination and DUBs could be synthesized to the essential, so that space is gained to explain things that then come a bit out of the blue in the results (what are Incs, the specific functions of InaC, IpaM, and CTL0480 - at least place the citations in lines 110-112 next to the corresponding Incs -, Cdu2, etc - see specifics below)

      In lines 182-196 of the revised manuscript, we have incorporated additional contextual information concerning the roles of Incs, along with descriptions of the functions of InaC, IpaM, and CTL0480.

      b. In the Results, there is a lot of Chlamydia- and maybe lab-specific jargon that could be significantly simplified for the more general reader. I detail some suggestions below in the specific issues.

      We have improved the readability of our manuscript for a general audience by removing Chlamydia-specific terminology from the entire text and figures.

      1. For the figures:

      a. Figure 6, this figure could be reorganized: why two graphs in panel D? If detailed quantifications were done, perhaps in panel B just zoom on the examples of Golgi distributed/compacted? And again the labelling Rif-R L2, L2 pBOMB, M407 p2TK2, etc, simplify?

      Figure 6 has undergone restructuring. The representative images have been relocated to Supplemental Figures 5 and 6, while we have introduced sample images demonstrating F-actin assembly and Golgi repositioning. Furthermore, the quantification of Golgi dispersal has been streamlined into a single panel. Additionally, we have simplified the labeling of the strains utilized in the study.

      b. Figure 3, in the labelling, WT, inaC null, cdu1::GII wouldn't be enough? Leave the details to the legend and/or M&M.

      We have simplified the labeling of Ct strains in Figure 3.

      c. Figure 3C, these arrowheads should not be so symmetric (small arrows instead?) and it is unclear that the indicated cells do not show CTL0480.

      We have replaced the arrowheads with small arrows and have also revised the Figure to incorporate a new representative image that prominently illustrates the absence of CTL0480 at the inclusion membrane of some cdu1::GII inclusions within infected HeLa cells at 36 hpi.

      1. Experiments:

      a. In Figure 7, at least extrusion should be analysed also with the Cdu1-deficient strain expressing Ac-deficient Cdu1 and the inaC and ipaM phenotypes should be complemented.

      We have conducted additional experiments to analyze extrusion production in HeLa cells infected with a cdu1 null strain expressing the acetylase-deficient Cdu1 variant. We have incorporated the relevant data into revised Figure 7, where the impact of this strain on extrusion production and size is presented. Additionally, we updated Supplemental Figure 8 to include data illustrating the number of inclusions produced by this strain. We have also addressed these new results in the revised manuscript (lines 424-432). We are currently complementing inaC and ipaM mutant strains with various InaC and IpaM constructs that will be used in a follow-up manuscript.

      b. Does overexpression of InaC, IpaM, or CTL0480 in a cdu1-null background prevent the degradation of these Incs and suppress the defects of cells infected by the cdu1 mutant (F-actin, Golgi, MYPT1)? This would show that the multiple phenotypes displayed by cells infected by the cdu1 null mutant are indeed related to the decreased levels of InaC, IpaM and CTL0480.

      We opted not to include data from the overexpression of these effectors in a cdu1-null background due to an unexpected decrease in shuttle plasmid load during overexpression. This development prompted concerns regarding the potential detrimental effects of overexpressing these effectors in the absence of Cdu1. Data supporting this observation are not included in this report.

      c. Figures 3A and 3B should be quantified (it says it is from 3 independent experiments). It would be important to have a relative perspective of how much Cdu1 protects these Incs over time (for InaC, it would also be nice to have the 36 and 48 hpi time-point). This is in contrast with the microscopy data in Figure 5, which illustrates very clear effects, and the quantification is a bit redundant.

      In Figure 3, we have incorporated a new Western Blot image showing endogenous InaC protein levels in HeLa cells following infection with both WT Ct and cdu1::GII strains at 24, 36, and 48 hours post-infection (hpi). Additionally, we have quantified the Western Blot signals for both InaC and IpaM, and these results are also presented in Figure 3. The quantification of MYPT1 recruitment has been relocated to a supplementary figure. We have also included details regarding the methodology employed for the quantification of Western Blot signals in the Materials and Methods section.

      d. What is the subcellular localization of InaC, IpaM, CTL0480 and Cdu1 when analysed by transfection? Does Cdu1 bind to InaC, IpaM, and CTL0480 in infected cells? If this was attempted and unsuccessful it should be mentioned.

      In transfected HEK cells, InaC, IpaM, CTL0480, and Cdu1 all exhibit cytoplasmic localization with a diffuse pattern (data not shown). Despite our efforts, we were unable to observe co-immunoprecipitation of Cdu1 with any of the three Incs in infected HeLa cells at 24 hpi. We have acknowledged this limitation in lines 221-226 of the revised manuscript.

      1. Specific issues:

      • Line 87, "propagule" is really needed to describe the EB?

      The EB is the infectious form of Chlamydia species that spreads within the host to renew its life cycle; thus, "propagule" is a suitable term to characterize the EB.

      • Exocytosis implies fusion with the plasma membrane so "inclusion is exocytosed" (line 91) is not entirely correct.

      In line 91 of the revised manuscript, we referred to extrusion as the exit of an intact inclusion from the host cell and omitted the use of "exocytosed" to describe this process.

      • Line 126, "a Ct L2 (LGV L2 434 Bu) background". Maybe "a Ct cdu1-null strain" would be enough and leave the detail for Materials and Methods.

      In line 128 of the revised manuscript, we omitted "(LGV L2 434 Bu)" to avoid using jargon that may be unfamiliar to readers not well-versed in Chlamydia terminology.

      • Line 138, in the previous Pruneda et al, Nature Microbiol 2018, the title of figure 4 is "ChlaDUB deubiquitinase activity is required for C. trachomatis Golgi fragmentation", so why raise this hypothesis? And why, in the end, is it the acetylation activity of Cdu1 that promotes Golgi distribution? I think this relates to infection vs transfection experiments, but it deserves to be briefly explained/discussed.

      In lines 140-142 of the revised manuscript, we provide clarification that the DUB activity of Cdu1 is required for Golgi fragmentation in transfected cells. This observation supports our initial hypothesis suggesting that the DUB activity of Cdu1 is also required for Golgi distribution in infected cells, and our rationale for identifying targets of its DUB activity.

      • Lines 147-155, what is the relevance of these non-ubiquitinated proteins that come along? Couldn't this be synthesized?

      We have included a discussion on non-ubiquitinated proteins, as they could potentially encompass proteins that interact with those protected by Cdu1. This perspective provides supplementary insights into the roles of proteins targeted for ubiquitination in the absence of Cdu1. The results of this analysis have been succinctly summarized in a single paragraph within the initial manuscript (lines 151-159 of the revised manuscript).

      • Line 170, I think it is the first time that "Type 3 secretion" is mentioned; perhaps explain in the introduction.

      Type 3 secretion systems have been extensively characterized and discussed in the literature, and we anticipate that the majority of our readers are well-acquainted with this secretory mechanism.

      • Line 184, I think it is the first time "microdomains" are mentioned; perhaps mention in the introduction.

      The definition of "microdomains" has been provided in line 191 of the revised manuscript.

      • Figure 2, as it stands the analysis with truncated Cdu1 proteins adds little to the work. Binding to the Incs seems to be affected when the TM domain is not present, but it still binds. And this is in a transfection context.

      The results depicted in Figure 2, involving truncated Cdu1 proteins, illustrate that Cdu1 is capable of interacting with InaC, IpaM, and CTL0480 even in the absence of infection. This finding suggests that all three Incs could be direct targets of Cdu1 activity. As a result, we prefer to keep these findings in the manuscript.

      • Line 219, "late stages of infection", this is shown (albeit not completely quantified) for IpaM and CTL0480, but not for InaC.

      In the revised Figure 3, we show InaC protein levels at 24, 36, and 48 hours post-infection, and we have incorporated quantitative data for both InaC and IpaM protein levels in HeLa cells infected with WT L2 and cdu1::GII strains. This updated figure emphasizes the pivotal role of Cdu1 in safeguarding all three Incs during the late stages of infection.

      • Line 233, "pBOMB-MCI backbone" - is this needed in the Results section? And this refers to Figure 4 while pBOMB appear already in Fig. 3.

      We have removed “pBOMB-MCI backbone” in the revised manuscript.

      • Line 236, should be cdu1 endogenous promoter.

      In line 265 of the revised manuscript we have replaced Cdu1 with cdu1 (italicized).

      • Line 263, WT.

      In line 293 of the revised manuscript we replaced “wild type” with “WT”.

      • Line 277, IncA instead of "the Inc protein IncA".

      In the manuscript we wanted to emphasize that IncA is also an inclusion membrane protein, therefore we have included “the Inc protein IncA” in the revised manuscript to avoid any confusion.

      • How do the data in Figure 5 relate to the relatively few proteins ubiquitinated in cells infected with cdu1-mutant Ct? Does this Ub-labelling correspond to ubiquitinated InaC, IpaM and CTL0480?

      The findings presented in Figure 5 demonstrate that the acetylase activity of Cdu1 plays a crucial role in enabling Ct to block all ubiquitination events taking place on or in proximity to the periphery of the inclusion membrane. This encompasses Cdu1 targets that might not have been identified through our proteomic analysis.

      • Lines 299-301, "M923 inclusions", there is certainly a clearer way to write this.

      In lines 326-327 and 332-332 of the revised manuscript, we have clarified that “M923” is an incA null strain.

      • Line 309, is "peripheries" correct?

      We have replaced “peripheries” with “periphery” in the revised manuscript (line 360).

      • Line 312, "Rif-R L2" and "M407" - can this be simplified?

      In the revised manuscript, "Rif-R L2" was substituted with "WT L2" in lines 363 and 382, while "M407" was exchanged with "an inaC null strain" in lines 311, 367, and 368. These same replacements were applied to the Figures and their corresponding legends for consistency.

      • Lines 308-321, and 326-335, these % are all approximate figures and this should be made clear.

      In lines 364-395 of the revised manuscript we have stated that all percentages are approximate values.

      • Fig. S1, kb and not k.b; what's the "+ control"; and is not really possible to have a PCR that works for the *? 3 kb is not that long.

      In the updated Figure S1, we have corrected "k.b" to "kb". In the legend of Figure S1, we have clarified that the + control corresponds to the cdu2 locus. Moreover, we could not cleanly amplify a 3 kb PCR product from bacteria in whole cell lysates of infected mammalian cells (Vero cells).

      • Fig. S2, kb and not k.b, bp and not b.p

      In the updated Figure S2, we have corrected “k.b” to “kb” and “b.p” to “bp”.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1 describes an affinity-based purification and mass spectrometric identification of differentially ubiquitinated proteins (host and chlamydial). Through different permutations of combinations of infection (mock, wild type, and Cdu1 mutant), three effectors, IpaM, InaC, and CTL0480, were identified as putative targets of Cdu1. The authors used a high-stringency cutoff, which could explain identification of only three targets. Having said this, the localization of Cdu1 to the inclusion membrane would be expected to also narrow down the number of targets. Interestingly, Cdu2, another deubiquitinase, remained active in these experiments, which could have affected identification of Cdu1 targets. The authors addressed this issue by referring to previously reported structural studies. A somewhat glaring omission is the lack of reference to NF-kB as a substrate of ChlaDub1/Cdu1. In experiments by Le Negrate et al., ChlaDub1 ectopic overexpression in cells led to the deubiquitination of IkB-alpha, thus inhibiting the nuclear translocation of NF-kB. Based on the inclusion membrane localization of Cdu1 during infection, is the identification of IkB an artifact of overexpression of Cdu1, or is it still a bona fide Cdu1 target?

      We conducted experiments using our cdu1 null strain to investigate whether IκBα could be a target of Cdu1 activity. While our findings are intriguing and relevant, it is not feasible to determine, at this stage, whether our findings result from a direct or indirect consequence of Cdu1 localizing to the inclusion membrane. Consequently, these findings extend beyond the scope of the current manuscript. We plan to explore the implications of our observations more deeply in a subsequent manuscript, where we intend to provide a more comprehensive and mechanistic analysis based on these preliminary findings. Additionally, we have referenced the potential targeting of IκBα by Cdu1 in lines 100-101 and 166-171 of the revised manuscript.

      Figure 2 demonstrates the individual interaction of the identified effectors with Cdu1. Interaction at the inclusion membrane is inferred from colocalization studies, while protein-protein interaction is monitored using ectopic overexpression of tagged versions of Cdu1 and the individual effectors. This is somewhat of a weakness of the manuscript because the mechanism of action of Cdu1 towards its target hinges on protein-protein interaction.

      Despite our efforts, we encountered challenges in co-immunoprecipitating endogenous Cdu1 with all three Incs in infected HeLa cells at 24 hpi. There are multiple technical reasons why these interactions, which are predicted to be transient, will not be captured by bulk affinity approaches such as immunoprecipitations, especially when the starting materials are present in very low abundance. We acknowledged these limitations in our findings, as reflected in lines 221-226 of the revised manuscript.

      Figure 3 provides the first evidence in this paper of the importance of the inferred interaction of Cdu1 with the three effectors. The authors show that the loss of cdu1 has stability consequences on the three effectors. This figure would benefit from quantifying InaC- or IpaM-positive inclusions in the same manner done with CTL0480. The timepoint-dependent effect of Cdu1 loss of function is intriguing. Do InaC and IpaM retention at the inclusion show the same timepoint-dependent characteristic?

      In the revised Figure 3, we have incorporated InaC protein levels at 24, 36, and 48 hours post-infection. Additionally, we have included quantitative data representing both InaC and IpaM protein levels in HeLa cells infected with both WT L2 and cdu1::GII strains. The quantification of CTL0480 localization to cdu1::GII inclusions has been moved to a supplementary figure.

      This updated figure illustrates that the absence of Cdu1 has a time-dependent impact on both InaC and IpaM. However, it is noteworthy that the kinetics of degradation for these two proteins diverge significantly.

      For Figure 7, the authors should consider monitoring timing of inclusion extrusion to gain additional insight into the functional interactions between the effectors. For example, the loss of CTL0480 leads to increased extrusion, implying a role in delaying or suppressing extrusion. In a time-course experiment, a CTL0480 mutant could exhibit an earlier occurrence of inclusion extrusion.

      One of the principal discoveries of this study is that Cdu1, InaC, IpaM, and CTL0480 collaborate to facilitate optimal extrusion of Ct from host cells. These findings represent a significant contribution to our understanding of how Chlamydia controls its exit from infected cells. We are currently in the process of expanding on these results. A forthcoming follow-up manuscript will provide a more detailed and comprehensive exploration of these findings.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments.

      a. I have some concerns related to the time point chosen for mass spec analysis and potential caveats and alternative interpretations. This work was done relatively early (24 hours) compared to the most convincing Cdu1 functions that occur later, thus this may limit the authors' global understanding of protein changes. For example, the known substrate of Cdu1, Mcl-1, was not identified but this is altered relatively late during infection. Thus, the surprise that minimal host proteins are altered in ubiquitination may be partially driven by the timing of the assay. This should be more clearly discussed as a caveat.

      In the revised manuscript (lines 166-171), we have acknowledged that there might be additional targets of Cdu1 that remain unidentified, primarily due to the specific time point we utilized in our study.

      b. Another caveat to these studies is that while the loss of Cdu1 alters the stability and function of different effectors and extrusion size, these changes do not modulate bacterial growth in cells. The authors speculate that regulating extrusion size may alter interactions with innate cells to drive dissemination. However, a previous study using a Cdu1 transposon mutant in an animal model found decreased bacterial load in the genital tract. It is also possible that redundancy of effectors may mask the importance of Cdu1 in growth, but the authors strongly argue against redundancy of Cdu1 and Cdu2, so this weakens the authors' argument here. These concepts and published data should be more directly discussed in the context of the authors' proposed extrusion model and the role in driving Chlamydia growth and pathogenesis.

      In our revised manuscript (lines 460-466) we propose that while we do not observe any growth impairments during Ct growth in the absence of Cdu1 in HeLa cells, the reduction in bacterial loads observed in murine models of infection with an independent cdu1 mutant strain (cdu1::Tn) may potentially be linked to defects in extrusion production or alterations in Cdu1-dependent regulation of extrusion size.

      c. Recent studies have found that IFNg activation can result in dramatic changes in ubiquitination to pathogen-containing vacuoles. While some of these are blocked by the newly found GarD, it seems possible that Cdu1 may also play a role (and perhaps use its deubiquitinating activity) to further protect the inclusion. In light of published results showing that Cdu1 mutants have lower IFU burst size only in IFNg activated cells, this may be an important caveat in the current studies. This should be more directly addressed in the current manuscript.

      We have incorporated two experimental findings indicating that the presence of Cdu1 is not required for Ct to defend itself against IFN cellular immunity in human cells. These recent discoveries are now presented in the updated Figure 5 and detailed in lines 338-355 of the revised manuscript.

      d. On lines 433-434 the authors claim that Cdu1 is atypical since it is not encoded with the metaeffector/target pairs. However, this is an oversimplification of what is known about metaeffectors. For example, there are meta-effector/effector pairs that are not encoded together in Legionella (see table 1 DOI: https://doi.org/10.3390/pathogens10020108). Thus, the discussion should be adjusted. It seems Cdu1 is the first meta-effector found in Chlamydia, and maybe this should be highlighted more strongly rather than its uniqueness in this aspect of meta-effector/effector functions.

      In lines 488-489 of the revised manuscript, we have removed the assertion that Cdu1 functions as an atypical metaeffector and emphasized that it represents the initial discovery of a metaeffector within Ct.

    1. Author Response

      eLife assessment

      This important work describes the first high-resolution structure of HGSNAT, a lysosomal membrane protein required for the degradation of heparan sulfate (HS). Through careful structural analysis, this work proposes potential reasons why certain mutations in HGSNAT lead to lysosomal storage disorders and outlines the enzyme's catalytic mechanism. The experimental evidence presented provides incomplete support for the proposed molecular mechanism of the HS acetylation reaction and the impact of disease-causing mutations.

      We thank the editors and reviewers for taking the time to provide a critical assessment of our manuscript. We appreciate the input and suggestions to improve the analysis. Included here are only our provisional responses. We will address the concerns raised in more detail and incorporate them in the revised version of the manuscript.

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      We thank the reviewer for their encouraging and positive overall assessment of our work.

      1. The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      Thank you for highlighting this concern. The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript; the acetyl-CoA that we see in the structure was intrinsically bound to the protein, indicating the ability of GFP-tagged HGSNAT protein to bind the ligand. We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al., (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability and ability to bind to the co-substrate. As demonstrated by existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We are working on establishing a robust assay to study this complicated and compartmentalized acetyl transfer reaction.

      2. In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, like a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      The acetyl-CoA bound structure presented in the paper does not conclusively support a potential for isomerization and conformational dynamics. We agree with the reviewer that the reaction schematic presented in Figure 5 is speculative. We acknowledge in the discussion that our structure represents only a single step of the reaction, and defining the precise mechanism of acetyl transfer needs additional work. However, we will reword the discussion and change Figure 5 to address this concern raised by multiple reviewers.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for the degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function, and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and shows a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure, and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The description of the architecture of HGSNAT is the highlight of the paper since this corresponds to the first description of the structure of a member of the transmembrane acyl transferase (TmAT) superfamily. The high resolution of an HGSNAT bound to acetyl-CoA is an important leap in our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC. The description of the dimeric interface is a novel result, and future studies are left to confirm the importance of oligomerization for function.

      We thank the reviewer for their time and for highlighting both the quality and novelty of the structure presented in this work.

      Weaknesses:

      Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution.

      Thank you for pointing out this concern. The proposed mechanistic model in Figure 5 is a hypothesis based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al., (1983), Meikle et al., (1995) and Fan et al., (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best, especially because our structure presents only a single step of the reaction, which does not conclusively support either ping-pong or random-order bi-substrate reactions. We will rephrase this section of our discussion and edit Figure 5 to address this concern.

      The authors discuss that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case.

      H269 was suggested to be a crucial catalytic residue by Bame, K. J. and Rome, L. H. (1986), who monitored the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes. We agree that mutagenesis, catalysis, and structural evidence for the same are not currently available. We are pursuing a more thorough exploration of the role of both H269 (previous studies) and N258 (from this study) on the stability and function of HGSNAT.

      In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other.

      We agree with the reviewer’s observation that the structure indeed does not support one reaction mechanism over the other. We are pursuing the structural and kinetic characterization of HGSNAT in the presence of other co-substrates and at multiple pHs, which is required to address this concern thoroughly.

      Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions.

      We are working on assessing the impact of specific mutations on the stability of HGSNAT and will add them to the revised version of the manuscript. We thank the reviewer for this suggestion.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer and define the architecture of the alpha- and beta- GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating the building of a high-confidence atomic model. Importantly, the density of the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The open-to-lumen structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      We thank the reviewer for their positive assessment of the data presented in this work. We really appreciate and agree with the reviewer's comment that the “structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional studies.”

      Weaknesses:

      While the structural data for the open-to-lumen state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      We agree with the reviewers’ assessment and are heavily invested in pursuing the structures of all the steps of acetyl transfer by HGSNAT.

      A potentially significant weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      We thank the reviewer for this comment. While the proteolytic cleavage of the protein remains debated, we find no evidence of such an event in our purification (SDS-PAGE and SEC). Studies like Durand et al., (2010) and Fan et al., (2011) suggest that even the ER-retained monomeric HGSNAT is active. Because we see acetyl-CoA (co-substrate) bound to the protein in our structure, we surmise that proteolysis is not necessary for function, at least not for substrate binding. However, we are working towards the structural and kinetic characterization of recombinant α- and β-HGSNAT constructs to explore the role of proteolysis on HGSNAT stability and function.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper examines the Bithorax complex in several butterfly species, in which the complex is contiguous and not split, as it is in the well-studied fruit fly Drosophila. Based on genetic screens and genetic manipulations of a boundary element involved in segment-specific regulation of Ubx, the authors provide solid evidence for their conclusions, which could be further strengthened by additional data and analyses. The data presented are relevant for those interested in the evolution and function of Hox genes and of gene regulation in general.

      We are deeply grateful to the eLife editorial team and the two reviewers for their thoughtful and constructive feedback. We have used this feedback to improve our manuscript and have provided a point-by-point response below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their article, "Cis-regulatory modes of Ultrabithorax inactivation in butterfly forewings," Tendolkar and colleagues explore Ubx regulation in butterflies. The authors investigated how Ubx expression is restricted to the hindwing in butterflies through a series of genomic analyses and genetic perturbations. The authors provide evidence that a Topologically Associated Domain (TAD) maintains a hindwing-enriched profile of chromatin around Ubx, largely through an apparent boundary element. CRISPR mutations of this boundary element led to ectopic Ubx expression in forewings, resulting in homeotic transformation in the wings. The authors also explore the results of the mutation in two non-coding RNA regions as well as a possible enhancer module. Each of these induces homeotic phenotypes. Finally, the authors describe a number of homeotic phenotypes in butterflies, which they relate to their work.

      Together, this was an interesting paper with compelling initial data. That said, I have several items that I feel would warrant further discussion, presentation, or data.

      First, I would not state, "Little is known about how Hox genes are regulated outside of flies." They should add "in insects" since so much is known in vertebrates.

      Corrected

      For Figure 1, it would aid the readers if the authors could show the number of RNAseq reads across the locus. This would allow the readership to evaluate the frequency of the lncRNAs, splice variants, etc.

      We have found it useful in the past to feature “Sashimi Plots”, as they provide a good overview of transcript splicing junctions and read support. Here we could not accommodate this in our Fig. 1A as this would require compiling the RNAseq reads from many tissues and stages to be meaningful, and we would lose the resolution on forewing vs hindwing tissues that is important in this article (only the Kallima inachus dataset allows this comparison, and was used in Fig. 1B). More specifically, the wing transcriptomes available for J. coenia and V. cardui are not deep enough to provide a good visualization of Antp alternative promoter usage or of AS5’ transcription.

      How common are boundary elements within introns? Typically, boundary elements are outside gene bodies, so this could be explored further. This seems like an interesting bit of biology which, following from the above point, it would be interesting to, at a minimum, discuss, but also relate to how transcription occurs through a possible boundary element (are there splice variants, for example?).

      We do not see evidence of alternative splicing, and prefer to avoid speculating on transcriptional effects, but we agree that the intragenicity of the TAD boundary is interesting. We briefly highlighted this point in the revised Discussion:

      "Lastly, it is worth noting that the Antp/Ubx TAD boundary we identified is intragenic, within the last intron of Ubx. It is unclear if this feature affects Ubx transcription, but this configuration might be analogue to the Notch locus in Drosophila, which includes a functional TAD boundary in an intronic position (Arzate-Mejía et al. 2020)."

      The CRISPR experiments led to compelling phenotypes. However, as a Drosophila biologist, I found it hard to interpret the data from mosaic experiments. For example, in control experiments, how often do butterflies die? Are there offsite effects? It's striking that single-guide RNAs led to such strong effects. Is this common outside of this system? Is it possible to explore the functional effects at the boundary element - are these generating large deletions (for example, like Mazo-Vargas et al., 2022)? For the mosaic experiments, how frequent are these effects in nature or captive stocks? Would it be possible to resequence these types of effects? At the moment, this data, while compelling, was hard to put into the context of the experiments above without understanding how common the effects are. Ideally, there would be resequencing of these tissues, which could be targeted, but it was not clear to me the general rates of these variants.

      We agree with this assessment completely: mosaics complicate the proper interpretation of CRISPR-based perturbation assays in regulatory regions. Here, unlike in Mazo-Vargas et al. (2022), we were unable to breed homeotic effects to a G1 generation, possibly because the phenotypes are dominant and lethal at the embryonic stage (see also our reply to Reviewer 2). This means that mosaic mutants are often survivors with clones of restricted size in the wing, and they are probably rare, but we are unable to meaningfully measure a mutation spectrum frequency (e.g. how often large deletions are generated). As mentioned in the first paragraph of our Discussion, we think that many of the phenotypes we observed (besides the Ubx GOF effects from the BE targeting) were confounded by alleles that could include large SVs. We aim to address these questions in an upcoming manuscript, at a locus where regulatory perturbation does not impact survival, including using germline mutants and unbiased genotyping (whole genome resequencing).

      We elaborated on this issue in our Discussion:

      "It is crucial here to highlight the limitations of the method, in order to derive proper insights about the functionality of the regulatory regions we tested. In essence, butterfly CRISPR experiments generate random mutations by non-homologous end joining repair, that are usually deletions (Connahs et al. 2019; Mazo-Vargas et al. 2022; Van Belleghem et al. 2023). Ideally, regulatory CRISPR-induced alleles require genotyping in a second (G1) generation to be properly matched to a phenotype (Mazo-Vargas et al. 2022). Possibly because of lethal effects, we failed to pass G0 mutations to a G1 generation for genotyping, and were thus limited here to mosaic analysis. As adult wings have lost scale building cells that may underlie a given phenotype, we circumvented this issue by genotyping a pupal forewing displaying an homeotic phenotype in the more efficient Antp-Ubx_BE perturbation experiment (Fig. S4). In this case, PCR amplification of a 600 bp fragment followed by Sanger sequencing recovered signatures of indel variants, with mixed chromatograms starting at the targeted sites. But in all other experiments (CRM11, IT1, and AS5’ targets), we did not genotype mutant tissues, as they were only detected in adult stages and generally with small clone sizes. Some of these clones may have been the results of large structural variants, as data from other organisms suggests that Cas9 nuclease targeting can generate larger than expected mutations that evade common genotyping techniques (Shin et al. 2017; Adikusuma et al. 2018; Kosicki et al. 2018; Cullot et al. 2019; Owens et al. 2019). Even under the assumption that such mutations are relatively rare in butterfly embryos, the fact we injected >100 embryos in each experiment makes their occurrence likely (Fig. 9), and we are unable to assign a specific genotype to the homeotic effects we obtained in CRM11, IT1 and AS5’ perturbation assays."

      Our revision also includes a new Fig. S4 that features the mosaic genotyping of a G0 Antp-Ubx_BE mutant tissue. While this does not fully address the reviewer questions, it provides reasonable validation that the frequent GOF effects we observed upon perturbation at this target site are generated by on-target indels from DNA repair.

      Author response image 1.

      Validation of CRISPR-induced DNA Lesions in an Antp-Ubx_BE crispant pupal forewing. (A-A') Pupal forewing cuticle phenotype of an Antp-Ubx_BE J. coenia crispant, as in Fig. S3. (B-B") Aspect of the same forewing under trans-illumination following dissection out of the pupal case. Regions from mutant clones have a more transparent appearance. (C) Sanger sequencing of an amplicon targeting the Antp-Ubx_BE region in the mutant tissue shown in panel B", compared to a control wing tissue, showing mixed chromatogram around the expected CRISPR cutting site due to indel mutations from non-homologous end-joining.

      In sum, I enjoyed the extensive mosaic perturbations. However, I feel that more molecular descriptions would elevate the work and make a larger impact on the field.

      Reviewer #2 (Public Review):

      Summary:

      The existence of hox gene complexes conserved in animals with bilateral symmetry and in which the genes are arranged along the chromosome in the same order as the structures they specify along the anteroposterior axis of organisms is one of the most spectacular discoveries of recent developmental biology. In brief, homeotic mutations lead to the transformation of a given body segment of the fly into a copy of the next adjacent segment. For the sake of understanding the main observation of this work, it is important to know that in loss-of-function (LOF) alleles, a given segment develops like a copy of the segment immediately anterior to it, and in gain-of-function mutations (GOF), the affected segment develops like a copy of the immediately posterior segment. Over the last 30 years the molecular lesions associated with GOF alleles led to a model where the sequential activation of the hox genes along the chromosome result from the sequential opening of chromosomal domains. Most of these GOF alleles turned out to be deletions of boundary elements (BE) that define the extent of the segment-specific regulatory domains. The fruit fly Drosophila is a highly specialized insect with a very rapid mode of segmentation. Furthermore, the hox clusters in this lineage have split. Given these specificities it is legitimate to question whether the regulatory landscape of the BX-C we know of in D.melanogaster is the result of very high specialization in this lineage, or whether it reflects a more ancestral organization. In this article, the authors address this question by analyzing the continuous hox cluster in butterflies. They focus on the intergenic region between the Antennapedia and the Ubx gene, where the split occurred in D.melanogaster. Hi-C and ATAC-seq data suggest the existence of a boundary element between 2 Topologically-Associated-Domain (TAD) which is also characterized by the presence of CTCF binding sites. 
      Butterflies have two pairs of wings, originating from T2 (forewing, specified by Antp) and T3 (hindwing, specified by Ubx). Remarkably, CRISPR mutational perturbation of this boundary leads to the hatching of butterflies with homeotic clones of cells with hindwing identity in the forewing (a posteriorly oriented homeotic transformation). In agreement with this phenotype, the authors observe ectopic expression of Ubx in these clones of cells. In other words, CRISPR mutagenesis of this BE region, identified by molecular tools, gives rise to homeotic transformations directed towards a more posterior segment, like the boundary mutations that were first identified on the basis of their posteriorly oriented homeotic transformations in Drosophila. None of the mutant clones they observed affect the hindwing, indicating that their scheme did not affect the nearby Ubx transcription unit. This is reassuring and important first evidence that some of the regulatory paradigms that have been proposed in fruit flies were also at work in the common ancestor of Drosophila and Lepidoptera.

      Given the large size of the Ubx transcription unit and its associated regulatory regions, it is not surprising that the authors have identified ncRNAs that are conserved in 4 species of Nymphalinae butterflies, some of which are also present in D. melanogaster. Attempts to target the promoters by CRISPR give rise to clones of cells in both forewings and hindwings, suggesting the generation of regulatory mutations associated with both LOF and GOF transformations. The presence of clones with dual homeosis suggests the targeting of Ubx activator and repressor CRMs. Unfortunately, these experiments do not allow us to draw further conclusions on the role of these ncRNAs or to identify specific regulatory elements. In the opinion of this reviewer, some recent papers addressing the role that these ncRNAs may play in boundary function should be taken with caution, and evidence that ncRNA(s) regulate boundaries in the BX-C in a WT context is still lacking.

      Strengths:

      The convincing GOF phenotype resulting from the targeting of the Antp-Ubx_BE.

      Weaknesses:

      The lack of comparisons with the equivalent phenotypes obtained in D. melanogaster with, for example, the Fub mutation.

      We are grateful for this excellent contextualization of our findings and have incorporated some of the historical elements into our revision, as detailed below.

      Reviewer #2 (Recommendations For The Authors):

      Throughout the paper, the authors approach the notion of boundaries from the angle of the existence of TADs and almost entirely neglect to explain the characteristics of boundary mutations in the BX-C. To my knowledge, examples where targeted boundary deletions between TADs result in misregulation of the neighboring genes, and/or a phenotype, are extremely sparse (especially in the context of the mouse hox genes). Given the extensive literature describing the boundary mutations and their associated GOF phenotypes, the paper would certainly gain strength if the authors justified their approach through this wealth of information. I must admit that this referee is surprised by the absence of any references to the founding work of the Karch and Bender laboratories on this topic. As a matter of fact, the founding members of the boundary class of regulatory elements were already described in 1993 with the Fab-7 and Mcp elements of the BX-C. Based on gain-of-function homeotic phenotypes, additional Fab boundaries were added to the list. Finally, in 2013, Bender and Lucas (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606092/) identified the Fub boundary element that delimits the Ubx and abd-A domains in the BX-C. Fub fulfills the criterion of lying at the border of 2 neighboring TADs. Significantly, a deletion of Fub leads to a very penetrant and strong homeotic gain-of-function phenotype in which the flies hatch with a 1st abdominal segment transformed into the 2nd. In agreement with this, abd-A is expressed one parasegment too anterior in embryos. This is exactly the observation gathered from the targeted mutations in the Antp-Ubx_BE: a dominant transformation of anterior to posterior wing accompanied by an ectopic expression of Ubx in the forming primordia of the forewing, where it is normally silenced. I believe the paper would gain credibility if the results were reported with the knowledge of the similarities with Fub.

      Line 53, I am not aware of the existence of TADs for each of the 9 regulatory domains. The insulators delimit the extent of the regulatory domains but certainly not of TADs.

      We thank the reviewer for these suggestions, as well as for the correction – we agree our previous text suggested that all BX-C boundaries are TAD boundaries, which was incorrect. We added a new introduction paragraph that combines classic literature on GOF mutations at boundary elements with recent evidence these are TAD insulators, including Fub (as suggested), and adding Fab-7 for breadth of scope.

      "For instance, the deletion of a small region situated between Ubx and abd-A produces the Front-ultraabdominal phenotype (Fub) where the first abdominal segment (A1) is transformed into a copy of the second abdominal segment A2, due to a gain-of-expression of abd-A in A1 where it is normally repressed (Bender and Lucas 2013). At the molecular level, the Fub boundary is enforced by insulating factors that separate Topologically Associating Domains (TADs) of open-chromatin, while also allowing interactions of Ubx and abd-A enhancers with their target promoters (Postika et al. 2018; Srinivasan and Mishra 2020). Likewise, the Fab-7 deletion, which removes a TAD boundary insulating abd-A and Abd–B (Moniot-Perron et al. 2023), transforms parasegment 11 into parasegment 12 due to an anterior gain-of-expression of Abd-B (Gyurkovics et al. 1990). By extrapolation, one may expect that if the Drosophila Hox locus was not dislocated into two complexes, Antp and Ubx 3D contact domains would be separated by a Boundary Element (BE), and that deletions similar with Fub and Fab-7 mutations would result in gain-of-function mutations of Ubx that could effectively transform T2 regions into T3 identities."

      A reference to the 1978 Nature article of Lewis should be added after line 42 of introduction.

      Added

      Line 56-57; the BX-C encoded miRNAs are known to regulate Ubx and abd-A, but not Abd-B.

      Corrected

      From lines 57 to 61, the authors mention reports aimed at demonstrating a role of ncRNAs in Ubx regulation. To my eyes, the evidence gathered is rather weak. A reference to the work of Pease et al. in Genetics in 2013 should be added (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3832271/).

      Added. Our paragraph includes qualifier language about the functionality of the Ubx-related ncRNAs (“are thought to”, “appears to”), and updated references regarding bxd (Petruk et al. 2006; Ibragimov et al. 2023).

      Line 62 authors, should write "Little is known about how Hox genes are regulated outside of Drosophila" and not flies.

      Corrected

      Lines 110-112: could lncRNA:Ubx-IT1 correspond to the PS4 antisense reported by Pease et al. in 2013 (see URL above)? Lines 115-117: could lncRNA:UbxAS5' correspond to the bxd antisense of Pease et al. in 2013 (see above)?

      As we could not detect sequence similarities, we preferred to avoid drawing homology, and we intentionally avoided reference to the fly transcripts when we named IT1 and AS5’. That said, we agree it is important to state that further studies are needed to clarify this relationship. We elaborated on this point in our discussion:

      "Of note, a systematic in-situ survey (Pease et al. 2013) showed that Drosophila embryos express antisense transcripts in the Ubx 5’ region (lncRNA:bxd), as well as within its first intron (lncRNA:PS4). It is thought that Drosophila bxd regulates Ubx, possibly by transcriptional interference or by facilitation of the Fub-1 boundary effect (Petruk et al. 2006; Ibragimov et al. 2023), while the possible regulatory roles of PS4 remain debated (Hermann et al. 2022). While these dipteran non-coding transcripts lack detectable sequence similarity with the lepidopteran IT1 and AS5’ transcripts, further comparative genomics analyses of the Ubx region across the holometabolan insect phylogeny should clarify the extent to which Hox cluster lncRNAs have been conserved or independently evolved."

      Lines 154-155: "This concordance between Hi-C profiling and CTCF motif prediction thus indicates that Antp-Ubx_BE region functions as an insulator between regulatory domains of Antp and Ubx". This is only correlative; I would write "suggests" instead of "indicates" and add a "might function".

      Corrected as suggested.

      Line 254, I assume the authors wish to write Ubx-IT1 in V. cardui instead of Ubx-T1.

      Typo corrected

      Line 255 : Fig.5 is absent from the pdf file and replaced by table 1. I did not find a legend for Table 1.

      Corrected, with our sincere apologies for the loss of this image in our first submission.

      Line 293 "Individual with hindwing clones 2.75 times more common than...." "are" is missing?

      Corrected

      Lines 303-313, it is not entirely clear how many guide RNAs were injected. Would be useful to indicate the sites targeted in Fig.S8.

      We specify in the revised text: "using a single guide RNA (Ubx11b9)".

      Lines 323-337: it is not entirely clear to this referee (a drosophilist) whether those spontaneous mutations can be inbred or whether these individuals are occasional mosaics. In general, did anyone try to derive lines from those mosaic animals? Is it possible to hit the germline at the syncytial stages at which the guides are injected? Are the individuals with a wing phenotype fertile? Given that the Antp-Ubx_BE mutations should be dominant, I wonder whether this characteristic would help in identifying germline transmission. A similar remark applies to the discussion, where the authors explain at line 360 that genotyping can only be done in the progeny of the G0. I do not have the impression that the authors have performed this genotyping and, if I am right, I do not understand why.

      We improved our discussion section on this topic (new text in orange):

      "It is crucial here to highlight the limitations of the method, in order to derive proper insights about the functionality of the regulatory regions we tested. In essence, butterfly CRISPR experiments generate random mutations by non-homologous end joining repair, that are usually deletions (Connahs et al. 2019; Mazo-Vargas et al. 2022; Van Belleghem et al. 2023). Ideally, regulatory CRISPR-induced alleles require genotyping in a second (G1) generation to be properly matched to a phenotype (Mazo-Vargas et al. 2022). Possibly because of lethal effects, we failed to pass G0 mutations to a G1 generation for genotyping, and were thus limited here to mosaic analysis. As adult wings have lost scale building cells that may underlie a given phenotype, we circumvented this issue by genotyping a pupal forewing displaying an homeotic phenotype in the more efficient Antp-Ubx_BE perturbation experiment (Fig. S4). In this case, PCR amplification of a 600 bp fragment followed by Sanger sequencing recovered signatures of indel variants, with mixed chromatograms starting at the targeted sites. But in all other experiments (CRM11, IT1, and AS5’ targets), we did not genotype mutant tissues, as they were only detected in adult stages and generally with small clone sizes. Some of these clones may have been the results of large structural variants, as data from other organisms suggests that Cas9 nuclease targeting can generate larger than expected mutations that evade common genotyping techniques (Shin et al. 2017; Adikusuma et al. 2018; Kosicki et al. 2018; Cullot et al. 2019; Owens et al. 2019). Even under the assumption that such mutations are relatively rare in butterfly embryos, the fact we injected >100 embryos in each experiment makes their occurrence likely (Fig. 9), and we are unable to assign a specific genotype to the homeotic effects we obtained in CRM11, IT1 and AS5’ perturbation assays."

      We agree that the work we conducted with mosaics has important caveats. So far, our attempts at breeding homeotic G0 mutants have not been fruitful at this locus, while less deleterious loci can yield viable alleles into further generations, such as WntA (published) and cortex (in prep.). We prefer to stay vague about negative data here, as it is difficult to disentangle whether they were due to real mutational effects (e.g. the alleles can be dominant and lethal in the G1 generation), to a failure to obtain germline carriers of mutations as founders, or to health issues that are often amplified by inbreeding depression (including a possible iflavirus in our V. cardui cultures).

      We concur with the prediction that Antp-Ubx_BE mutations are probably dominant, and intend to follow up with similar GOF experiments in the Plodia pantry moth, a laboratory model for lepidopteran functional genomics that is more amenable than butterflies to inbreeding and long-term studies in mutant lines. In our experience (https://www.frontiersin.org/articles/10.3389/fevo.2021.643661/full), Ubx coding knock-out can be more extensive in Plodia than in butterflies, so we think these animals will also be more resilient to the deleterious effects of the GOF phenotype.

      Lines 423, 425: I am not a fan of the term "de-insulating"!

      We replaced this neologism with "Similar deletion alleles resulting in a TAD fusion and misexpression effect" (see below).

      Line 425, why bring the work on Notch while there are so many examples in the BX-C itself....

      Our revised sentence makes it clearer that we are referring here to documented examples of deletion-mediated TAD fusion (i.e. featuring a conformation capture assay such as Hi-C/micro-C):

      This suggests a possible loss of the TAD boundary in the crispant clones, resulting in a TAD fusion or in a long-range interaction between a T2-specific enhancer and Ubx promoter. Similar deletion alleles resulting in a TAD fusion and misexpression effect have been described at the Notch locus in Drosophila (Arzate-Mejía et al. 2020), in digit-patterning mutants in mice and humans (Lupiáñez et al. 2015; Anania et al. 2022), or at murine and fly Hox loci depleted of CTCF-mediated regulatory blocking (Narendra et al. 2015; Gambetta and Furlong 2018; Kyrchanova et al. 2020).

      Our revision also includes more emphasis on the Drosophila BX-C boundary elements Fub and Fab-7 (see above).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is very well written, the data are clearly presented and the methodology is robust. I only have suggestions to improve the manuscript, to make the study more appealing or to discuss in more detail some questions raised by the work.

      1. In the study as it stands, PFG seems to come out of the blue. The authors apparently selected this protein based on sequence conservation between species but this is unlikely to be sufficient to identify novel TFs. Explaining in more detail the reasoning that led to PFG would make the story more appealing. Perhaps PFG was identified through a large reverse genetics screening?

      Response: Thank you for your suggestion. We identified this gene solely by the strategy we described in the manuscript. We decided on this strategy based on the findings of our previous study on AP2-family TFs, whose DNA-binding domains are highly conserved among Plasmodium orthologues. Using this screening strategy, we identified a novel AP2-family TF, AP2-Z. The results of the present study demonstrate that this strategy is applicable to TFs other than those belonging to the AP2 family. We are aware that this strategy is not all-encompassing. In fact, we failed to identify HDP1 as a candidate TF even though it was also in the target list of AP2-G. However, at present, this is our primary strategy for identifying novel TFs in the targetome.

      2. The authors propose that PFG and AP2-FG form a complex, but this is actually not shown. Did they try to document a physical interaction between the two proteins, for example using co-IP?

      Response: Even when two molecules are found at the same position by ChIP-seq, it cannot be concluded that they form a physical complex, because they may occupy the region competitively. However, in this study, we performed ChIP-seq in the absence of PFG and demonstrated that the cAP2-FG peaks disappeared while those of sAP2-FG remained. This result can only be explained by the two proteins forming a complex at this region, which excludes the possibility that AP2-FG binds the region independently.

      3. It is unclear how PFG can bind to DNA in the absence of a DNA-binding domain. Did the authors search for unconventional domains in the protein? This should be at least discussed in the manuscript.

      Response: We speculate that the two highly conserved regions, region 1 and region 2, function as DNA-binding domains in PFG. However, this domain is not similar to any DNA binding domains reported thus far. A straightforward way to demonstrate this would be to perform in vitro binding assays using a recombinant protein. However, thus far, we have not succeeded in obtaining soluble recombinant proteins for these regions. We have added the following sentences to the results section.

      “At present, we speculate that PFG directly interacts with genomic DNA through two highly conserved regions; region 1 and region 2. However, these regions are not similar to any DNA binding domains reported thus far. In other apicomplexan orthologues, these two domains are located adjacent to one another in the protein (Fig. 1A). Therefore, these two regions may be separated by a long interval region but constitute a DNA binding domain of PFG as a result of protein folding.”

      4. How do the authors explain that PFG is still expressed in the absence of AP2-FG? Is AP2-G alone sufficient to express sufficient levels of the protein? Is PFG down-regulated in the absence of AP2-FG?

      Response: Our previous ChIP-seq data indicate that PFG is a target of AP2-G. According to the study by Kent et al. (2018), this gene is up-regulated in the early period following conditional AP2-G induction. The results of the present study showed that PFG is capable of autoactivation through a transcriptional positive feedback loop. These results suggest that PFG can maintain its expression at a certain level once activated by AP2-G, even in the absence of AP2-FG. In our previous microarray analysis, significant decreases in PFG expression were not observed in AP2-FG-disrupted parasites.

      5. How do AP2-FG regulated genes (based on RNA-seq) compare with the predicted cAP2-FG/sAP2-FG target genes (based on ChIP-seq)? Are the two subsets included in the genes that are actually down-regulated in AP2-FG(-)?

      Response: Disruption of the AP2-FG gene impairs gametocyte development. We considered that the direct effect of this disruption would be difficult to analyze in gametocyte-enriched blood, in which gametocytes are pooled during sulfadiazine treatment to deplete asexual stages. Therefore, in our previous paper, we performed microarray analysis between WT and KO parasites to detect the direct effect of AP2-FG disruption on target gene expression, using mice that were synchronously infected with parasites. According to our results, 206 genes were down-regulated in AP2-FG-disrupted parasites. Of these genes, 40 and 117 were targets of sAP2-FG and cAP2-FG, respectively. However, it is still possible that a significant proportion of genes were indirectly down-regulated by AP2-FG disruption, which may impair gametocyte development. Moreover, based on the results of the present study, expression of a significant proportion of AP2-FG target genes could be complemented by PFG transcription. We believe that it would be difficult to compare the direct effects of these TFs on gene expression via transcriptome analysis (therefore, targetome analysis is important). In this study, we compared the expression of target genes of sAP2-FG and cAP2-FG between PFG(-) and WT parasites. We expected that down-regulation of PFG (cAP2-FG) targets would be complemented by transcription by sAP2-FG.

      6. Minor points

      -Page 5 Line 10, remove "as"

      Response: We have corrected this.

      -Page 7 Lines 4-13: is it possible to perform the assay in PFG(-) parasites?

      Response: Thank you for your question. Even if the marker gene expression were decreased in PFG(-) parasites, we could not conclude that this was a direct effect of the mutation. To determine the function of the motif, it is necessary to perform the assay using wild-type parasites.

      -Page 7 Line 45: Fig6C instead of 5C

      Response: Thank you for pointing this out. We have corrected this.

      -Page 8 Line 27: "decreases"

      Response: Thank you for pointing this out. We have corrected this.

      -Page 8 Line 36: PFG instead of PGP

      Response: We have corrected this.

      -Page 8 Line 39: remove "the fact"

      Response: We have removed this word.

      -Page 8 Line 42: Fig6G instead of 5G

      Response: We have corrected this.

      -Page 8 Line 43: PFG instead of PGP

      Response: We have corrected this.

      -Page 9 Line 23: "electroporation"

      Response: We have corrected this.

      -Page 9 Line 32: "BamHI"

      Response: We have corrected this.

      -Fig 2E: in the crosses did the authors check oocyst formation in the mosquito?

      Response: We did not check oocyst formation because abnormalities in males may not affect oocyst formation.

      -Page 17, legend Fig3, Line 14, there is probably an inversion between left and right for PFG versus AP2-FG (either in the legend or in the figure)

      Response: Thank you for pointing this out. PFG peaks are located in the center in both heat maps. The description “AP2-FG peaks” over the arrowhead in the left map was incorrect. We have corrected this to “PFG peaks”. The peaks in the left heat map must be located in the center; thus, this figure might be redundant.

      Reviewer #2 (Recommendations for the Authors):

      • Could the authors please state in the results section that PFG stands for partner of AP2-FG.

      Response: Thank you for the comment. We have added the following to the results section:

      “Through this screening, a gene encoding a 2709 amino acid protein with two regions highly conserved among Plasmodium was identified (PBANKA_0902300, designated as a partner of AP2-FG (PFG); Fig. 1A).”

      • Given that the transcriptional program is so dynamic, the timing of the ChIP-seq experiments is crucial. Could the authors clarify the timings of the different ChIP-seq experiments (AP2-FG, PFG, PFG in AP2-FG-, AP2-FG in PFG-, ...)

      Response: Thank you for the comment. To deplete any parasites in the asexual stages, all ChIP-seq experiments in this study were performed using blood from mice treated with sulfadiazine, namely, gametocyte-enriched blood. As the reviewer points out, timing is important, and samples from the period when TFs are maximally expressed are optimal for ChIP-seq. However, when parasites in the asexual stages are present, the background becomes higher. Thus, we usually use gametocyte-enriched blood for ChIP-seq when expression of the TF is observed in mature gametocytes. The exception was our ChIP-seq analysis of AP2-G, because it is not present in mature gametocytes.

      • Fig 4c is an example of great overlap of peaks, but it would be helpful if the authors could quantify the overlaps between experiments (and describe the overlap parameters used).

      Response: In response to the comment, we have created a Venn diagram of overlapping peaks (attached below). However, the peaks used for this Venn diagram were selected after peak calling via fold-enrichment values. Thus, even if the counterpart of a peak is absent from these selected peaks (non-overlapping peaks in the Venn diagram), this does not indicate that it is absent from the original read map. We believe the overlap of peaks is estimated more accurately in the heat maps.

      Author response image 1.

      Legend: The Venn diagram shows the number of common peaks between these ChIP-seq experiments (distance of peak summits < 150
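
      The overlap criterion in the legend (two peaks are "common" when their summits lie less than 150 apart) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name, the dictionary layout, and the reading of the threshold as base pairs are assumptions, since the legend is truncated before the unit.

```python
from bisect import bisect_left, bisect_right


def count_common_peaks(summits_a, summits_b, max_dist=150):
    """Count peaks in experiment A that have a partner peak in experiment B.

    summits_a, summits_b: dict mapping chromosome name -> list of summit positions.
    max_dist: summits strictly closer than this value count as common
              (hypothetical default; unit assumed to be bp).
    """
    common = 0
    for chrom, positions_a in summits_a.items():
        positions_b = sorted(summits_b.get(chrom, []))
        for pos in positions_a:
            # Index range of B summits v with pos - max_dist < v < pos + max_dist.
            lo = bisect_right(positions_b, pos - max_dist)
            hi = bisect_left(positions_b, pos + max_dist)
            if hi > lo:
                common += 1
    return common


# Toy data, invented for illustration only.
experiment_a = {"chr1": [1000, 5000], "chr2": [300]}
experiment_b = {"chr1": [1100, 5150], "chr3": [300]}
n_common = count_common_peaks(experiment_a, experiment_b)
```

      Note that the count is taken from the point of view of experiment A; with a one-to-many match the number seen from experiment B can differ, which is one reason Venn diagram counts of peaks depend on the convention chosen.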

      • Additionally, how were the promoter coordinates defined for each gene when associating ChIP peaks with a gene target? Did the authors choose 1-2 kb? Or use a TSS/5′ UTR dataset such as Adjalley 2016 or Chappell 2020?

      Response: We selected a 1.2 Kbp region for target prediction based on our previous studies. As the reviewer pointed out, target prediction using TSS information may be more accurate. However, reliable TSS information is not available for P. berghei to the best of our knowledge.

      The two papers are studies on P. falciparum.
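
      A window-based target prediction of this kind can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' actual procedure: the 1.2 kbp window is taken here to lie immediately upstream of the coding sequence on the gene's strand, and all names and coordinates are hypothetical.

```python
def upstream_window(cds_start, cds_end, strand, width=1200):
    """Return (low, high) coordinates of the region upstream of a coding sequence.

    Assumes 1-based inclusive coordinates with cds_start < cds_end.
    """
    if strand == "+":
        return cds_start - width, cds_start - 1
    # Minus strand: upstream lies at higher coordinates than the CDS.
    return cds_end + 1, cds_end + width


def predict_targets(genes, summits, width=1200):
    """List genes with at least one peak summit in their upstream window.

    genes: iterable of (name, chrom, cds_start, cds_end, strand).
    summits: iterable of (chrom, position).
    """
    summits = list(summits)
    targets = []
    for name, chrom, start, end, strand in genes:
        low, high = upstream_window(start, end, strand, width)
        if any(c == chrom and low <= pos <= high for c, pos in summits):
            targets.append(name)
    return targets


# Toy annotation and peaks, invented for illustration only.
toy_genes = [
    ("geneA", "chr1", 5000, 6000, "+"),
    ("geneB", "chr1", 8000, 9000, "-"),
    ("geneC", "chr2", 5000, 6000, "+"),
]
toy_summits = [("chr1", 4500), ("chr1", 9500), ("chr1", 3000)]
toy_targets = predict_targets(toy_genes, toy_summits)
```

      A fixed window trades accuracy for simplicity: it needs only CDS coordinates, which is why it remains usable when reliable TSS annotation is unavailable.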

      • In the absence of evidence of physical interaction, it remains unclear if AP2-FG and PFG actually interact directly or as part of the same complex. A more detailed characterisation with IPs/co-IPs followed by mass spectrometry of the GFP-tagged version of PFG in the presence and absence of AP2-FG would be highly informative.

      Response: Thank you for the comment. Even when these two TFs occupy the same genomic region, it cannot be conclusively said that they exist at the same time in the region: they might competitively occupy the region. However, we showed that the cAP2-FG peaks disappear from the region when PFG was disrupted, while sAP2-FG peaks remain. We believe that this is evidence that the two TFs physically interact with each other.

      • It was not clear if the assessment of motif binding using cytometry was performed using all the required controls and compensation. This section should be clarified.

      Response: Thank you for the comment. Compensation was performed using parasites expressing a single fluorescent protein. The results are attached below. The histogram of mCherry using control parasites expressing GFP under the control of the HSP70 promoter is also attached.

      Author response image 2.

      However, we found that descriptions of the filters for detecting red signals were not correct. This assay was performed using parasites which expressed GFP constitutively and mCherry under the control of the p28 promoter. These two fluorescent proteins were excited by independent lasers (488 and 561, respectively), and the emission spectra were detected using independent detectors (through 530/30 and 610/20 filters, respectively). We have revised the description regarding our FACS protocols as follows:

      “Flow cytometric analysis was performed using an LSR-II flow cytometer (BD Biosciences). In experiments using 820 parasites, the tail blood from infected mice was selected via gating with forward scatter and staining with Hoechst 33342 (excitation =355 nm, emission = 450/50). The gated population was then analyzed for GFP fluorescence (excitation = 488 nm, emission = 530/30) and RFP fluorescence (excitation = 561 nm, emission = 610/20). In the promoter assay (using parasites transfected with a centromere plasmid), the tail blood from infected mice was selected via gating with forward scatter and staining with Hoechst 33342 (excitation =355 nm, emission = 450/50), followed by GFP fluorescence (excitation = 488 nm, emission = 530/30). The gated population was analyzed for mCherry fluorescence (excitation = 561 nm, emission = 610/20). Analysis was performed using the DIVER program (BD Biosciences).”

      Minor points:

      • Page 4, line 37: The authors should specify the timing of expression of AP2-FG on the text.

      Response: We have added the following description to the text.

      “The timing of the expression was approximately four hours later than that of AP2-FG, which started at 16 hpi (9).”

      • Ref 9 and 17 are repeated

      Response: Thank you for pointing this out. We have corrected this.

      • Fig 1D and 1F do not have scale bars

      Response: We have added scale bars to Fig. 1D.

      We have not changed Fig. 1F, because we believe that the scales can be estimated from the size of the erythrocyte.

      • Page 5, lines 29-30. Could the authors specify how many and which of the de-regulated genes have a PFG peak in their promoter?

      Response: Thank you for the comment. As described in a later section (page 7; Impact of PFG disruption on the expression of AP2-FG target genes), among the 279 genes significantly downregulated in PFG(-) parasites, 165 genes were targets of PFG (unique to PFG or common to sAP2-FG and PFG). In contrast, only four genes were targets unique to sAP2-FG. Therefore, 165 genes harbor upstream PFG peaks. These genes are shown in Table S1.

      • Fig 5F. in the methods associated with this figure there seems to be a mixup with the description of the lasers. In addition, given the spillover of the red and green signal between detectors this experiment needs compensation parameters. The authors should provide the gating strategy before and after compensation as this is critical for the correct calculation of the number of red parasites. Indeed, the lowest red cloud on the gate shown could be green signal spill over.

      Response: Thank you for the comment. As described above, there were some incorrect descriptions about the conditions of our FACS protocols in the methods section. We have revised them.

      • Page 7, line 19. Could the authors explicitly say in the text that the 810 genes are those with 1 (or more?) PFG peaks in their promoter (out of a total of 1029) to best guide the reader. Additionally, it is important to define the maximum distance allowed between a peak and CDS for it to be associated with said CDS.

      Response: We have revised Table S2 by adding the nearest genes. The revised table shows the relationship between a PFG peak and its nearest genes, together with their distances.

      • Page 7, line 45: fig 6c, not 5c

      Response: Thank you for the comment. We have corrected this.

      • Page 7 last paragraph: This section is very hard to follow. For instance, on line 50 do the authors mean that the sAP2-FG unique targets are LESS de-regulated? On line 51: do the authors mean unique targets of cAP2-FG or unique targets of PFG? Line 53: do the authors mean that genes expressed in the "common" category are LESS de-regulated than the PFG unique targets?

      Response: We are sorry for the lack of clarity; on reviewing the manuscript, we found that the meaning of fold change in this section was unclear. Here, fold change means the ratio of PFG(-)/wild type. Thus, a high log2(fold change) value means that the genes were less downregulated. We have revised the description as follows:

      “The log2 distribution (fold change = PFG(-)/wild type) in the three groups of target genes showed that the average value was significantly higher (i.e., less down-regulated) in targets unique to sAP2-FG than in the other two groups (targets unique to cAP2-FG or common targets for both), with p-values of 1.3 × 10⁻¹⁰ and 1.4 × 10⁻⁵, respectively, by two-tailed Student’s t-test (Fig. 6F). In addition, the average log2 (fold change) value of the common target genes was relatively higher (i.e., less down-regulated) than that of targets unique to PFG, suggesting that transcriptional activation by sAP2-FG partly complements the impact of PFG disruption on these common targets.”
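
      To make the sign convention concrete, the sketch below computes log2(fold change) with fold change = PFG(-)/wild type and the pooled-variance Student's t statistic that compares two groups of genes. It is an illustration only: the function names and numbers are hypothetical, and the p-value step is left out because it needs the t distribution (available, for example, through `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def log2_fold_change(mutant_expr, wild_type_expr):
    """log2 of mutant / wild type; negative values mean down-regulation in the mutant."""
    return math.log2(mutant_expr / wild_type_expr)


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df


# Hypothetical log2(fold change) values for two groups of target genes.
group_less_affected = [-0.2, -0.1, 0.0]
group_more_affected = [-2.1, -2.0, -1.9]
t_stat, df = student_t(group_less_affected, group_more_affected)
```

      With this convention, a gene whose expression drops to one quarter of the wild-type level has a log2(fold change) of -2, so a group with a higher (closer to zero) average is the less down-regulated one.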

      • Page 8, line 42: Fig 6G, not 5G

      Response: Thank you for pointing this out. We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      1. The gene at the center of this study (PBANKA_0902300) was identified in an earlier genetic screen by Russell et al. as being a female specific gene with essential role in transmission and named Fd2 (for female-defective 2). Since this name entered the literature first and is equally descriptive, the Fd2 name should be used instead of PFG to maintain clarity and avoid unnecessary confusion. Surprisingly, this study is neither cited nor acknowledged despite a preprint having been available since August of 2021. This should be remedied.

      Response: Thank you for the comment. We have added the paper by Russell et al. accordingly and mentioned the name FD2 in the revised manuscript. However, we have retained the use of PFG throughout the paper. We believe that this usage of PFG shouldn’t be confusing, as FD2 has only been used in one previous paper. We have added the following:

      “Through this screening, a gene encoding a 2709 amino acid protein with two regions highly conserved among Plasmodium was identified (PBANKA_0902300, designated as a partner of AP2-FG (PFG); Fig. 1A). This gene is one of the P. berghei genes that were previously identified as genes involved in female gametocyte development (named FD2), based on mass screening combined with single cell RNA-seq (ref).”

      1. While it isn't really important how the authors came to arrive at studying the function of Fd2, the rationale/approach given in the first paragraph of the result section seems far too broad to lead to Fd2, given that it lacks identifiable domains and many other ortholog sets exist across these species.

      Response: We selected this gene from the list of AP2-G targets as a candidate for a sequence-specific TF based on the hypothesis that the amino acid sequences of DNA-binding domains are highly conserved. We successfully identified two TFs (including PFG) using this method. However, there may be TFs that do not fit this hypothesis which are also targets of AP2-G. In fact, we were unable to identify HDP1 as a TF candidate, despite its being an AP2-G target.

      1. Fig. 1A-C: Gene IDs for the orthologs should be provided, as well as the methodology for generating the alignments.

      Response: We have added the gene IDs and method for alignment in the legend as follows:

      (A) Schematic diagram of PFG from P. berghei and its homologs in apicomplexan parasites. Regions homologous to Regions 1 and 2, which are highly conserved among Plasmodium species, are shown as yellow and blue rectangles, respectively. Nuclear localization signals were predicted using the cNLS mapper (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi). The gene IDs of P. berghei PFG, P. falciparum PFG, and their homologs in Toxoplasma gondii, Eimeria tenella and Vitrella brassicaformis are PBANKA_0902300, PF3D7_1146800, TGGT1_239670, ETH2_1252400, and Vbra_10234, respectively.

      (C) The amino acid sequences of Regions 1 and 2 from P. berghei PFG and its homologs from other apicomplexan parasites in (A) were aligned using the ClustalW program in MEGA X. The positions at which all these sequences have identical amino acids are indicated by two asterisks, and positions with amino acid residues possessing the same properties are indicated by one asterisk.

      1. Figure 2: The Phenotype of Fd2 knockout should be characterized more comprehensively.

      It remains unclear whether ∆Fd2 parasites generate the same number of females that are then defective upon fertilization, or whether there is also a decrease in the number of female gametocytes. Is the defect just post-fertilization, with zygotes lysing, or are there fewer fertilization events? If so, is activation of female GCs affected?

      The number of male and female gametocytes should be quantified using sex-specific markers not affected by Fd2 knockout rather than providing a single image of each. The ability of ∆Fd2 GCs should also be evaluated.

      This is also important for the interpretation of Fig 2G. Is the down-regulation of the genes due to fewer female GCs, or are the down-regulated genes only a subset of female-specific genes?

      Response: In PFG(-) parasites, the rate of conversion into zygotes of female gametocytes decreased, and zygotes had lost capacity for developing into ookinetes. This indicates that gametocyte development (i.e., the ability to egress the erythrocyte and to fertilize) and zygote development were both impaired. This phenotype is consistent with the observation that genes expressed in female gametocytes are broadly downregulated. PFG is a TF, and its disruption led to decreased expression of hundreds of female genes. Thus, the observed phenotype may be derived from combined decreased expression of these genes. We believe further detailed phenotypic analyses will not generate much novel information on this TF. Instead, RNA-seq data in PFG(-) parasites and the targetome have promise in helping to characterize the functions of this TF.

      1. Figure 3: what fraction of down-regulated genes have the Fd2 10mer motif?

      Response: Thank you for the question. We investigated the upstream binding motifs of these genes. Of the 279 significantly down-regulated genes (containing 165 targets), 161 genes harbor the motif (including nine-base motifs that lack one lateral base which is likely not essential for binding) in their upstream regions (within 1,200 bp from the first methionine codon). However, this result has not been described in the revised manuscript because it is more important whether these regions harbor PFG peaks (upstream motifs can exist without being involved in the binding of PFG).
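
      The kind of upstream scan described in this response could be sketched as follows. This is a rough illustration under stated assumptions: the ten-base motif sequence is not given here, so `HYPOTHETICAL_MOTIF` is a placeholder, and scanning both strands is an assumption rather than a documented detail of the authors' analysis.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def motif_variants(motif):
    """The full motif plus the two variants that lack one lateral (end) base."""
    return {motif, motif[1:], motif[:-1]}

def has_motif(upstream_seq, motif, window=1200):
    """True if the motif, or a variant lacking one end base, lies within `window` bp.

    `upstream_seq` is assumed to end immediately before the first methionine codon.
    """
    region = upstream_seq[-window:].upper()
    strands = (region, reverse_complement(region))
    return any(v in s for v in motif_variants(motif.upper()) for s in strands)

HYPOTHETICAL_MOTIF = "ACGTTGCAAT"  # placeholder, not the real PFG motif

near = "G" * 50 + "CGTTGCAAT" + "G" * 20   # nine-base variant close to the start codon
far = "ACGTTGCAAT" + "G" * 1300            # full motif, but more than 1,200 bp upstream

print(has_motif(near, HYPOTHETICAL_MOTIF), has_motif(far, HYPOTHETICAL_MOTIF))

# Fraction reported in the response: 161 of 279 down-regulated genes.
fraction_with_motif = 161 / 279
```

      For reference, 161 of 279 corresponds to roughly 58% of the significantly down-regulated genes.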

      1. sAP2-FG (single) vs cAP2-FG (complex) nomenclature is confusing and possibly misleading since few TFs function in isolation and sAP2-FG likely functions in a complex that doesn't contain Fd2, possibly with another DNA binding protein that binds the TGCACA hexamer. The name for the distinct peaks should refer to the presence or absence of Fd2 in the complex, or maybe simply refer to them as complex A & B.

      Response: As shown in the DIP-seq analysis results, AP2-FG can bind the motif by itself. In contrast, AP2-FG must form a complex with PFG to bind to the ten-base motif. The complex and single forms are named according to this difference (the presence or absence of PFG) and used solely in its relation with PFG. We wrote “In the following, we refer to the form with PFG as cAP2-FG or the complex form, and the form without PFG as sAP2-FG or the single form.” We believe that the nomenclature has sufficient clarity. However, we have partially (underlined) revised certain sentences in the discussion section as follows.

      “As the expression of PFG increases via this mechanism, AP2-FG recruited by PFG (cAP2-FG) increases and eventually becomes predominant in the transcriptional regulation of female gametocytes.”

      “This suggests that the promoter of the CCP2 gene, which is a target of PFG only, is still active in AP2-FG(-) parasites.”

      We recently reported that the TGCACA motif is a cis-activation motif in early gametocytes and important for both male and female gametocyte development. Thus, we speculate that sAP2-FG is not involved in cis-activation by the TGCACA motif. The p-value of the six-base motif is indeed comparable to that of the five-base motif. However, the p-value (calculated by Fisher’s exact test) for six-base motifs tends to be lower than that for five-base motifs, because the population is much larger. We speculate that there is a sequence-specific TF that may be expressed in early gametocytes and bind this motif, independently of AP2-FG.

      1. I compared the overlap of peaks in the 4 ChIP-seq data sets:

      90% of the Fd2 peaks are shared with AP2-FG (binding at 24% of shared peaks is lost in ∆AP2-FG)

      10% are bound by Fd2 alone (binding at 35% of Fd2 is lost in ∆AP2-FG)

      75% of Fd2 peaks are bound independently of AP2-FG

      47% of AP2-FG peaks are shared with Fd2 (binding at 71% of shared peaks is lost in ∆Fd2)

      53% of AP2-FG peaks are bound only by AP2-FG (but binding at 82% of AP2-FG only peaks is still lost in the ∆Fd2)

      Binding at 78% of all AP2-FG peaks is lost in ∆Fd2

      This indicates that much of AP2-FG binding, even in regions devoid of Fd2, still depends on Fd2. What are possible explanations for this?

      https://elife-rp.msubmit.net/eliferp_files/2023/04/03/00117573/00/117573_0_attach_10_17936_convrt.pdf

      Response: In the ChIP-seq of AP2-FG in the absence of PFG, 441 peaks are still called. This means that at least 441 binding sites for AP2-FG independent of PFG exist. This is a straightforward conclusion from our ChIP-seq data. On the other hand, simple subtraction of peaks between two ChIP-seq experiments (AP2-FG peaks minus PFG peaks) is not a precise method for determining sAP2-FG peaks. Peak calling is performed independently in each ChIP-seq experiment. Thus, the peaks remaining after subtraction can still include peaks that are actually common to both experiments but were called in only one of them. Even with data from the same ChIP-seq experiment, markedly different numbers of peaks are called depending on the peak-calling conditions (in contrast, peaks common to two independent experiments increase the reliability of the data). To identify sAP2-FG peaks by comparing AP2-FG peaks with PFG peaks, one would have to increase the number of PFG peaks by lowering the peak-calling threshold until the number of peaks overlapping between AP2-FG and PFG is saturated, and then subtract the overlapping peaks from the AP2-FG peaks. However, as described above, for the purpose of estimating the number of sAP2-FG peaks, it is better to perform ChIP-seq of AP2-FG in the absence of PFG.
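
      The threshold dependence described here can be illustrated with a toy example. The coordinates, scores, and function names below are hypothetical and are not taken from the authors' pipeline; the sketch only shows why "AP2-FG minus PFG" changes with the peak-calling cutoff.

```python
def overlaps(a, b):
    """Half-open intervals (chrom, start, end) overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def call_peaks(candidates, threshold):
    """Keep candidate regions (chrom, start, end, score) scoring at or above threshold."""
    return [(c, s, e) for c, s, e, score in candidates if score >= threshold]

def subtract_peaks(peaks_a, peaks_b):
    """Peaks in A that overlap no peak in B."""
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]

# Hypothetical AP2-FG peaks and scored PFG candidate regions.
ap2fg_peaks = [("chr1", 100, 400), ("chr1", 1000, 1300), ("chr2", 500, 800)]
pfg_candidates = [("chr1", 150, 350, 50.0), ("chr1", 1050, 1250, 8.0)]

strict = subtract_peaks(ap2fg_peaks, call_peaks(pfg_candidates, threshold=20))
lenient = subtract_peaks(ap2fg_peaks, call_peaks(pfg_candidates, threshold=5))

# The weak PFG signal at chr1:1050-1250 is missed under the strict cutoff,
# so the overlapping AP2-FG peak is wrongly counted as AP2-FG-only.
print(len(strict), len(lenient))
```

      In this toy case the number of apparent AP2-FG-only peaks drops from two to one as the PFG threshold is lowered, which is the saturation effect the response describes.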

      1. Possible explanations of why recombinant Fd2 doesn't bind the TGCACA hexamer. It would also be good to note that the GCTCA AP2-FG motif found in Fig4G is now perfect match for the motif identified by protein binding microarray in Campbell et al.

      Response: It is not known what sequence recombinant PFG binds. The TGCACA motif is not enriched in PFG peaks. If the reviewer is referring to AP2-FG, our findings that the recombinant AP2 domain binds the five-base motif strongly suggests that other TFs recognize this motif. As described in our response to comment 9, we recently reported that TGCACA is a cis-activating sequence important for the normal development of both male and female gametocytes. Therefore, we currently speculate that this motif is a binding motif of other TFs and is independent of AP2-FG.

      We have mentioned the protein binding microarray data in the Results section as follows.

      “The most enriched motif matched well with the binding sequence of the AP2 domain of P. falciparum AP2-FG, which was reported by Campbell et al.”

      1. What might explain the strong enrichment for TGCACA in ChIP-seq but not when pulled down by the AP2-FG DBD: another binding partner? Or does binding require more of AP2-FG than just the DBD?

      Response: As described above in our response to comment 6, we have recently submitted a preprint studying the roles of the remodeler subunit PbARID in gametocyte development. We reported that the remodeler subunit is recruited to the six-base motif and that the motif is a novel cis-activation element for early gametocyte development. We speculate that a proportion of AP2-FG targets are also targets of a TF that recognizes this motif and recruits the remodeler subunit. These two TFs may be involved in the regulation of early gametocyte genes but function independently.

      1. Calling DNA pulldown with recombinant AP2-FG DNA-binding domain DNA Immunoprecipitation sequencing (DIP-seq) is confusing since there are no antibodies involved. Describing it directly as a pulldown of fragmented DNA will be clearer to the reader.

      Response: Thank you for the comment. We also recognize this discrepancy. However, we called the method DIP-seq because the original paper reporting this method used this name, even though it did not use antibodies to capture the MBP-fusion recombinant protein. Our experiment was performed using essentially the same methods, and thus we retained the name.

      1. The legends and methods are very sparse and should include substantially more detail.

      Response: Thank you for the comment. We have revised the description of the FACS experimental method for clarity.

      1. BigWig files for all ChIPseq enrichment used for analysis in this study need to be provided.

      (two replicates each of: Fd2 in WT, Fd2 in ∆AP2-FG, AP2-FG in WT, AP2-FG in ∆Fd2)

      Response: We have deposited the BigWig files to GEO (GSE226028 and GSE114096).

      1. Tables of ChIP data need to have both summits and peaks and need to list the nearest gene. Also, the ChIP-seq peaks for Fd2 are surprisingly broad (e.g., 68% of Fd2 peaks (dataset2) are greater than 1000 bp) given its specificity for a long motif. Why is this?

      Response: We have revised Table S2 to include the nearest genes. We are unsure why peaks in the over 1000-bp peak region exist in such high proportions. However, this proportion was also high in our previous ChIP-seq data. Therefore, we speculate that this is a tendency of peak-calling by MACS2. We did not use these values in this paper. For example, targets were predicted using peak summits, and binding motifs were calculated using the 100-base regions around peak summits.
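
      As a small illustration of why broad peak calls need not affect summit-based analyses, a fixed window around each summit can be derived independently of peak width. The function name and the symmetric centering are assumptions; the response only states that 100-base regions around peak summits were used.

```python
def summit_window(chrom, summit, width=100, chrom_length=None):
    """Return a (chrom, start, end) half-open window of `width` bases around a summit.

    The window is clipped at the chromosome start, and at the end if a length is given.
    """
    half = width // 2
    start = max(0, summit - half)
    end = summit + half
    if chrom_length is not None:
        end = min(end, chrom_length)
    return chrom, start, end

# Hypothetical summits: the window is the same size whether the called peak
# spans 300 bp or more than 1000 bp.
print(summit_window("chr1", 1000))
print(summit_window("chr1", 20))
```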

      1. Figure 5E: The positions of the 10mer and 5mer motifs in the promoter should be indicated as well as the length of the promoter. Moreover, mutation of just the 5bp motifs would be valuable to understand if 10mer is sufficient for expression of the reporter.

      Response: Thank you for the comment. We have revised the figure accordingly. The majority of female-specific promoters only harbor ten-base motifs. Thus the ten-base motif is sufficient for evaluating reporter activity (i.e., it would function without five-base motifs).

      1. How is AP2-FG expression affected in ∆Fd2 and vice versa?

      Response: According to our previous microarray data, PFG expression was not significantly downregulated by disruption of AP2-FG. This may be because PFG transcriptionally activates itself through a positive feedback loop after being induced by AP2-G. Similarly, according to our present study, AP2-FG expression was not downregulated by PFG disruption. This may be because AP2-FG is transcriptionally activated by AP2-G.

      1. The single cell data in Russell et al. could easily be used to indicate the order of expression.

      Response: Determining the expression order of gametocyte TFs via the single cell RNA-seq data from Russell et al. is difficult, because only a small number of parasite cells were considered to be in the early gametocyte stage in that study. This is because the parasites were cultured for 24 h before the analysis. The analysis suggested by the reviewer may be possible via single cell RNA-seq, but the experiments must be performed with more focus on the early gametocyte stage.

      1. A discussion of the implication of P. falciparum transmission would be appreciated.

      Response: Thank you for the comment. We have added the following to the Discussion section:

      “P. falciparum gametocytes require 9-12 days to mature, which is much longer than that of P. berghei. Meanwhile, it has been reported that the ten-base motif is highly enriched in the upstream regions of female-specific genes also in P. falciparum. Thus, despite the difference in maturation periods, PFG is likely to play an important role in the transcriptional regulation of female P. falciparum gametocyte development.”

      1. The lack of identifiable DNA binding domains in Fd2 is intriguing given the strong sequence-specificity. Do the authors think they have identified a new DNA-binding fold ?

      Alphafold of the orthologs with contiguous regions 1&2 might offer insight.

      Response: We speculate that these regions function as DNA-binding domains. We performed an analysis using AlphaFold2 as suggested. However, the predicted structure of the region was not similar to any canonical DNA-binding domain. Thus, it may be a novel DNA-binding fold, as the reviewer mentioned. Further studies, such as binding assays using recombinant proteins, would be necessary to confirm this, but thus far we have not successfully obtained soluble proteins of these regions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study advances the understanding of physiological mechanisms in deep-sea Planctomycetes bacteria, revealing unique characteristics such as the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation and release phage particles without cell death. The study uses convincing evidence, based on experiments using growth assays, phylogenetics, transcriptomics, and gene expression data. The work will be of interest to bacteriologists and microbiologists in general.

      Response: Thanks for the Editor’s and Reviewers’ positive comments, which help us improve the quality of our manuscript entitled “Physiological and metabolic insights into the first cultured anaerobic representative of deep-sea Planctomycetes bacteria” (paper#eLife-RP-RA-2023-89874). The comments are all valuable, and we have studied the comments carefully and have made corresponding revisions according to the suggestions. Revised portions are marked in blue in the modified manuscript.

      Please find the detailed responses as follows.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of the manuscript cultivated a Planctomycetes strain affiliated with Phycisphaerae. The strain was one of the few Planctomycetes from deep-sea environments and demonstrated several unique characteristics, such as being the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation, and being able to release phage particles without cell death. The manuscript is generally well-written. However, a few issues need to be more clearly addressed, especially regarding the identification and characterization of the phage.

      Response: Thanks for your positive comments. Please find the detailed responses as follows.

      Reviewer #1 (Recommendations For The Authors):

      • Line 75-77, add a reference for this statement.

      Response: Thanks for your suggestion. We have added a reference (Fuerst and Sagulenko, 2011) for this statement in the revised manuscript (Line 77).

      References related to this response:

      Fuerst, J.A., and Sagulenko, E. Beyond the bacterium: planctomycetes challenge our concepts of microbial structure and function. Nat Rev Microbiol. 2011;9:403-413.

      • Line 124-134, add key statistics (such as ANI) of strain ZRK32 and KS4 to this section.

      Response: Thanks for your suggestion. We added the key statistics of strain ZRK32 and KS4, and described as “Based on the 16S rRNA sequence of strain ZRK32, a sequence similarity calculation using the NCBI server indicated that the closest relatives of strain ZRK32 were Poriferisphaera corsica KS4T (98.06%), Algisphaera agarilytica 06SJR6-2T (88.04%), Phycisphaera mikurensis NBRC 102666T (85.28%), and Tepidisphaera mucosa 2842T (82.94%). Recently, the taxonomic threshold for species based on 16S rRNA gene sequence identity value was 98.65% (Kim et al., 2014). Based on these criteria, we proposed that strain ZRK32 might be a novel representative of the genus Poriferisphaera. In addition, to clarify the phylogenetic position of strain ZRK32, the genome relatedness values were calculated by the average nucleotide identity (ANI), the tetranucleotide signatures (Tetra), and in silico DNA-DNA similarity (isDDH), against the genomes of strains ZRK32 and KS4. The ANIb, ANIm, Tetra, and isDDH values were 72.89%, 85.34%, 0.97385, and 20.90%, respectively (Table S1). These results together demonstrated the strain ZRK32 genome to be obviously below established ‘cut-off’ values (ANIb: 95%, ANIm: 95%, Tetra: 0.99, isDDH: 70%) for defining bacterial species, suggesting strain ZRK32 represents a novel strain within the genus Poriferisphaera.” in the revised manuscript (Lines 124-139).
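
      The comparison against species cut-offs quoted above can be checked mechanically. The numbers are those given in the response; the dictionary layout and variable names are only illustrative.

```python
# Cut-off values for defining bacterial species, as quoted in the response.
CUTOFFS = {"ANIb": 95.0, "ANIm": 95.0, "Tetra": 0.99, "isDDH": 70.0}

# Values reported for strain ZRK32 versus strain KS4.
observed = {"ANIb": 72.89, "ANIm": 85.34, "Tetra": 0.97385, "isDDH": 20.90}

# True where the observed value falls below the species cut-off.
below_cutoff = {name: observed[name] < CUTOFFS[name] for name in CUTOFFS}

# 16S rRNA identity to the closest relative versus the 98.65% species threshold.
same_species_by_16s = 98.06 >= 98.65

for name, is_below in below_cutoff.items():
    print(f"{name}: {observed[name]} (cut-off {CUTOFFS[name]}) below={is_below}")
print("same species by 16S rRNA:", same_species_by_16s)
```

      All four genome-relatedness values and the 16S rRNA identity fall below their thresholds, consistent with the authors' statement.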

      • Fig. 2A missing description for figure key.

      Response: Thanks for your comments. We modified Figure 2A, as shown below:

      Author response image 1.

      Figure. 2. Growth assay and transcriptomic analysis of P. heterotrophicis ZRK32 strains cultivated in basal medium and rich medium.

      • Regarding the phage released, could this be a membrane vesicle-engulfed phage? I would recommend checking "Spontaneous Prophage Induction Contributes to the Production of Membrane Vesicles by the Gram-Positive Bacterium Lacticaseibacillus casei BL23" and "Chronic Release of Tailless Phage Particles from Lactococcus lactis" for further references.

      Response: Thanks for your valuable comments. We carefully read these two papers and found that phage ZRK32 is most likely a membrane vesicle-engulfed phage. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      • How were the reference sequences for Fig. S10-S13 retrieved, was it by blasting the phage gene against the entire NCBI database, or only the virus sequence within the NCBI? Please clarify this.

      Response: Thanks for your comments. The reference sequences for Fig. S10-S13 were retrieved by blasting the phage gene against the entire NCBI database. We clarified this as “The reference sequences of four AMGs encoding amidoligase, glutamine amidotransferase, gamma-glutamylcyclotransferase, and glutathione synthase were retrieved by blasting the phage gene against the entire NCBI database, respectively.” in the revised manuscript (Lines 444-447).

      Reviewer #2 (Public Review):

      Summary:

      Planctomycetes encompass a group of bacteria with unique biological traits, the compartmentalized cells make them appear to be organisms in between prokaryotes and eukaryotes. However, only a few of the Planctomycetes bacteria are cultured thus far, and this hampers insight into the biological traits of these evolutionarily important organisms. This work reports the methodology details of how to isolate the deep-sea bacteria that could be recalcitrant to laboratory cultivation, and further reveals the distinct characteristics of the new species of a deep-sea Planctomycetes bacterium, such as the chronic phage release without breaking the host and promote the host and related bacteria in nitrogen utilization. Therefore, the finding of this work is of importance in extending our knowledge of bacteria.

      Response: Thanks for your positive comments.

      Strengths:

      Through the combination of microscopic, physiological, genomics, and molecular biological approaches, this reports the isolation and comprehensive investigation of the first anaerobic representative of the deep-sea Planctomycetes bacterium, in particular in that of the budding division, and release phage without lysis of the cells. Most of the results and conclusions are supported by the experimental evidence.

      Response: Thanks for your positive comments.

      Weaknesses:

      1. While EMP glycolysis is predicted to be involved in energy conservation, no experimental evidence indicated any sugar utilization by the bacterium.

      Response: Thanks for your comments. We have previously tested the sugar utilization of strain ZRK32, and now added this description as “Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2).” in the revised manuscript (Lines 281-284).

      1. "Anaerobic representative" is indicated in the title; on the contrary, a TCA cycle in energy metabolism is predicted for the bacterium.

      Response: Thanks for your valuable comments. Currently, anaerobic microorganisms can use other alternative electron acceptors (such as sulfate reducers, nitrate reducers, iron reducers, etc) in place of oxygen for the TCA cycle. For example, Proteus mirabilis uses the whole oxidative TCA cycle without using oxygen as the final electron acceptor when it performs multicellular swarming (Alteri et al., 2012). In this study, all the genes involved in the TCA cycle were present in anaerobic strain ZRK32 and most of them are upregulated, thus we speculate that it might function through the complete TCA metabolic pathway to obtain energy. We added the related description as “Notably, when growing in the rich medium, the expressions of most genes involved in the TCA cycle and EMP glycolysis pathway in strain ZRK32 were upregulated (Figure 2B-D, Figure S5B and Figure S6), suggesting that strain ZRK32 might function through the complete TCA metabolic pathway and EMP glycolysis pathway to obtain energy for growth (Figure S8) (Zheng et al., 2021b). Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2). As for the presence of TCA cycle in the anaerobic strain ZRK32, we propose that it might use other alternative electron acceptors (such as sulfate reducers, nitrate reducers, iron reducers, etc) in place of oxygen for the TCA cycle, as shown in other anaerobic bacteria (Alteri et al., 2012).” in the revised manuscript (Lines 277-287).

      References related to this response:

      Alteri CJ, Himpsl SD, Engstrom MD, Mobley HL. Anaerobic respiration using a complete oxidative TCA cycle drives multicellular swarming in Proteus mirabilis. mBio. 2012; 3(6): e00365-12.

      1. The possible mechanisms of the chronic phage release without breaking the host are not discussed.

      Response: Thanks for your valuable comments. The possible mechanism of the chronic phage release without breaking the host might be that it was enclosed in lipid membrane and released from the host cells by a nonlytic mechanism. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      Reviewer #2 (Recommendations For The Authors):

      • Have you tested whether strain ZRK32 uses any sugars? If not, why would it use the EMP pathway to obtain energy?

      Response: Thanks for your comments. We have previously tested the sugar utilization of strain ZRK32, and now added this description as “Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2).” in the revised manuscript (Lines 281-284).

      • Further discussion on possible mechanisms of the chronic phage release without breaking the host is expected.

      Response: Thanks for your valuable comments. The possible mechanism of the chronic phage release without breaking the host might be that it was enclosed in lipid membrane and released from the host cells by a nonlytic mechanism. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      • It is recommended that the writing is improved, including presentation style and grammar.

      Response: Thanks for your comments. We have invited a native English speaker (Dr. Diana Walsh from Life Science Editors, USA) to revise our manuscript, which we hope meets with your approval.

    1. Author Response

      We are delighted that eLife has assessed our study as a valuable contribution as well as appreciating the importance of working on asymptomatic reservoirs of P. falciparum in high transmission where not just children, but adolescents and adults harbor multiclonal infections. The constructive public reviews will serve to improve our manuscript.

      Detailed responses to referees’ comments and a revised manuscript are forthcoming. Here we make a provisional response to three key areas addressed by the referees:

      (1) census population size

      Referee 1 raises important questions although we respectfully disagree on the terminology we have adopted (of “census”) and on the unclear utility of the proposed quantity.

      We consider the quantity a census in that it is a total enumeration or count of the infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires. But our focus here is in a measure of population size itself. The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see for example Bedford et al. 2011 for the seasonal influenza virus and for the measles virus) and is a clear one in the ecological literature for non-pathogen populations (Palstra et al. 2012).

      Both referees 1 and 2 point out that census population size will be sensitive to sample size. We completely agree with the dependence of our quantity on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission, and persistent across the IRS intervention. Of course, one would like to be able to use this notion across studies that differ in sampling depth.

      Here, referee 1 makes an insightful and useful suggestion. It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (as we just need to divide or multiply by sample size). We can do even more, as with mean MOI we can presumably extrapolate to the full sample size of the host population, or the population size of another sample in another location. What is needed for this purpose is a stable mean MOI relative to sample size. We can show that indeed in our study mean MOI is stable in that way, by subsampling to different depths of our original sample. We will include in the revision discussion of this point and result, which allows an extrapolation of the census population size to the whole population of hosts in the local area. We’ll also clarify the time denominator, as given the typical duration of infections, we expect our population size to be representative of a per-generation measure.
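
      The map between census population size and mean MOI described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the MOI values are hypothetical, the function names are invented for this example, and mean MOI is assumed to be taken over the same sampled hosts that enter the census count.

```python
import random
from statistics import mean

def census_size(moi_values):
    """Census count: total number of infections across the sampled hosts."""
    return sum(moi_values)

def extrapolate_census(moi_values, n_hosts):
    """Scale mean MOI to a host population of n_hosts (assumes mean MOI is stable)."""
    return mean(moi_values) * n_hosts

def subsampled_mean_moi(moi_values, depth, n_reps=200, seed=0):
    """Average mean MOI over repeated random subsamples of a given depth."""
    rng = random.Random(seed)
    return mean(mean(rng.sample(moi_values, depth)) for _ in range(n_reps))

# Hypothetical per-host MOI estimates (not data from the study).
moi = [1, 3, 2, 5, 1, 4, 2, 2, 6, 1, 3, 2]

print("census in sample:", census_size(moi))
print("extrapolated to 1200 hosts:", extrapolate_census(moi, 1200))
for depth in (4, 8, 12):
    print("depth", depth, "mean MOI", round(subsampled_mean_moi(moi, depth), 2))
```

      If mean MOI stays roughly constant across subsampling depths, multiplying it by a host population size gives an extrapolated census under the stated assumptions.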

      Referee 2 suggests we adopt the term “census count” but as a census in our mind is a count we prefer to use “census”.

      Referee 3 considers that the genetic data tracking parasite MOI and census changes give the same result as prevalence, which tracks infected hosts. Respectfully, we disagree and will provide an expanded response.

      (2) the importance of lineages (in response to referee 2)

      We do not think that lineages moving exclusively through a given type of host or “patch” is a requirement for enumerating the size of the total infections in such a subset. It is true that what we have is a single parasite population, but we are enumerating for the season the respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings.

      We are also not clear on the concept of lineage for these highly recombinant parasites as we struggle to find highly related repertoires. In fact, we see the use of the var fingerprinting methodology as a means to capture changes in strain or var repertoires dynamics as a result of changing transmission conditions.

      (3) var methodology

      Comments and queries were made by all three referees about aspects of var methodology, including the Bayesian approach. These will be addressed in our full response.

      Here we respond to a very good point made by referee 2: “Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLa repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator?”

      There is no predefined threshold one can present a priori. Intuitively, the approach to estimating MOI would appear to break down as overlap moves away from extremely low values, and therefore in locations with lower transmission intensity. Interestingly, we observed that this is not the case in our paper (Labbé et al. 2023), where we used model simulations in a gradient of three transmission intensities, from high to low. The original varcoding method performed well across the gradient. This may arise from a nonlinear and fast transition from low overlap to high overlap that is accompanied by the MOI transitioning quickly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This issue needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of DBL repertoire overlap.

      References: Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11, 220. https://doi.org/10.1186/1471-2148-11-220

      Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:doi.org/10.1101/2022.06.27.49780

      Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: a compendium and appraisal. Ecol Evol. Sep;2(9):2357-65. doi:10.1002/ece3.329.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This research advance article describes a valuable image analysis method to identify individual cells (neurons) within a population of fluorescently labeled cells in the nematode C. elegans. The findings are solid and the method succeeds in identifying cells with high precision. The method will be valuable to the C. elegans research community.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors developed an image analysis pipeline to automatically identify individual neurons within a population of fluorescently tagged neurons. This application is optimized to deal with multi-cell analysis and builds on a previous software version, developed by the same team, to resolve individual neurons from whole-brain imaging stacks. Using advanced statistical approaches and several heuristics tailored for C. elegans anatomy, the method successfully identifies individual neurons with a fairly high accuracy. Thus, while specific to C. elegans, this method can become instrumental for a variety of research directions such as in-vivo single-cell gene expression analysis and calcium-based neural activity studies.

      The analysis procedure depends on the availability of an accurate atlas that serves as a reference map for neural positions. Thus, when imaging a new reporter line without fair prior knowledge of the tagged cells, such an atlas may be very difficult to construct. Moreover, usage of available reference atlases, constructed based on other databases, is not very helpful (as shown by the authors in Fig 3), so for each new reporter line a de-novo atlas needs to be constructed.

      We thank the reviewer for pointing out a place where clarification is needed. While in principle every new reporter line would require fair prior knowledge, atlases are either already available or not difficult to construct. If one can assume that the anatomy of a particular line is similar to existing atlases (Yemini 2021, Nejatbakhsh 2023, Toyoshima 2020), the cell ID can be performed immediately. Even if one suspects that the anatomy may differ from existing atlases (e.g. when examining mutants), existing atlases can serve as a starting point to provide a draft ID, which facilitates manual annotation. Once manual annotations on ~5 animals are available, as we have shown in this work (which is a manageable number in practice), this new dataset can be used to build an updated atlas that can be used for future inferences. We have added this discussion in the manuscript: “If one determines that the anatomy of a particular animal strain is substantially different from existing atlases, new atlases can be easily constructed using existing atlases as starting points.” (page 18).

      I have a few comments that may help to better understand the potential of the tool to become handy.

      1. I wonder to what degree strain mosaicism affects the analysis (Figs 1-4), as it was performed on a non-integrated reporter strain. As stated, for constructing the reference atlas, the authors used worms in which they could identify the complete set of tagged neurons. But how sensitive is the analysis when assaying worms with different levels of mosaicism? Do the results shown in the paper stem from animals expressing the full neural set? Could the authors add results for assayed worms with partial expression, where only 80%, 70%, or 50% of the cell population is observed, and show how this affects identification accuracy? This may be important, as many non-integrated reporter lines show highly mosaic patterns and may therefore not be suitable for this analytic method. In that sense, could the authors describe the degree of mosaicism of the line used for validating the method?

      We thank the reviewer for this comment. We want to clarify that most of the worms used in the construction of the atlas are indeed affected by mosaicism and thus do not express the full set of candidate neurons. We have added a plot as requested (Figure 3 – figure supplement 2, copied below). Our data show that there is no correlation between the fraction of cells expressed in a worm and neuron ID correspondence. We agree with the reviewer that this additional insight may be helpful; we have modified the text to include this discussion: “Note that we observed no correlation between the degree of mosaicism and neuron ID correspondence (Figure 3- figure supplement 2).” (page 10).

      Author response image 1.

      No correlation between the degree of mosaicism (fraction of cells expressed in the worm) and neuron ID correspondence.

      2. For the gene expression analysis (Fig 5), where was the intensity of the GFP extracted from? As it has no nuclear tag, the protein should be cytoplasmic (as seen in Fig 5a), but in Fig 5c it is shown as if the region of interest to extract fluorescence was nuclear. If fluorescence was indeed extracted from the cytoplasm, then it will be helpful to include in the software and in the results description how this was done, as a huge hurdle in dissecting such multi-cell images is avoiding cross-reads between adjacent/intersecting neurons.

      For this work, we used nuclear-localized RFP co-expressed in the animal, and the GFP intensities were extracted from the same regions from which the RFP intensities were extracted. If cytosolic reporters are used, a membrane label would likely be necessary to discern the borders of the cells. We clarified our reagents and approach in the text: “The segmentation was done on the nuclear-localized mCherry signals, and GFP intensities were extracted from the same region.” (page 21).

      3. In the same matter: In the methods, it is specified that the strain expressing GCaMP was also used in the gene expression analysis shown in Figure 5. But the calcium indicator may show transient intensities depending on spontaneous neural activity during the imaging. This will introduce a significant variability that may affect the expression correlation analysis as depicted in Figure 5.

      We apologize for the error in the text. The strain used in the gene expression analysis did not express GCaMP. We did not analyze GCaMP expression in Figure 5. We have corrected the error in the methods.

      Reviewer #2 (Public Review):

      The authors succeed in generalizing the pre-alignment procedure for their cell identification method to allow it to work effectively on data with only small subsets of cells labeled. They convincingly show that their extension accurately identifies head angle, based on finding autofluorescent tissue and looking for a symmetric l/r axis. They demonstrate that the method works to identify known subsets of neurons with varying accuracy depending on the nature of the underlying atlas data. Their approach should be a useful one for researchers wishing to identify subsets of head neurons in C. elegans, for example in whole brain recording, and the ideas might be useful elsewhere.

      The authors also strive to give some general insights on what makes a good atlas. It is interesting and valuable to see (at least for this specific set of neurons) that 5-10 ideal examples are sufficient. However, some critical details would help in understanding how far their insights generalize. I believe the set of neurons in each atlas version is matched to the known set of cells in the sparse neuronal marker; however, this critical detail isn't explicitly stated anywhere I can see.

      This is an important point. We have made text modifications to make it clear to the readers that for all atlases, the number of entities (candidate list) was kept consistent as listed in the methods. In the results section under “CRF_ID 2.0 for automatic cell annotation in multi-cell images,” we added the following sentence: “Note that a truncated candidate list can be used for subset-specific cell ID if the neuronal expression is known” (page 3). In the methods section, we added the following sentence: “For multi-cell neuron predictions on the glr-1 strain, a truncated atlas containing only the above 37 neurons was used to exclude neuron candidates that are irrelevant for prediction” (Page 20).

      In addition, it is stated that some neuron positions are missing in the neuropal data and replaced with the (single) position available from the open worm atlas. It should be stated how many neurons are missing and replaced in this way (providing weaker information).

      We modified the text in the result section as follows: “Eight out of 37 candidate neurons are missing in the neuroPAL atlas, which means 40% of the pairwise relationships of neurons expressing the glr-1p::NLS-mcherry transgene were not augmented with the NeuroPAL data but were assigned the default values from the OpenWorm atlas” (page 10).
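
      The 40% figure quoted above can be checked with a short calculation. This is a sketch under one stated assumption that is not spelled out in the response: a pairwise relationship counts as not augmented whenever at least one of its two neurons is missing from the NeuroPAL atlas. The function name is chosen for this example only.

```python
from math import comb

def fraction_pairs_with_missing(n_total, n_missing):
    """Fraction of unordered neuron pairs that include at least one missing neuron."""
    n_present = n_total - n_missing
    # Pairs made only of present neurons are the ones that can be augmented.
    return 1 - comb(n_present, 2) / comb(n_total, 2)

# 37 candidate neurons, 8 of them missing from the NeuroPAL atlas (values from the text).
frac = fraction_pairs_with_missing(37, 8)
print(f"{frac:.1%}")  # 260 of 666 pairs, about 39%
```

      Under that assumption, 260 of the 666 possible pairs involve a missing neuron, which is consistent with the rounded value of 40% even though only 8 of 37 neurons (about 22%) are missing.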

      It also is not explicitly stated that the putative identities for the uncertain cells (designated with Greek letters) are used to sample the NeuroPAL data. Both large numbers of OpenWorm single positions and misidentified uncertain cells, which would force alignment against the positions of nearby but different cells, would handicap the NeuroPAL atlas relative to the matched fluorescence atlas. This is an important question since sufficient performance from an ideal NeuroPAL atlas (subsampled) would avoid the need for building custom atlases per strain.

      The putative identities are not used to sample the NeuroPAL data. They were used in the glr-1 multi-cell case to indicate low confidence in manual identification/annotation. For all steps of manual annotation and CRF_ID predictions, we used real neuron labels, and the Greek labels were used for reporting purposes only. It is true that the OpenWorm values (40% of the atlas) would be a handicap for the neuroPAL atlas. This is mainly due to the difficulty of obtaining NeuroPAL data as it requires 3-color fluorescence microscopy and significant time and labor to annotate the large set of neurons. This is one reason to take a complementary approach as we do in this paper.

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 3, there is confusion in the legend relating to panels c-e (e.g. panel c is neuron ID accuracy but it is described per panel e in the legend).

      We made the necessary changes.

      2. Figure 3, were statistical tests performed for panels d-e? If so, and the outcome was not significant, then it might be good to indicate this in the legend.

      We have added results of statistical tests in the legend as the following sentence: “All distributions in panel d and e had a p-value of less than 0.0001 for one sample t-test against zero.” One-sample t-tests were performed because what is plotted already represents each atlas’s difference from the glr-1 25-dataset atlas; we did not think that statistical analyses between the other atlases would add significant value.
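
      For readers unfamiliar with the test named above, a one-sample t-test against zero can be sketched as follows. The difference values are hypothetical and the helper name is invented for this example; it computes only the t statistic and degrees of freedom with the standard library. A p-value would additionally require the t distribution, for example via scipy.stats.ttest_1samp.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, popmean=0.0):
    """Return the t statistic and degrees of freedom for H0: mean == popmean."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - popmean) / standard_error, n - 1

# Hypothetical accuracy differences relative to a reference atlas (not study data).
diffs = [0.1, 0.2, 0.3]
t_stat, dof = one_sample_t(diffs)
print(round(t_stat, 3), dof)
```

      Because each plotted value is already a difference from the reference atlas, testing the mean of those differences against zero asks whether an atlas departs from the reference on average.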

      3. Figure 4, no asterisks are shown in the figure so it is possible to remove the sentence in the legend describing what the asterisk stands for.

      Thank you. We made the necessary changes.

      Reviewer #2 (Recommendations For The Authors):

      Comparison with deep learning approaches could be more nuanced and structured. The authors' (prior) approach, extended here, combines a specific set of comparative relationship measurements with a general optimization approach for matching based on comparative expectations. Other measurements could be used, whether explicit (like neighbor expectations) or learned differences in embeddings. These alternate measurements would both need to be extensively re-calibrated for different sets of cells but might provide significant performance gains. In addition, deep learning approaches don't solve the optimization part of the matching problem, so the authors' approach seems to bring something strong to the table even if one is committed to learned methods (necessary, I suspect, for human-level performance in denser cell sets than the relatively small number here). A more complete discussion of these themes might better frame the impact of the work and help readers think about the advantages and disadvantages of different methods for their own data.

      We thank the reviewer for bringing up this point. We apologize for perhaps not making the point clearer in the original submission. This extension of the original work (Chaudhary et al) is not changing the CRF-based framework, but only augmenting the approach with a better-defined set of axes (solely because in multicell and not whole-brain datasets, the sparsity of neurons degrades the axis definition and consequently the neuron ID predictions). We are not fundamentally changing the framework, and therefore all the advantages (over registration-based approaches for example) also apply here. The other purpose of this paper is to demonstrate a couple of use-cases for gene expression analysis, which is common in studies in C. elegans (and other organisms). We hope that by showing a use-case others can see how this approach is useful for their own applications.

      We have clarified these points in the paper (page 18). “The fundamental framework has not been changed from CRF_ID 1.0, and therefore the advantages of CRF_ID outlined in the original work apply for CRF_ID 2.0 as well.”

      The attribution of anatomical differences to strain is interesting, but seems purely speculative, and somewhat unlikely. I would suspect the fundamentally more difficult nature of aligning N items to M>>N items in an atlas accounts for the differences in using the neuroPAL vs custom atlas here. If this is what is meant, it could be stated more clearly.

      It is important to note that the same neuron candidate list (listed in methods) was used for all atlases, so there is no difference among the atlases in terms of the number of cells in the query vs. candidate list. In other words, the same values for M and for N are used regardless of the reference atlas used.

      We have preliminary data indicating differences between the NeuroPAL and custom atlas. For instance, the NeuroPAL atlas scales smaller than the custom glr-1 atlas. Since direct comparisons of the different atlases are beyond the scope of this paper, we will leave the exact comparisons for future work. We suspect that the differences are from a combination of differences in anatomy and imaging conditions. While NeuroPAL atlas may not be exactly fitting for the custom dataset, it can serve as a good starting point for guesses when no custom atlases are available, as we have discussed earlier (response to Public Comments from Reviewer 1 Point 1). As explained earlier, we have added these discussions in the paper (see page 18).

      I was also left wondering if the random removal of landmarks had to be adjusted in this work given it is (potentially) helping cope with not just occasional weak cells but the systematic loss of most of the cells in the atlas. If the parameters of this part of the algorithm don't influence the success for N to M>>N alignment (here when the neuroPAL or OpenWorm atlas is used) this seems interesting in itself and worth discussing. Conversely, if these parameters were optimized for the matched atlas and used for the others, this would seem to bias performance results.

      We may have failed to make this clear in the main text. As we have stated in our responses in the public review section, we do systematically limit the neuron labels in the candidate list to neurons that are known to be expressed by the promoter. The candidate list, which is kept consistent for all atlases, has more neurons than cells in the query, so it is always an N-to-M matching where M>N. We did not use landmarks, but such usage is possible and will only improve the matching.

      We have attempted to clarify these points in the manuscript. In the results section under “CRF_ID 2.0 for automatic cell annotation in multi-cell images,” we added the following sentence: “Note that a truncated candidate list can be used for subset-specific cell ID if the neuronal expression is known” (page 3). In the methods section, we added the following sentence: “For multi-cell neuron predictions on the glr-1 strain, a truncated atlas containing only the above 37 neurons was used to exclude neuron candidates that are irrelevant for prediction” (Page 20).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors examined the putative functions of hypothalamic groups identifiable through Foxb1 expression, namely the parvofox Foxb1 of the LHA and the PMd Foxb1, with emphasis on innate defensive responses. First, they reported that chemogenetic activation of Foxb1 hypothalamic cell groups led to tachypnea. The authors tend to attribute this effect to the activation of hM3Dq expressed in the parvofox Foxb1 but did not rule out the participation of the PMd Foxb1 cell group, which may as well have expressed hM3Dq, particularly considering the large volume (200 nl) of the viral construct injected. It is also noteworthy that the activation of the Foxb1 hypothalamic cell groups in this experiment did not alter gross locomotor activity, such as time spent in an immobile state. This contrasts with the authors' finding on the optogenetic activation of the Foxb1 hypothalamic fibers projecting to the dorsolateral PAG. In the second experiment, the authors applied optogenetic ChR2-mediated excitation of the Foxb1+ cell bodies' axonal endings in the dlPAG, leading to freezing and, in a few cases, bradycardia as well. The effective site to evoke freezing was the rostral PAGdl, and fibers positioned either ventral or caudal to this target had no response. Considering the pattern of projection of the Foxb1 hypothalamic cell groups to the PAG, the fibers projecting to the rostral PAGdl are likely to arise from the PMd Foxb1 cell group, and not from the parvofox Foxb1 of the LHA. Here it is important to consider that optogenetic ChR2-mediated excitation of the axonal endings is likely to have activated the cell bodies originating these fibers, and one cannot ascertain whether the behavioral effects are related to the activation of the terminals in the PAGdl or the cell bodies originating the projection.

      Authors’ reply: We acknowledge and agree about the possibility of backpropagation in ChR2-mediated terminal stimulation experiments. We have introduced a paragraph in the discussion section addressing this issue. In short, the observation of an opposing phenotype in ArchT3.0 animals indicates that the ChR2-mediated phenotype is indeed Foxb1-PAG projection-specific. This is because light-activated proton pumps used for terminal stimulation cannot induce backpropagation of an inhibitory effect to the soma. Potential downsides of using proton pumps in small compartments, such as the axon, are also discussed.

      Moreover, activation of the PMd CCK cell group, which consists of around 90% of the PMd cells, evokes escape, and not freezing. According to the present findings, a specific population of PMd Foxb1 cells may be involved in producing freezing. In addition, only a small number of the animals with correct fiber placement presented sudden onset of bradycardia in response to the photostimulation. Considering the authors' findings, the Foxb1+ hypothalamic groups are likely to mediate behavioral responses related to innate defensive responses, where the parvofox Foxb1 of the LHA would be involved in promoting tachypnea and the PMd Foxb1 group in mediating freezing and bradycardia. These findings are very interesting, and, at this point, they need to be tested in a scenario of real exposure to a natural predator.

      Authors’ reply: We fully agree with the proposed experiments. Due to the previously mentioned retirement of Prof. Celio and the concomitant expiration of licenses for animal experimentation, we are prevented from conducting these experiments on our own. We have integrated a statement in the discussion regarding these potential future experiments.

      Reviewer #2 (Public Review):

      The authors aimed to examine the role of a group of neurons expressing Foxb1 in behaviors through projections to the dlPAG. Standard chemogenetic activation or inhibition and optogenetic terminal activation or inhibition locally in the PAG were used, and results suggested that, while activation led to reduced locomotion and breathing, inhibition led to a small degree of increased locomotion.

      The observed effects on breathing are evident and dramatic. However, this study needs significant improvements in terms of data analysis and presentation, and some of the studies seem incomplete; therefore, the data may not yet support the conclusion.

      1. Fig.1 has no experimental data and needs to be replaced with detailed pictures from the viral injected mice showing the projections diagrammed.

      Authors’ reply: We believe that this graphic illustration is helpful to the reader to comprehend the spatial relationship between the parvafoxFoxb1 nucleus, the mammillary nuclei, and the PAG. In a previous study we have characterized the projections of the parvafoxFoxb1 nucleus in detail (using the same Foxb1-Cre mouse line as in the present study) and, in this regard, would like to refer Reviewer #2 to this publication (https://onlinelibrary.wiley.com/doi/10.1002/cne.24057).

      2. Fig. 3 needs control pictures and statistical comparison with different conditions in c-Fos. Also expression in other nearby regions needs to be presented to demonstrate the specificity of the expression.

      Authors’ reply: We have modified the original Fig. 3 with more pictures across all three conditions used in the chemogenetic experiments. Since the new figure now takes up a whole page, and because the data in this figure serve to validate the DREADD experiments, we have decided to move it to the supplementary files. The figure is now labelled as “Supplementary File S1”. All figure and file numberings throughout the text have been adjusted accordingly.

      3. Fig. 5, a great effort has been made to illustrate the point that CCK and Foxb1 are differentially expressed. Why not just perform a double in situ experiment to directly illustrate the point?

      Authors’ reply: We have addressed this comment in the initial release of the eLife manuscript. In short, we agree that a double ISH experiment would have been an alternative approach, but would like to state that scRNAseq is a well-established and valid method for this purpose.

      4. Fig. 7 data on optogenetic stimulation on immobility and breathing, since not all mice showed the same phenotype, what is the criterion for allocating these mice to hit or no hit groups? Given the dramatically reduced breathing and locomotion, what is the temperature response? More data needs to be gathered to support that this is a defense behavior.

      Authors’ reply: The criteria for allocation of animals to the experimental groups are described in section “Optogenetic modulation of Foxb1 terminals in the dlPAG induces immobility” and are based on the stereotaxic coordinates of the tips of the glass fiber implants. We did not perform any experiments in which we recorded body temperatures or temperature preferences in optogenetic animals. Such experiments were outside the scope of the study. As mentioned in a previous comment above, we have added an additional paragraph to the discussion section regarding future investigations of these hypothalamic Foxb1 neurons during exposure to natural predators. Such experiments would certainly allow more insight into the defensive nature of the described phenotype.

      5. The authors claim to target dlPAG. However, in the picture shown in Fig. 8C, almost all PAG contains ChR2 fibers and it is likely all the fibers will be activated by light. Thus, as presented, the data does not support the claim of the specificity on dlPAG. Also c-Fos data needs to be presented on the degree of activation of downstream PAG neurons after light exposure.

      Authors’ reply: We attach the original image 8c, without arrows and indications, in which the localization of ChR2-positive fibers in the dlPAG is better visible. They are located exactly under the tip of the optical fiber. We do not know the functional characteristics of the post-synaptic PAG neurons and have not determined experimentally their downstream targets. Investigating the downstream targets was outside the scope of the current publication.

      Author response image 1.

      6. Fig. 9 only showed one case. A statistical comparison needs to be presented.

      Authors’ reply: Our cardiovascular experiments are exploratory and descriptive in nature (i.e. pilot experiments). It was a conscious decision not to perform hypothesis tests on these experiments. We did not have enough mice to perform statistical tests with sufficient statistical power. Providing results from hypothesis tests on these data would lead to statistically unjustified conclusions. To clarify this issue, we have added a paragraph to the relevant results section.

      7. Optogenetic terminal activation in the PAG will likely elicit back-propagation and subsequent activation of additional downstream brain sites of Foxb1 neurons. More experiments need to be done to assess this and as presented, the data does not support the role of PAG necessarily.

      Authors’ reply: Please see our answer to Reviewer #1 regarding the same issue.

      8. The authors claim negative data from PVH-Cre mice. More data need to be presented to make this case.

      Authors’ reply: We would like to refer to our answer to point 6) raised by Reviewer #2.

      The conclusion, even as presented, adds to the known evidence of the PAG in the defense behavior.

      Reviewer #1 (Recommendations For The Authors):

      In the pharmacogenetic experiments, the authors need to clarify which Foxb1 hypothalamic cell group presented the activation of hM3Dq. It is important to know whether this tachypnea-producing activation was restricted to the parvofox Foxb1 or also included the PMd Foxb1 group. It would be important to isolate the effect of the pharmacogenetic activation of each one of these Foxb1 hypothalamic cell groups.

      After determining which cell group would be involved in mediating this respiratory effect, it would be nice to discuss the possible pathways involved in this effect.

      In the optogenetic experiments, the authors should differentiate between the effects of the PAG projecting fibers from the PMd and those from the parvofox groups. As it stands, it seems that the freezing and bradycardia depend on projection from the PMd Foxb1 group to the rostral PAGdl. However, considering the large volume (200 nl) of the viral construct injected, both groups were likely to express channelrhodopsin, and it would be important if the authors could restrict the viral injections to each one of the Foxb1 hypothalamic cell groups.

      Authors’ reply: We fully agree with the suggestion, but due to the recent retirement of Prof. Celio we are unfortunately not allowed to conduct any further animal experiments.

      The authors also reported that photoactivation ventral to the PAGdl, possibly in the PAGl did not yield any clear behavioral response. However, as pointed out in the discussion, a recent publication found that the parvofox Foxb1 projection to the lateral PAG drives social avoidance, and we were wondering whether there was any avoidance behavior during the photoactivation of the PAGl fibers.

      Authors’ reply: We did not conduct any social avoidance experiments ourselves. However, we did perform ultrasonic vocalization experiments (unpublished data) in which we optogenetically stimulated Foxb1+ terminals in the PAG. Due to experimental issues related to the age of the tested mice, we did not obtain conclusive results regarding the ultrasonic vocalizations. By a purely observational account, we did not observe any active avoidance during optogenetic stimulation, but rather a cessation of interaction. We are unable to judge whether this was more pronounced in the PAGl targeted mice or not.

      Another important point is that optogenetic ChR2-mediated excitation of the axonal endings is likely to activate the cell bodies originating these fibers, and one cannot ascertain whether the behavioral effects depend on the activation of the terminals in the PAGdl or the activation of the cell bodies originating these terminals. Note, in the present case, PMd cell bodies may also project elsewhere, such as the cuneiform nucleus, known to mediate freezing responses. To circumvent this problem, during photoactivation of the PAGdl terminals, the authors should inhibit the cell bodies originating these terminals.

      Authors’ reply: We would like to refer to the answer we provided above regarding the issue of backpropagation or ChR2-mediated phenotypes and projection-specificity.

      Another important issue is related to the fact that around 90% of the PMd express CCK (Wang et al., 2021), and previous work showed that activation of these cells yielded escape and not freezing (Wang et al., 2021). Although the authors claim that the single-cell RNA sequencing dataset reveals distinct Foxb1 expression in the PMd, these results derive from tissues collected in the posterior hypothalamus, not exactly restricted to the PMd. Therefore, it would be desirable if the authors could show CCK and Foxb1 double-labeled PMd sections to evaluate the exact percentage of cells expressing either one of these peptides.

      Authors’ reply: The tissues for the scRNAseq data were obtained from hypothalamic tissues between stereotaxic coordinates of AP-2.54 to AP-3.16 (please see Fig. 1b in Mickelsen et al. 2020) and not purely from the posterior hypothalamic nucleus. These tissues hence include a large proportion of the PMd neurons. We would like to point out that the expression profile of the PMd cluster matches well with the ISH data from the Allen Brain Atlas that we have put together in “Supplementary File S6” (originally “Supplementary File S5”).

      The authors should also explain why only a small number of animals that received PAGdl photoactivation presented bradycardia. Moreover, they should also discuss the possible pathways mediating this effect. Here, it is important to point out that the cuneiform nucleus, as suggested by the authors as one possible way to mediate this effect, promotes sympathetic vasomotor activity (Verberne, 1995).

      We have added the sentence: “The projections of the cuneiform nucleus to the rostral ventrolateral medulla promote sympathetic vasomotor activity (Verberne 1995).” to the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      In this reviewer's view, this study needs substantial improvement:

      1. The writing is very sloppy and difficult to follow. There is no clear logic flow in the main text and the figures need substantial realigning for panels, additions of labelling etc.

      We have added the sentence.

      1. Fig. 6 the hot plate data is out of place and should be placed in supplementary or removed completely.

      Authors’ reply: We and others have previously shown that the parvalbumin+ population of the parvafox nucleus is involved in nociceptive behavior. Hence, we believe it is of interest to show that we do not see the same phenotype with the stimulation of the Foxb1+ population of the parvafox nucleus. These data show that the nociceptive component of the parvafox nucleus is confined to its parvalbumin+ population.

      1. The authors discussed social behavior data in the Discussion, but no such data is presented, which is very confusing.

      Authors’ reply: Indeed, we did not perform any experiments to investigate social behavior. However, we point out that the locomotor phenotype observed upon optogenetic stimulation of Foxb1+ terminals could have led to a bias in the interpretation of the social behavior experiments published elsewhere by others.

      1. The authors discussed a great deal on potential differences between parvafox and PMd Foxb1 neurons, however, no clear data was presented to show a functional difference between them, which is also confusing.

      Authors’ reply: Even though investigations on the functional differences of parvafox and PMd Foxb1 neurons would be highly interesting, it was outside the scope of the current study. Due to the recent retirement of Prof. Celio, we are not allowed to perform any additional animal experiments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study that leverages a human-chimpanzee tetraploid iPSC model to test whether cis-regulatory divergence between species tends to be cell type-specific. The evidence supporting the study's primary conclusion--that species differences in gene regulation are enriched in cell type-specific genes and regulatory elements--is compelling, although attention to biases introduced by sequence conservation is merited, and the case that is made for cell type-specific changes reflecting adaptive evolution is incomplete. This work will be of broad interest in evolutionary and functional genomics.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study aims to identify gene expression differences exclusively caused by cis-regulatory genetic changes by utilizing hybrid cell lines derived from human and chimpanzee. While previous attempts have focused on specific tissues, this study expands the comparison to six different tissues to investigate tissue specificity and derive insights into the evolution of gene expression.

      One notable strength of this work lies in the use of composite cell lines, enabling a comparison of gene expression between human and chimpanzee within the same nucleus and shared trans factors environment. However, a potential weakness of the methodology is the use of bulk RNA-seq in diverse tissues, which limits the ability to determine cell-type-specific gene expression and chromatin accessibility regions.

      We agree that profiling single cells could lead to additional exciting discoveries. Although heterogeneity in cell types within samples will indeed reduce our power to detect cell-type-specific divergence, thankfully any heterogeneity will not introduce false positives, since our use of interspecies hybrids controls for differences in cell-type abundance. As a result, we think that the molecular differences we identified in this study represent a subset of the true cell-type specific cis-regulatory differences that would be identified with deep single-cell profiling. We have included a new paragraph in the discussion on future directions, highlighting the utility of single-cell profiling as an exciting future direction (lines 482-490): “In addition to following up on our findings on GAD1 and FABP7, there are other exciting future directions for this work. First, additional bulk assays such as those that measure methylation, chromatin conformation, and translation rate could lead to a better understanding of what molecular features ultimately lead to cell type-specific changes in gene expression. Furthermore, the use of deep single cell profiling of hybrid lines derived from iPSCs from multiple individuals of each species during differentiation could enable the identification of many more highly context-specific changes in gene expression and chromatin accessibility such as the differences in GAD1 we highlighted here. Finally, integration with data from massively parallel reporter assays and deep learning models will help us link specific variants to the molecular differences we identified in this study.”

      Another concern is the use of two replicates derived from the same pair of individuals. While the authors produced cell lines from two pairs of individuals in a previous study (Agoglia et al., 2021), I wonder why only one pair was used in this study. Incorporating interindividual variation would enhance the robustness of the species differences identified here.

      We agree that additional replicates, especially from lines from other individuals, would have improved the robustness of the species differences we identified. In our experience with these hybrid cells (as well as related work from many other labs), inter-species differences typically have much larger magnitudes than intra-species differences, so we expect that the vast majority of differences we identified would be validated with data from additional individuals. Unfortunately, differentiating additional cells and generating these data for this study would be cost-prohibitive. We now mention the use of additional replicates in lines 485-488 of the discussion: “Furthermore, the use of deep single cell profiling of hybrid lines derived from iPSCs from multiple individuals of each species during differentiation could enable the identification of many more highly context-specific changes in gene expression and chromatin accessibility such as the differences in GAD1 we highlighted here.”

      Furthermore, the study offers the opportunity to relate inter-species differences to trends in molecular evolution. The authors discovered that expression variance and haploinsufficiency score do not fully account for the enrichment of divergence in cell-type-specific genes. The reviewer suggests exploring this further by incorporating external datasets that bin genes based on interindividual transcriptomics variation as a measure of extant transcriptomics constraint (e.g., GTEx reanalysis by Garcia-Perez et al., 2023 - PMID: 36777183). Additionally, stratifying sequence conservation on ASCA regions, which exhibit similar enrichment of cell-type-specific features, using the Zoonomia data mentioned also in the text (Andrews et al., 2023 -- PMID: 37104580) could provide valuable insights.

      To address this, we used PhastCons scores computed from a 470-way alignment of mammals as we could not find publicly available PhastCons data from Zoonomia. When stratifying by the median PhastCons score of all sites in a peak, we observe very similar results to those obtained when stratifying by the constraint metric from the gnomAD consortium (see below). The one potential difference is that peaks in the top two bins have slightly weaker enrichment relative to the other bins when using PhastCons, but this is not the case when using gnomAD’s metric. We have elected to include this in the public review but not the manuscript as we are reluctant to add to the complexity of what is already complex analysis.

      Author response image 1.

      Finally, we think that comparisons of the properties of gene expression variance computed from ASE (as done by Starr et al.) and total expression (as done by Garcia-Perez et al.) is a very interesting, potentially complex question that is beyond the scope of this paper but an exciting direction for future work.

      Another potential strength of this study is the identification of specific cases of paired allele-specific expression (ASE) and allele-specific chromatin accessibility (ASCA) with biological significance.

      Prioritizing specific variants remains a challenge, and the authors apply a machine-learning approach to identify potential causative variants that disrupt binding sites in two examples (FABP7 and GAD1 in motor neurons). However, additional work is needed to convincingly demonstrate the functionality of these selected variants. Strengthening this section with additional validation of ASE, ASCA, and the specific putative causal variants identified would enhance the overall robustness of the paper.

      We strongly agree with the reviewer that additional work validating our results would be of considerable interest. We hope to perform follow-up experiments in the future. For now, we have been careful to present these variants only as candidate causal variants.

      Additionally, the authors support the selected ASE-ASCA pairs by examining external datasets of adult brain comparative genomics (Ma et al., 2022) and organoids (Kanton et al., 2019). While these resources are valuable for comparing observed species biases, the analysis is not systematic, even for the two selected genes. For example, it would be beneficial to investigate if FABP7 exhibits species bias in any cell type in Kanton et al.'s organoids or if GAD1 is species-biased in adult primate brains from Ma et al. Comparing these datasets with the present study, along with the Agoglia et al. reference, would provide a more comprehensive perspective.

      We agree with the reviewer’s suggestion that investigating GAD1 and FABP7 expression in other datasets is worthwhile. Unfortunately, the difference in human vs. chimpanzee organoid maturation rates and effects of culture conditions in Kanton et al. makes it unsuitable for plotting the expression of FABP7 as its expression is highly dependent on neuronal maturation. We therefore plotted bulk RNAseq data from multiple cortical regions from Sousa et al. 2017 (see below). This corroborates our claim that FABP7 has human-biased expression in adult humans compared to chimpanzees and rhesus macaques. We also investigated expression of GAD1 in the Ma et al. data as the reviewer suggested.

      Author response image 2.

      While there are differences in GAD1 expression between adult humans and chimpanzees, they are unlikely to be linked to the HAR we highlight as it is likely a transiently active cis-regulatory element (see below). In addition, some cell types seem to have chimpanzee-derived changes in GAD1 expression (e.g. SST positive neurons) whereas others seem to have human-derived changes in GAD1 expression (e.g. LAMP5 positive neurons).

      Author response image 3.

      While these are potentially interesting observations, we think that their inclusion in the manuscript might distract from our emphasis on the cell type-specific and developmental stage-specific nature of the changes in FABP7 and GAD1 expression we observe, so we have not included them in the manuscript.

      The use of the term "human-derived" in ASE and ASCA should be avoided since there is no outgroup in the analysis to provide a reference for the observed changes.

      We agree with the reviewer that the term human-derived should be used with care and have changed the phrasing of line 230 to “human-chimpanzee differences in expression”. With regard to FABP7 we think that our analysis of the Ma et al. data—which includes data from rhesus macaques as an outgroup—justifies our use of “human-derived” in lines 360 and 457. As chimpanzee and macaque expression of FABP7 are similar but human expression is quite different, the most parsimonious explanation for our observations is that FABP7 upregulation occurred in the human lineage.

      Finally, throughout the paper, the authors refer to "hybrid cell lines." It has been suggested to use the term "composite cell lines" instead to address potential societal concerns associated with the term "hybrid," which some may associate with reproductive relationships (Pavlovic et al., 2022 -- PMID: 35082442). It would be interesting to know the authors' perspective on these concerns and recommendations presented in Pavlovic et al., given their position as pioneers in this field.

      We appreciate this question. Whether to refer to our fused cells as “hybrids” or not was indeed a question we considered at great length, starting from the very beginning of this project in 2015. From consultations with multiple bioethicists-- both formal and informal-- we have long been aware of the possibility of misunderstanding based on the word “hybrid”. However, we felt this possibility was outweighed by the long and well-established history of other scientists referring to interspecies fused cells as hybrids. This convention-- which is based on hundreds of papers about heterokaryons, somatic cell hybrids, and radiation hybrids-- goes back over 50 years (e.g. Bolund et al, Exp Cell Res 1969). Soon after the establishment of this nomenclature, cell fusion became widespread and ever since then it has become commonplace to generate interspecies hybrid cells from animals, plants and fungi.

      It is also important to note that in the more than two years since we published the first two papers on human-chimpanzee fused cells, we have been unable to find any misunderstanding of our use of the term “hybrid”. We have searched blogs, media articles, and social media, all with no evidence of misunderstanding. Therefore, in the current manuscript, rather than creating confusion by renaming a well-established approach, we have opted to clearly and prominently define hybrid cells: in the abstract of our paper we introduce the hybrid cells as “the product of fusing induced pluripotent stem (iPS) cells of each species in vitro.”

      Reviewer #2 (Public Review):

      In this paper, Wang and colleagues build on previous technical and analytical achievements in establishing tetraploid human-chimpanzee hybrid iPSCs to investigate the cell type-specificity of allele-specific expression and allele-specific chromatin accessibility across six differentiated cell types (here, "allele-specific" indicates species differences with a cis-regulatory basis). The combined body of work is remarkable in its creativity and ambition and has real potential for overcoming major challenges in understanding the evolutionary genetics of between-species differences. The present paper contributes to these efforts by showing how differentiated cells can be used to test a long-standing hypothesis in evolutionary genetics: that cis-regulatory changes may be particularly important in divergence because of their potential for modularity.

      In my view, the paper succeeds in making this case: allele (species)-specific expression (ASE) and allele-specific chromatin accessibility (ASCA) are enriched in genes asymmetrically expressed in one cell type, and many cases of ASE/ASCA are cell type-specific. The authors do an excellent job showing that these results are robust across a set of possible analysis decisions. It is somewhat less clear whether these enrichments are primarily a product of relaxed constraint on cell type-specific genes or primarily result from positive selection in the human or chimp lineage. While the authors attempt to control for constraint using several variables (variance in ASE in humans and the sequence-based probability of haploinsufficiency score, pHI), these are imperfect proxies for constraint. For the pHI scores, enrichments for ASE also appear to be strongest in the least constrained genes. Overall, the relative role of relaxation of constraint versus positive selection is unresolved, although the manuscript's language leans in favor of an important role for selection.

      We agree with the reviewer and apologize for the wording that indeed focused more on positive selection than relaxed constraint. We have added language clarifying that our stance is that our analyses suggest some role for positive selection, but that we do not claim that positive selection plays a larger role than reduced constraint (lines 432-437): “Overall, this suggests that broad changes in expression in cell type-specifically expressed genes may be an important substrate for evolution but it remains unclear whether positive selection or lower constraint plays a larger role in driving the faster evolution of more cell type-specifically expressed genes. Future work will be required to more precisely quantify the relative roles of positive selection and evolutionary constraint in driving changes in gene expression.”

      The remainder of the manuscript draws on the cell type-specific ASE/ASCA data to nominate candidate genes and pathways that may have been important in differentiating humans and chimpanzees. Several approaches are used here, including comparing human-chimp ASE to the distribution of ASE observed in humans and investigating biases in the direction of ASE for genes in the same pathway. The authors also identify interesting candidate genes based on their role in development or their proximity to human accelerated regions (where many changes have arisen on the human lineage in otherwise deeply conserved sequence) and use a deep neural network to identify sequence changes that might be causally responsible for ASE/ASCA. These analyses have value and highlight potential strategies for using ASE/ASCA and hybrid cell line data as a hypothesis-generating tool. Of course, the functional follow-up that experimentally tested these hypotheses or linked sequence/expression changes in the candidate pathways to organismal phenotype would have strengthened the paper further- but this is a lot to ask in an already technically and analytically challenging piece of work.

      We thank the reviewer for the kind words and strongly agree that follow-up experiments and orthogonal analyses will be key in validating our results and establishing links to human-specific phenotypes.

      As a minor critique, the present paper is very closely integrated with other manuscripts that have used the hybrid human-chimp cell lines for biological insight or methods development. Although its contributions make it a strong stand-alone contribution, some aspects of the methods are not described in sufficient detail for readers to understand (even on a general conceptual level) without referencing that work, which may somewhat limit reader understanding.

      We agree with the points the reviewer raises regarding the clarity of our methods. We have amended several sections to provide more conceptual information while pointing the reader to other publications for the technical details. For convenience, we include the text here as well as in the new draft.

      Lines 207-214 now provide more intuition for the method used to detect lineage-specific selection: “Next, we sought to use our RNA-seq data to identify instances of lineage-specific selection. In the absence of positive selection, one would expect that an approximately equal number of genes in a pathway would have human-biased vs. chimpanzee-biased ASE. Significant deviation from this expectation (as determined by the binomial test) rejects the null hypothesis of neutral evolution, instead providing evidence of lineage-specific selection on this pathway. Using our previously published modification of this test that incorporates a tissue-specific measure of constraint on gene expression, we detected several signals of lineage-specific selection, some of which were cell type-specific (Starr et al., 2023, Additional file 2).” This is also reflected in the Methods in lines 729-731: “Positive selection on a gene set is only inferred if there is statistically significant human- or chimpanzee-biased ASE in that gene set (using an FDR-corrected p-value from the binomial test).”
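
      The logic of the sign-style test described above can be sketched in a few lines. This is only an illustration of the plain binomial test under a 50/50 null; it does not include the constraint-aware modification of Starr et al. or any FDR correction across gene sets, and the function name and gene counts below are hypothetical, not taken from the manuscript.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical pathway: 18 genes with human-biased ASE, 4 with chimpanzee-biased ASE.
human_biased, chimp_biased = 18, 4
p_value = binomial_two_sided_p(human_biased, human_biased + chimp_biased)
print(f"two-sided binomial p = {p_value:.4g}")
```

      Under neutrality the two directions of bias are treated as equally likely, so a small p-value here indicates a skew in direction that is hard to explain by chance alone. In practice the p-values would still need multiple-testing correction across all gene sets tested.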

      Reviewer #3 (Public Review):

      The authors utilize chimpanzee-human hybrid cell lines to assess cis-regulatory evolution. These hybrid cell lines offer a well-controlled environment, enabling clear differentiation between cis-regulatory effects and environmental or other trans effects.

      In their research, Wang et al. expand the range of chimpanzee-human hybrid cell lines to encompass six new developmental cell types derived from all three germ layers. This expansion allows them to discern cell type-specific cis-regulatory changes between species from more pleiotropic ones. Although the study investigates only two iPSC clones, the RNA- and ATAC-seq data produced for this paper is a valuable resource.

      The authors begin their analysis by examining the relationship between allele-specific expression (ASE) as a measure of species divergence and cell type specificity. They find that cell-type-specific genes exhibit more divergent expression. By integrating this data with measures of constraint within human populations, the authors conclude that the increased divergence of tissue-specific genes is, at least in part, attributable to positive selection. A similar pattern emerges when assessing allele-specific chromatin accessibility (ASCA) as a measure of divergence of cis-regulatory elements (CREs) in the same cell lines.

      By correlating these two measures, the authors identify 95 CRE-gene pairs where tissue-specific ASE aligns with tissue-specific ASCA. Among these pairs, the authors select two genes of interest for further investigation. Notably, the authors employ an intriguing machine-learning approach in which they compare the inferred chromatin state of the human sequence with that of the chimpanzee sequence to pinpoint putatively causal variants.

      Overall, this study delves into the examination of gene expression and chromatin accessibility within hybrid cell lines, showcasing how this data can be leveraged to identify potential causal sequence differences underlying between-species expression changes.

      We appreciate this assessment.

      I have three major concerns regarding this study:

      1. The only evidence that the cells are indeed differentiated in the right direction is the expression of one prominent marker gene per cell type. Especially for the comparison of conservation between the differentiated cell types, it would be beneficial to describe the cell type diversity and the differentiation success in more detail.

      We appreciate this assessment. We agree that evidence beyond a single marker gene is necessary to demonstrate that the differentiations were successful and that a discussion of the limitations of these differentiations in the manuscript is worthwhile. We included figures showing additional marker genes and a thorough discussion of the differentiations in the supplement. For convenience, we have copied the supplemental figure and text here:

      “Before continuing with the analysis, we tested whether the differentiations were successful and contained primarily our target cell types. The very low expression of NANOG, a marker for pluripotency, across all differentiations indicates that the samples contain very few iPSCs (Agoglia et al., 2021). For cardiomyocytes (CM), NKX2-5, MYBPC3, and TNNT2 definitively distinguish CM from other heart cell types and their high expression indicates successful differentiations (Burridge et al., 2014). For motor neurons, the high expression of ELAVL2, a pan-neuronal marker, indicates a high abundance of neurons in the sample (Mickelsen et al., 2019). The expression of ISL1 and OLIG2 further demonstrates that these are motor neurons and not other types of neurons (Maury et al., 2015). For retinal pigment epithelium (RPE), the combined expression of MITF, PAX6, and TYRP1 provides strong evidence that the differentiations were successful in producing RPE cells (Sharma et al., 2019). For skeletal muscle, the very high expression of MYL1, MYLPF, and MYOG indicates that these samples contain a high proportion of skeletal muscle cells (Chal et al., 2016). In general, all these populations of cells contain some proportion of progenitors as there is detectable expression of MKI67 in all samples.

      The low expression of ALB (a marker for mature hepatocytes) and the high expression of TTR and GPC3 (markers for hepatocyte progenitors) combined with the high expression of HNF1B indicate that the bulk of the cells in the HP samples are hepatocyte progenitors rather than mature hepatocytes or endoderm cells, although there are likely some endoderm cells and immature hepatocytes in the sample (Hay et al., 2008; Mallanna & Duncan, 2013). Similarly, the combined expression of PDX1 and NKX6-1 and the low expression of NEUROG3 (a marker of endocrine progenitors which differentiate from pancreatic progenitors) in the PP samples indicates that these primarily contain pancreatic progenitors but likely contain some endocrine progenitors and endoderm cells (Cogger et al., 2017; Korytnikov & Nostro, 2016).

      Notably, HP and PP are closely related cell types that are derived from the same lineage. Indeed, heterogeneous multipotent progenitors can contribute to both the adult liver and adult pancreas in mice (Willnow et al., 2021). Progenitors that express PDX1 (often used as a marker for the pancreatic lineage) can differentiate into hepatocytes (Willnow et al., 2021). As a result, some overlap in the transcriptomic signature of both cell types is expected and we cannot rule out that the HP samples contain cells that could differentiate into pancreatic cells or that the PP samples contain cells that could differentiate into hepatocytes. However, the expression of NKX6-1 and GP2, markers for pancreatic progenitors, in the PP samples but not the HP samples indicates that these two populations of cells are distinct. Overall, the similarity of PP and HP likely explains the lower number of cell type-specific genes and genes showing cell type-specific ASE for these cell types. This similarity does not alter the conclusions presented in the main text.”

      Author response image 4.

      Author response image 5.

      Marker gene expression in different cell types. In order, the panels show: a marker for pluripotency, a marker gene for dividing cells, marker genes for cardiomyocytes, marker genes for hepatocytes and hepatocyte progenitors, marker genes for motor neurons, marker genes for pancreatic progenitors and more mature pancreatic cell types, marker genes for retinal pigment epithelial cells, and marker genes for skeletal myocytes. Hepatocyte progenitors and pancreatic progenitors generally show similar gene expression profiles. TPM: transcript per million.

      1. Check for a potential confounding effect of sequence similarity on the power to detect ASE or ASCA.

      We agree that checking for confounding by power to detect ASE or ASCA would increase confidence in our results. We have added supplementary figures 29-33 to show the results as well as a discussion of these figures in the text (lines 318-326):

      “Finally, it is possible that CREs and genes that are less conserved will have more SNPs, and therefore more power to call ASCA and ASE, leading to systematically biased estimates. There is a weak positive correlation between the number of SNPs and the -log10(FDR) for ASE and a weak negative or no correlation for ASCA (Supp Fig. 29). Similarly, we observe a weak relationship between the number of SNPs in CREs or genes and absolute log fold-change estimates (Supp Fig. 30). Although the relationship between the number of SNPs and ASE/ASCA is weak, we confirmed that cell type-specific genes and peaks are still strongly enriched for ASE and ASCA when stratifying by number of SNPs (Supp Fig. 31-32). Overall, our analysis suggests that the result that more cell type-specific genes and CREs are more evolutionarily diverged is robust to a variety of possible confounders.”
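
      The stratification described in this passage can be illustrated with a small sketch: genes (or peaks) are ranked by SNP count, split into equal-sized bins, and the fraction with ASE is compared between cell type-specific and other genes within each bin. This is a simplified stand-in, not the authors' pipeline; the function names and toy inputs are hypothetical, and it reports raw proportions rather than the Wald test used in the figures.

```python
def equal_count_bins(values, n_bins=5):
    """Assign each item a bin index (0..n_bins-1) by rank so bins hold nearly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def _rate(flags):
    """Fraction of True entries, or None for an empty group."""
    return sum(flags) / len(flags) if flags else None


def ase_rate_by_bin(snp_counts, is_specific, has_ase, n_bins=5):
    """Per SNP-count bin, return (ASE rate in cell type-specific genes, ASE rate in other genes)."""
    bins = equal_count_bins(snp_counts, n_bins)
    rates = []
    for b in range(n_bins):
        members = [i for i in range(len(bins)) if bins[i] == b]
        specific = [has_ase[i] for i in members if is_specific[i]]
        other = [has_ase[i] for i in members if not is_specific[i]]
        rates.append((_rate(specific), _rate(other)))
    return rates


# Toy input: four genes split into two SNP-count bins.
print(ase_rate_by_bin([1, 2, 3, 4], [True, False, True, False],
                      [True, False, False, True], n_bins=2))
```

      If the enrichment of ASE among cell type-specific genes persists within every bin, differences in SNP density (and hence in power to call ASE) are unlikely to explain it on their own.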

      Author response image 6.

      Relationship between number of SNPs and a) -log10(FDR) for ASE and b) -log10(p-value) for ASCA. These scatter plots show the relationship between the number of SNPs in a gene or peak and the -log10(FDR) for ASE or the -log10(p-value) for ASCA. Genes with significant ASE (FDR < 0.05) and peaks with significant ASCA (binomial p-value < 0.05) were annotated as blue dots, and all other genes and peaks were annotated as gray dots. All genes in each cell type in RNA-seq are shown. For clarity, the few outlier peaks with more than 200 SNPs are excluded from these plots.

      Author response image 7.

      Relationship between number of SNPs and absolute log2 fold-change in a) ASE and b) ASCA. These scatter plots show the relationship between the number of SNPs in a gene or peak and the estimated absolute log2 fold-change for ASE or ASCA. Genes with significant ASE (FDR < 0.05) and peaks with significant ASCA (binomial p-value < 0.05) were annotated as blue dots, and all other genes and peaks were annotated as gray dots. All genes in each cell type in RNA-seq are shown. For clarity, the few outlier peaks with more than 200 SNPs are excluded from these plots.

      Author response image 8.

      Cell type-specifically expressed genes are enriched for genes with ASE when stratifying by the number of SNPs per gene. a) Results when SKM is included. Genes were put into five bins with an equal number of genes in each bin. Genes with the fewest SNPs are in the 0-20% bin and genes with the most SNPs are in the 80-100% bin. Significance (using the Wald test) is indicated by asterisks where *** indicates p < 0.005, ** indicates p < 0.01, and * indicates p < 0.05. b) The same as in (a) but excluding SKM.

      Author response image 9.

      Cell type-specific peaks are enriched for ASCA when stratifying by the number of SNPs per peak. a) Peaks with an absolute log2 fold-change greater than or equal to 0.5 were called as having ASCA. Peaks were put into five bins with an equal number of peaks in each bin. Peaks with the fewest SNPs are in the 0-20% bin and peaks with the most SNPs are in the 80-100% bin. Significance (using the Wald test) is indicated by asterisks where *** indicates p < 0.005, ** indicates p < 0.01, and * indicates p < 0.05. b) The same as in (a) but peaks with a binomial p-value less than or equal to 0.05 were called as having ASCA.
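
      The equal-count binning described in the two captions above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the rank-based tie handling, and the toy inputs are hypothetical, and the sketch reports the fraction of ASE (or ASCA) calls per bin rather than reproducing the Wald test used in the figures.

```python
def equal_count_bins(snp_counts, n_bins=5):
    """Assign each gene/peak to one of n_bins bins of (near-)equal size,
    ordered by ascending SNP count (bin 0 = fewest SNPs).
    Ties are broken by input order, which is an assumption of this sketch."""
    n = len(snp_counts)
    order = sorted(range(n), key=lambda i: snp_counts[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


def call_fraction_by_bin(snp_counts, is_called, n_bins=5):
    """Fraction of genes/peaks called as ASE/ASCA within each SNP-count bin."""
    bins = equal_count_bins(snp_counts, n_bins)
    totals = [0] * n_bins
    hits = [0] * n_bins
    for b, flag in zip(bins, is_called):
        totals[b] += 1
        hits[b] += int(flag)
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]


# Toy data (hypothetical): SNP counts per gene and whether ASE was called.
snps = [3, 50, 7, 1, 22, 9, 14, 30, 2, 5]
ase = [False, True, False, False, True, False, True, True, False, True]
fractions = call_fraction_by_bin(snps, ase)
```

      Comparing cell type-specific and non-specific genes within each bin, rather than across all genes at once, is what keeps differences in SNP number from driving the enrichment.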

      1. In the last part the authors showcase 2 examples for which the log2 fold changes in chromatin state scores as inferred by the machine learning model Sei are used. This is an interesting and creative approach, however, more sanity checks on this application are necessary.

      We agree with the reviewer about the importance of sanity checks and apologize for omitting these from the manuscript. Below we highlight several such checks from previous publications:

      In the original Sei paper (Chen et al. 2022), the authors included several tests of their model’s ability to predict the effects of individual genetic variants. Using eQTL data from GTEx, they found that variants predicted to increase enhancer activity were more likely to be up-regulating eQTLs, and those predicted to increase polycomb repression had the expected repressive effect. These relationships became stronger when restricting the analysis only to fine-mapped eQTLs with >95% posterior probabilities of causality. Chen et al. also found that previously known disease-causing noncoding variants from the Human Gene Mutation Database were far more likely to reduce predicted enhancer/promoter activity than matched variants not linked to any disease.

      In addition, we note that a similar approach to ours was recently used to analyze all HARs and included considerable efforts to validate the utility of the Sei predictions in identifying causal variants (Whalen et al. 2023 in Neuron). For example, Whalen et al. found that the Sei output correlated with the effects of genetic variants on expression in a massively parallel reporter assay. They also found that the effect sizes predicted by Sei were much higher for variants in HARs than polymorphic variants in the human population, which is consistent with the idea that variants in HARs lie in highly conserved bases that are more likely to disrupt cis-regulatory elements. Finally, Whalen et al. found that effects on chromatin state predicted by Sei were generally highly correlated across tissues, supporting our approach that leverages all Sei outputs regardless of which cell type or tissue they correspond to. Overall, we think that Sei is a potentially powerful way to prioritize causal variants and that improved machine learning models trained on more extensive and context-specific data will be even more powerful.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study isolated extracellular vesicles (EV) from healthy controls (HCs) and Parkinson patients (PwP), using plasma from the venous blood of non-fasting people. Such EVs were characterized and validated by the presence of markers, their size, and their morphology. The main aim of the manuscript is to correlate the presence of synaptic proteins, namely SNAP-25, GAP-43, and SYNAPTOTAGMIN-1, normalized with HSP70, with the clinical progression of PwP. Changes in synaptic proteins have been documented in the CSF of Alzheimer's and Parkinson's patients. The demographics of participants are adequately presented.

      • One important limitation, as well as a puzzling aspect, is that the authors did not find differences between groups at the beginning of the study or after one year, after age and sex adjustment.

      Response: Thanks for your comments. We acknowledge your observation that the absence of a discernible difference in plasma EV synaptic protein levels between the PD and control subjects constitutes a significant limitation of our study. This outcome could be attributed to the fact that the controls were recruited from the neurology outpatient clinic, representing a group that could be considered "sub-healthy." Moreover, these individuals are not exempt from aging-related neurodegenerative processes. Considering that our PD subjects are in the early stages of the disease (with a mean disease duration of less than 3 years) and that synaptic dysfunction is a broader indicator rather than specific to PD, these factors could collectively contribute to the lack of distinction between the PD and control groups.

      However, our primary intention was also to explore the potential of plasma EV synaptic proteins as predictive markers for disease progression in PD. In this regard, we have identified their applicability within the current PD cohort. We are committed to conducting further follow-up with these study subjects over an extended duration to delve deeper into these findings.

      To address this issue, we revised the discussion as follows: “Additionally, synaptic dysfunction is a frequently observed phenomenon in several neurological diseases, and it is not exclusive to PD. Consequently, the HC group in our current study may have included individuals with coexisting neurological conditions, potentially explaining the lack of a significant difference between the PD group and the HCs. However, this approach also illuminates the significance of synaptic dysfunction in the advancement of PD. This insight can be invaluable for monitoring disease progression, particularly in the context of clinical trials focused on disease modification.”

      • Tables in general are hard to follow. Specifically, Table 2 does not convey a clear message nor in the text of the Table itself, and the per 100% of change needs to be explained in the corresponding legend.

      Response: Thanks for your comment. In Table 2, our aim was to demonstrate the association between the change in plasma EV synaptic proteins and the change in clinical severity, presented as coefficient (p value). We apologize for any prior ambiguity in the main text's description of these results and have since made revisions to enhance clarity.

      Regarding the "per 100% change," this is due to the quantification of plasma EV synaptic proteins being based on a semi-quantitative Western blot method. Each measurement was normalized by the average baseline plasma synaptic protein levels of healthy controls (HCs). The term "per 100% change" denotes the increase or decrease in plasma EV synaptic protein abundance relative to the average baseline levels observed in healthy controls. We apologize for any confusion caused and have removed this term. In addition, we rephrased the statement in the table legend of the revised manuscript to improve readability, as follows: “The association between the change of plasma EV synaptic proteins abundance (between baseline and follow-up) with the change of clinical severity in motor and cognitive domains (between baseline and follow-up) in people with Parkinson’s disease. A generalized linear model was employed and the data was presented as coefficient (p value).”
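
      The normalization described above can be illustrated with a small worked example. This is a hedged sketch, not the authors' analysis code: the function name and the toy intensities are hypothetical, and dividing by the HSP70 loading control before dividing by the healthy-control baseline mean is an assumed order of operations.

```python
def relative_abundance(band_intensity, hsp70_intensity, hc_baseline_ratios):
    """Semi-quantitative abundance of one synaptic protein in one sample.

    band_intensity / hsp70_intensity: blot signal normalized to the HSP70
    loading control (assumed step).
    hc_baseline_ratios: the same HSP70-normalized ratio for each healthy
    control at baseline; their mean defines 1.0, i.e. "100%".
    """
    hc_mean = sum(hc_baseline_ratios) / len(hc_baseline_ratios)
    return (band_intensity / hsp70_intensity) / hc_mean


# Toy values (hypothetical): three healthy controls and one patient.
hc_ratios = [0.5, 1.0, 1.5]                            # mean = 1.0
baseline = relative_abundance(30.0, 20.0, hc_ratios)   # 1.5 x HC mean
follow_up = relative_abundance(50.0, 20.0, hc_ratios)  # 2.5 x HC mean
change = follow_up - baseline                          # 1.0, i.e. a 100% change
```

      Under these assumptions, a regression coefficient reported "per 100% change" would correspond to a change of 1.0 on this scale, that is, a shift equal to the average healthy-control baseline level.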

      • It is only when PwP were classified as a first quartile that a significantly greater deterioration was found. However, in the case of tremor, the top 25% had values going from 0.46-0.47 to 0.32-0.35, whereas the lower three quarters went from 0.33-0.34 to 0.27-0.28 depending on the protein analyzed. This needs to be clarified in the text.

      Response: Thanks for your comments. As per the unified Parkinson's disease rating score (UPDRS), a higher score indicates greater severity of symptoms. Regarding tremor, we observed a general trend of improvement in both groups. PwP with elevated baseline plasma EV proteins showed a trend toward worse tremor scores at baseline, and their improvement was significantly greater than that of the remaining PwP. This improvement seems to contradict the progressive nature of PD, and one possible explanation could be the alleviation of symptoms due to medication usage. The assessment of motor symptoms took place within the hospital setting, where we refrained from requesting patients to withhold their anti-PD medications due to concerns about safety issues such as falls. Consequently, certain motor symptoms might have been effectively controlled by the anti-PD medication. Traditionally, symptoms like tremor and rigidity (as reflected by the akinetic rigidity score) respond well to medications, while postural instability and gait disturbance (PIGD) are less responsive. In our cohort, we noted an improvement in tremor scores and stability in akinetic rigidity (AR) scores. Conversely, PD patients with higher baseline plasma EV synaptic protein levels exhibited notable progression in PIGD scores. These findings have been documented in the results section and discussed comprehensively within the revised manuscript as follows: “On the other hand, the evaluation of motor symptoms occurred in a hospital setting where we did not ask patients to stop taking their anti-PD medications due to safety concerns like the risk of falls. As a result, specific motor symptoms, particularly tremor and AR, which are more sensitive to medication compared to PIGD, may have been effectively managed by the anti-PD medications. This could potentially explain the improvement in tremor observed between the baseline and one-year follow-up, especially among PwP with elevated baseline plasma EV synaptic proteins.”

      • Table 3 is hard to read and some of the values seem repetitive, especially for tremor, AR, and PIGD. It looks as if Figure 2 represents the same information as Table 3.

      Response: Thanks for your information. We have ensured the accuracy of the results presented in Table 2. While some of the entries may appear similar, they do indeed possess distinct differences.

      To enhance readability, we streamlined the information in Table 3 by removing the p-values from the intra-group comparisons between baseline and the 1-year follow-up within each domain. We retained the original p-values for trend related to the inter-group comparisons for changes. Detailed information has been relocated to the supplementary section of the revised manuscript. In Figure 2, we illustrated the relationship between baseline plasma extracellular vesicle (EV) synaptic protein levels and the clinical assessment parameters during follow-up in patients with Parkinson's disease (PwP). This portrayal is distinct from the information depicted in Table 3.

      If you had concerns about the resemblance between Table 3 and Figure 3, please note that the values in Table 3 represent raw scores, while the values in Figure 3, namely the estimated marginal means, are the "adjusted" scores for UPDRS-II and PIGD at baseline and follow-up. These adjustments encompass age, sex, and disease duration. We sincerely apologize for any lack of clarity in our previous description and have since revised it accordingly.

      • The text and figure legends are not helpful in guiding the reader to understand the presented information.

      Response: Thanks for your comments. We apologize for the unclear statements. We have revised the figure legends and the main text to make them easier for readers to follow.

      Reviewer #2 (Public Review):

      Hong and collaborators investigated variations in the amount of synaptic proteins in plasma extracellular vesicles (EV) in Parkinson's Disease (PD) patients on one-year follow-up. Their findings suggest that plasma EV synaptic proteins may be used as clinical biomarkers of PD progression.

      • It is a preliminary study using semi-quantitative analysis of synaptic proteins.

      Response: Thanks for your comments. The present study represents the initial phase of our investigation into the role of plasma EV synaptic proteins within our PD cohort. Our findings have revealed the potential predictive significance of these synaptic proteins in relation to PD progression. We are committed to conducting further follow-up with these study subjects over an extended period.

      Furthermore, it's important to acknowledge that the semi-quantitative approach employed to assess protein abundance was a limitation of this study. This limitation stems from the low concentration of plasma EV synaptic proteins, which restricts the feasibility of utilizing techniques such as ELISA or other quantitative methods for protein assessment. We have acknowledged this limitation in the revised manuscript as follows: “Semiquantitative assessment of plasma EV synaptic protein (SNAP-25, GAP-43, and synaptotagmin-1) levels was performed using western blot analysis. The lack of absolute values limits further clinical application.”

      Moving forward, we intend to adopt alternative EV isolation methods that enable the extraction of a larger abundance of plasma EV proteins, facilitating more accurate quantitative assessments. In addition, a longer longitudinal follow-up is warranted to clearly assess the prognostic efficacy of plasma EV synaptic proteins in PwP, which we had mentioned in the manuscript.

      • The authors have a cohort of PD patients with clinical examination and a know-how on EV purification. Regarding this latter part, they may improve their description of EV purification. EV may be broken into smaller size EV after freezing. Does it explain the relatively small size in their EV preparation? Do the authors refer to the MISEV guidelines for EV purity?

      Response: Thanks for your comments. In the previous manuscript, we provided a relatively detailed account of the procedures related to EV isolation and validation (https://doi.org/10.1096/fj.202100787R). In the revised manuscript, we added some information about the principle of the EV isolation kit and the validation antibodies, as follows: “Plasma EVs were isolated from 1 mL of plasma by exoEasy Maxi Kit (Qiagen, Valencia, CA, USA), a membrane-based affinity binding step to isolate exosomes and other EVs without relying on a particular epitope, in accordance with the manufacturer’s instructions and stored in a −80 °C freezer. The isolated plasma EVs were then eluted and stored. Usually, 400 μL of eluate is obtained per mL of plasma. The isolated plasma EVs were validated according to the International Society of Extracellular Vesicles guidelines, which include: 1. markers, including the presence of CD63 (ab59479, Abcam, Cambridge, UK), CD9 (ab92726, Abcam, Cambridge, UK), tumor susceptibility gene 101 protein (GTX118736, GeneTex, CA, USA) and the absence of cytochrome c (ab110325; Abcam, Cambridge, UK); 2. physical characterization through nanoparticle tracking analysis, which demonstrated that the majority of EVs were within 50-100 nm in size; 3. the morphology from electron microscopy analysis. The validation had been described previously [29-31].”

      It's important to note that our primary focus was on exosomes, the smallest subtype of EVs. Through nanoparticle tracking analysis, we observed that the majority of isolated EVs fell within the diameter range of 50-150nm, exhibiting significant surface marker (i.e. CD63 and CD9) expression. Moreover, electron microscopy confirmed their vesicular morphology. These meticulously validated EVs were promptly analysed post-isolation.

      However, we acknowledge that the plasma obtained from study participants might have undergone freezing prior to EV isolation. This freezing process has the potential to diminish the yield rate of EVs and result in some degree of fragmentation. We have included this issue as a limitation in our revised manuscript as follows: “The final technical issue in the present study was the relatively small size of the isolated EVs. Despite the primary focus on isolating exosomes, which are the smallest type of EVs, it's important to consider that the presence of small-sized EVs could potentially be attributed to EV fragmentation that occurs during the freezing and thawing processes.”

      • Regarding synaptic protein quantification, the choice of western blotting may not be the best one. ELISA and other multiplex arrays are available. How do the authors justify their choice?

      Response: Thanks for your comments. We appreciate your input regarding the semi-quantitative western blot analysis not being the most optimal approach. Owing to the limited quantity of isolated plasma EVs and the significant protein abundance of synaptic proteins within these EVs, we did explore the use of an ELISA assay. However, it's worth noting that for a specific subset of the samples, the readout obtained was lower than the lower limit of detection of the ELISA kit. In response, we have incorporated this point as a limitation within the discussion section of the revised manuscript as follows: “Semiquantitative assessment of plasma EV synaptic protein (SNAP-25, GAP-43, and synaptotagmin-1) levels was performed using western blot analysis. The lack of absolute values, i.e. from the results of enzyme-linked immunosorbent assay, limits further clinical application.”

      • Do the authors try to sort plasma EV by membrane-associated neuronal EV markers using either vesicle sorting or immunoprecipitation?

      Response: Thanks for your comments. The current study did not specifically isolate neuron-derived extracellular vesicles (EVs), potentially introducing some bias to the results. However, it's important to note that synaptic proteins, such as SNAP-25, exhibit a high degree of neuron-specific expression, with a predominant presence in the brain (as indicated by https://www.proteinatlas.org/ENSG00000132639-SNAP25/tissue). Given this context, the limitation of not analyzing neuron-derived EVs could be mitigated to some extent. In response, we have incorporated this point as a limitation within the discussion section of the revised manuscript as follows: “Furthermore, this study evaluated the overall plasma EVs rather than specifically focusing on neuron-derived exosomes, potentially introducing a bias towards somatic-origin EVs. Nonetheless, it is worth noting that synaptic proteins primarily originate from neurons. Even when considering neuron-derived exosomes, it's important to recognize that they are not exclusively derived from the brain, which can lead to contamination from the peripheral nervous system.”

      • Many technical aspects may be improved. Such technical questions weakened the authors' conclusions.

      Response: Thanks for your comments. We recognize that the aforementioned issues represent limitations of our current study. In response, we have incorporated these points as limitations, including the semi-quantitative assessments, the isolation of total but not neuron-derived exosomes in the plasma, and the short follow-up time within the discussion section of the revised manuscript.

      • The discussion is pretty long to justify the data. It may be shortened by adding some information in the introduction.

      Response: Thanks for your comments. We have repositioned a statement from the second paragraph of the discussion to the introduction. This adjustment serves to enrich the background understanding of the link between synaptic dysfunction and neurodegenerative diseases.

    1. Author Response

      Reviewer #1 (Public Review)

      The manuscript by Singh et al. proposes a new theoretical model for the phenomenon of planar cell polarity (PCP). The new model simulates the emergence of the subcellular polarity of the Fat-Ds pathway, based on the interactions of the protocadherins Fat and Ds at the boundary between cells and in response to external gradients. Several mathematical models for PCP have been previously developed focusing on different aspects of PCP, including non-autonomy domineering (Amonlirdviman et al.), the effect of stochasticity on polarity (Burak et al.), gradient sensing (Mani et al.), formation of molecular bridges (Fisher et al.) to name a few. The current modeling approach suggests a new model, based on a relatively simple set of equations for membrane Fat and Ds and their interactions, both in 1D (line of cells) and in 2D (hexagonal array). The equations are relatively simple on one hand, allowing tractable computational analysis as well as analytical approximations, while on the other hand allowing tracking of membrane protein levels, which is what is measured experimentally. It has been previously shown that achieving polarity requires local feedback that amplifies complexes in one orientation at the expense of complexes in the opposite orientation (e.g. Mani et al.). Interestingly, the current manuscript shows that a simple assumption, that Fat-Ds complexes are stabilized when bound, is sufficient to induce PCP when concentrations are high enough. The authors use the model to show how it captures several experimental observations, as well as to analyze the sensitivity to noise, the response to gradients, and the response to local perturbations (mutant clones). The manuscript is clear and the analysis is mostly coherent and sensible (although some parts need to be clarified, see below). The main issue I have with the manuscript is that it mostly describes how it captures different features that were mostly explained in previous models. I do think the authors should do more with their model to explain features that were not explained by other models, and/or generate non-trivial predictions that can be tested experimentally.

      We thank the reviewer for the positive feedback and valuable comments. To address the concerns, we have comprehensively modified the manuscript by including new results and detailing the specific model predictions and their potential experimental tests.

      Reviewer #2 (Public Review):

      The setting of planar cell polarity in epithelial tissues involves a complex interplay of chemical interactions. While local interactions can spontaneously give rise to cell polarity, planar cell polarity also involves tissue scale gradients whose effects are not clear. To understand their role, the authors built a minimal mechanistic model in considering two atypical cadherins, Fat (Ft) and Dachsous (Ds) which can associate at cell-cell interfaces to form hetero-dimers in which monomers belong to adjacent cells. This association can be seen as a local interaction between cells and is also sensitive to overall concentration gradients. From their model which appears to capture diverse experimental observations, the authors conclude that tissue-scale gradients provide to planar cell polarity a directional cue and some robustness to cellular stochasticity. While this model comes after similar works reaching similar predictions, the quality of this model is in its simplicity, its convenience for experimental testing, and the diversity of experimental observations it recapitulates.

      A strength of this work is to recapitulate many experimental observations made on planar cell polarity. It, for example, seems to capture the response of tissues to perturbations such as local downregulation of some important proteins, and the polarity patterns observed in the presence of noise in synthesis or cell-to-cell heterogeneity. It also gives a mechanistic description of planar cell polarity, making its experimental interpretation simple. Finally, the simplicity of the model facilitates its exploration and makes it easily testable because of the reduced amount of free model parameters.

      A weakness of this work is that it comes after several models with similar hypotheses and similar predictions.

      Another weakness is that some conclusions of this work rely on visual appreciation rather than quantification. This is particularly true for what concerns 2D patterns. An argument of the authors is for example that their model reproduces a variety of known spatial patterns, but the comparison with experiments is only visual and would be more convincing in being more quantitative.

      We are grateful to the reviewer for a critical evaluation of the manuscript and for giving important suggestions. We have incorporated all the comments and revised the manuscript accordingly by including quantitative analysis of all the results presented.

      Reviewer #3 (Public Review):

      Using theory, the authors study mechanisms for establishing planar cell polarity (PCP) through local and global modules. These modules refer to the interaction between neighbouring cells and tissue-wide gradients, respectively. Whereas local interactions alone can lead to tissue-wide alignment PCP, a global gradient can set the direction of PCP and maintain the pattern in presence of noise. In contrast, the authors argue that a global gradient can only generate PCP to an extent that is proportional to the gradient magnitude.

      The authors formulate a discrete model in one and two spatial dimensions that describe the assembly dynamics of PCP proteins on membranes. The number of proteins per cell remains constant. Additive noise is introduced to account for stochasticity in the attachment/detachment kinetics of proteins. Furthermore, ’quenched’ noise is introduced to account for variations of protein numbers between cells. The authors perform simulations of the stochastic discrete model in various situations. In addition, they derive a continuum description to perform some analytical computations.

      The strength of this analysis relies clearly on showing that simple dynamics can lead to tissue-wide PCP even in absence of a gradient in protein expression. A number of phenomena observed in tissues are qualitatively reproduced. In two spatial dimensions, they find swirling patterns that resemble patterns found in tissues when a global gradient is absent. The model also captures qualitative effects due to the down-regulation of one of the PCP proteins in a certain region of the tissue.

      The main weak point is that, from a physical point of view, the findings are not particularly surprising. Furthermore, some assumptions underlying the model, need some more justification. This holds notably for the question, of why additive noise is appropriate to account for the effect of stochasticity in the attachment-detachment dynamics of the proteins. Finally, the authors consider a situation that they consider to be one of the most interesting features of PCP, namely, the formation of PCP in the presence of a region with a down-regulated PCP protein and in presence of a gradient. Unfortunately, the effect is not very clear and the data provided remains limited.

      We thank the reviewer for the valuable comments and critique of the work. We have considered all the concerns and revised the manuscript comprehensively. In particular, we have elaborated the sections on model assumptions and added new figures/figure-panels to quantitatively present the model predictions. We have also revised the details of the one-dimensional continuum theory for PCP which, we feel, presents a detailed quantitative picture of PCP and its dependence on model parameters.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Leiba et al. aim at establishing the developing zebrafish embryo as a suitable infection model to study Salmonella persistence in vivo. Under environmental stress (ex: macrophage phagosomes) a proportion of bacteria switch to a slow/arrested growth state conferring increased resistance to antibiotic treatments. Persisters are getting increasingly linked to infection relapses. Understanding how persistent infections emerge and bacteria survive in an organism for long time without replicating before switching back to a replicative state is essential. Zebrafish represents an alternative model to mice offering the possibility to image the whole organism and capture persistency with an amazing spatio-temporal resolution.

      In this paper, the authors demonstrate that persistent infections of Salmonella can be reproduced in the developing zebrafish. The kinetics of infection have been well characterized and shows a very nice heterogeneity between animals demonstrating the complex host-pathogen interactions (Fig 1). From the perspective of persistence, the presence of Salmonella survivors to host clearing is reported until 14dpi demonstrating the possibility to induce persistent infection in this model. Through the manuscript, the authors have used a variety of state-of-the-art technics illustrating the flexibility of this model including microscopy and imaging of specific immune populations, various transgenic animals and selective depletion of macrophages or neutrophils to assess their relative contributions. Overall, the conclusions of the authors are well supported by the presented data. This said, the authors should strengthen the conclusions of the paper by providing a better characterization of the infection.

      Major comments:

      1) Figure 1: What is the general life-span of the fish?

      The general life-span of the zebrafish is approximately 3 years on average. Persistent infection is determined by the existence of a fraction of bacteria that endure over an extended period (after 96 hpi). Further, we observed Salmonella persistence for 14 days. In figure 1, we don’t think that the information of the general life-span of the zebrafish is critical.

      2) Figure 2: It would be nice to clearly state what infection scenario we are looking at. Have the authors studied "high proliferation", "infected" or "cleared" zebrafish?

      In Figure 2 we have studied the "infected" group. Both "high proliferation" and "cleared" larvae were excluded from the analysis. This is now clearly stated in the legend of Figure 2.

      3) Figure 3 and 4: It would be very informative if the authors can tell us what proportion of Salmonella is associated with macrophages and neutrophils. From panel C and D (Figure 3) and Figure 4 C and D and Suppl Fig 1, it seems that a lot of bacteria are extracellular. Maybe an EM image of the tissue would help to understand if the bacteria are "all" intracellular or extracellular.

      We apologize for any misunderstanding regarding the presence of intra- and extracellular bacteria depicted in Figure 3 C and D, Figure 4 C and D and Figure 3 - Suppl Fig 1. These figures illustrate infection experiments conducted in single-reporter larvae, limiting our analysis to bacteria associated with a single cell type. In Figure 3G and Figure 4E-G, the panels depict infection experiments carried out in dual-reporter larvae, showing bacteria associated or not with macrophages and neutrophils. The present study aimed to establish the role of neutrophils and macrophages in the control of early and persistent Salmonella infection, but further studies will focus on the exact localization of Salmonella during the course of the infection. Despite being a challenging technique for zebrafish, electron microscopy could be of great interest, as it would allow any type of cell to be visualized at high resolution (to determine whether all bacteria are intracellular).

      4) Figure 3 and 4: It would be very useful if the authors can tell us if the intracellular bacteria are mainly found individually (like in Figure 3C) or whether host cells harbor many intracellular bacteria. Looking at figure 4G: it is not clear to me how many intracellular bacteria can be counted on this image.

      This is an interesting suggestion. At present, an accurate quantification of the intracellular bacteria in 3D microscopy datasets is challenging because bacteria aggregate inside the cells. At 4 hpi, single bacteria can occasionally be observed outside leukocytes, while most infected macrophages harbored several intracellular bacteria (bacterial aggregates). To compare the levels of intracellular bacteria between acute and persistent stages, we measured the size of E2Crimson-positive (E2Crimson+) events. At 5 hpi, the median volume of E2Crimson+ events was lower than that at 4 dpi. The size distribution analysis of E2Crimson+ events indicated a higher representation of smaller volumes (0.5-1.5 µm³ and 1.5-10 µm³) at 5 hpi compared to 4 dpi, a stage during which very large E2Crimson+ events were observed (between 100 and 1000 µm³, with some exceeding 1000 µm³). This observation suggests an elevated presence of intracellular bacteria within the cells during persistent stages and that intracellular bacteria are predominantly observed in groups rather than as solitary entities. This analysis has been incorporated in new Figure 5.

      5) Figure 3 and 4: The authors should also perform an experiment with a Salmonella strain harboring a growth reporter to quantify the amount of replicating and non-replicating bacteria. This experiment is not absolutely necessary for the story, but if possible, it would provide a very nice addition to the story and add impact to the paper.

      We welcome the reviewers’ suggestion, which we have indeed considered and plan to carry out in the future, along with experiments more focused on the bacterial side.

      6) Figure 6: The authors should provide in suppl. the flow cytometry scatter plots used to delineate the different subpopulations.

      We agree with the reviewer that the flow cytometry scatter plots used to delineate the different subpopulations were missing; they are now incorporated in new Fig 7 - figure supplement 2.

      7) Figure 6: A specific characterization of macrophages harboring Salmonella persisters at 4dpi is missing. As shown by the authors in Figure 6, the tnfa- populations of macrophages at 4dpi are very similar for both infected and non-infected larvae. Persisters should indeed reside within tnfa- macrophages but they should also induce a specific signature through the actions of Salmonella effectors. Measuring this signature will allow a direct comparison with published data in mice and assess how accurately the zebrafish model recapitulates the manipulation of macrophages by Salmonella.

      We agree with the reviewer that a specific characterization of macrophages harboring persistent Salmonella at 4 dpi is missing. However, due to a technical limitation inherent to the model (limited recovery of infected cells following FACS sorting), we were not able to specifically sort infected macrophages at 4 dpi.